As mentioned in my last post – Timestamps in Vertica – we had a query which caused our cluster of servers to run at over 90% CPU utilization, when the normally in 10-20 percent range. This query was slowing down all of the other queries that were running at the time, even to the point where the query
SELECT 1 FROM dual was taking 1200+ milliseconds.
In this case, the trick to finding the culprit was to find the queries that had been running for a long period of time.
SELECT *, (start_timestamp - statement_timestamp()) AS current_duration FROM query_requests WHERE is_executing = 't' ORDER BY start_timestamp;
We look at the
query_requests table, and find those queries that are still executing, and sort them by their start time. I also added in a column
current_duration so we could easily determine how long the queries have been running.
With that we found a query that was the culprit, killed that job, and the database cluster recovered quickly.
Hope this can help someone else when they have some runaway queries they need to find.