As mentioned in my last post – Timestamps in Vertica – we had a query which caused our cluster of servers to run at over 90% CPU utilization, when the normally in 10-20 percent range. This query was slowing down all of the other queries that were running at the time, even to the point where the query
SELECT 1 FROM dual was taking 1200+ milliseconds.
In this case, the trick to finding the culprit was to find the queries that had been running for a long period of time.
SELECT *, (start_timestamp - statement_timestamp()) AS current_duration FROM query_requests WHERE is_executing = 't' ORDER BY start_timestamp;
We look at the
query_requests table, and find those queries that are still executing, and sort them by their start time. I also added in a column
current_duration so we could easily determine how long the queries have been running.
With that we found a query that was the culprit, killed that job, and the database cluster recovered quickly.
Hope this can help someone else when they have some runaway queries they need to find.
Pingback: Dew Drop – August 5, 2014 (#1829) | Morning Dew