Category Archives: Database

Currently Executing Queries in Vertica

As mentioned in my last post – Timestamps in Vertica – we had a query which caused our cluster of servers to run at over 90% CPU utilization, when the normally in 10-20 percent range. This query was slowing down all of the other queries that were running at the time, even to the point where the query SELECT 1 FROM dual was taking 1200+ milliseconds.

In this case, the trick to finding the culprit was to find the queries that had been running for a long period of time.

SELECT
    *,
    (start_timestamp - statement_timestamp()) AS current_duration
FROM
    query_requests
WHERE
    is_executing = 't'
ORDER BY
    start_timestamp;

We look at the query_requests table, and find those queries that are still executing, and sort them by their start time. I also added in a column current_duration so we could easily determine how long the queries have been running.

With that we found a query that was the culprit, killed that job, and the database cluster recovered quickly.

Hope this can help someone else when they have some runaway queries they need to find.

–Proctor

Timestamps in Vertica

I was recently trying to find long running queries in Vertica, as we had something that was killing our performance of our cluster, and I got bit by this little bit of behavior around getting the “current” time in Vertica.

For some context, I had an open tmux session on a development server with one of the windows setup to be an open connection to our Vertica database1.

The part about having that database session open in a tmux session is important, because the database session had been open for at least a week at the point in time I was trying to do some checks to find the queries that had been running the longest.

What bit me was that current_timestamp, and now() functions are tied ‘to the start of the current transaction […] to allow a single transaction to have a consistent notion of the “current” time […]’.

The way this bit me, was I was seeing queries that were running for -7 days, because I was doing math on the timestamp against the current_timestamp.

someuser => select current_timestamp;
           ?column?           
-------------------------------
 2014-07-11 10:45:17.135025-05
(1 row)

someuser=> select now();
              now             
-------------------------------
 2014-07-11 10:45:17.135025-05
(1 row)

What I needed to be using was statement_timestamp(), which ‘returns the start time of the current statement’

someuser=> select statement_timestamp();
              now             
-------------------------------
 2014-07-18 14:23:36.492831-05
(1 row)

And that is how you get a more up to date timestamp in your Vertica queries if you are in a long running database session.

–Proctor


  1. Since I have to help fill the role of a database administrator as part of being on the data team at work, and want quick access to be able to run some queries when things like the above happen, instead of hoping I can establish a connection. 

Granting Users Select Permission in Vertica

At work we use Vertica for one of our databases. As part of the data team, it is our job to help monitor and maintain the data in the database, as well as the health of the database as well. One of the things I have been doing is to create database users for the different services that are using the database.

This helps us when we have long running queries, high load on the database, issues with the number of connections, and it would also allow us to add constraints around all of those in the future on a per service basis based off of the different user accounts, not to mention restrict the permissions to tables that are needed and allow some of the experimental tables to exist without causing confusion by having people see those tables before they have been stabilized.

First we create a user for each service with a username and a password; the quotes around the username signify to Vertica that the username is case sensitive.

CREATE USER "UserName"  IDENTIFIED BY 'password';

We then grant SELECT only permissions to the tables that are specific to that user. We do SELECT only permissions as this is a data warehouse style database, and only a few select services have permissions to update the database, while everyone else should be free to read from the tables they need, or are stable.

grant select on <tablename> to "UserName";

We also have some helper functions that we give all of the users, and as I still usually can’t remember the functions and their argument types, I have to do a lookup of the different functions each time, because when granting permissions on Vertica functions, you need to specify the function name along with the arguments.

This query finds me all of the user defined functions in the database.

select * from user_functions;

Once I have the functions and arguments, I can then grant EXECUTE permissions on the functions, by providing the function name and the argument types.

grant execute on function <function name with argument types> to "UserName";

So to grant execute permissions to the function foo that has a number of overloads, it might look like.

grant execute on function foo(Float, Float) to "UserName";
grant execute on function foo(Float, Integer) to "UserName";
grant execute on function foo(Float, Numeric) to "UserName";
grant execute on function foo(Integer, Integer) to "UserName";
grant execute on function foo(Integer, Numeric) to "UserName";
grant execute on function foo(Numeric, Integer) to "UserName";
grant execute on function foo(Numeric, Numeric) to "UserName";

I hope this helps someone the next time they find themselves having to help play db admin/owner on a Vertica database.

–Proctor