Category Archives: SQL

Currently Executing Queries in Vertica

As mentioned in my last post – Timestamps in Vertica – we had a query which caused our cluster of servers to run at over 90% CPU utilization, when the normally in 10-20 percent range. This query was slowing down all of the other queries that were running at the time, even to the point where the query SELECT 1 FROM dual was taking 1200+ milliseconds.

In this case, the trick to finding the culprit was to find the queries that had been running for a long period of time.

SELECT
    *,
    (start_timestamp - statement_timestamp()) AS current_duration
FROM
    query_requests
WHERE
    is_executing = 't'
ORDER BY
    start_timestamp;

We look at the query_requests table, and find those queries that are still executing, and sort them by their start time. I also added in a column current_duration so we could easily determine how long the queries have been running.

With that we found a query that was the culprit, killed that job, and the database cluster recovered quickly.

Hope this can help someone else when they have some runaway queries they need to find.

–Proctor

Timestamps in Vertica

I was recently trying to find long running queries in Vertica, as we had something that was killing our performance of our cluster, and I got bit by this little bit of behavior around getting the “current” time in Vertica.

For some context, I had an open tmux session on a development server with one of the windows setup to be an open connection to our Vertica database¹.

The part about having that database session open in a tmux session is important, because the database session had been open for at least a week at the point in time I was trying to do some checks to find the queries that had been running the longest.

What bit me was that current_timestamp, and now() functions are tied ‘to the start of the current transaction […] to allow a single transaction to have a consistent notion of the “current” time […]’.

The way this bit me, was I was seeing queries that were running for -7 days, because I was doing math on the timestamp against the current_timestamp.

someuser => select current_timestamp;
           ?column?           
-------------------------------
 2014-07-11 10:45:17.135025-05
(1 row)

someuser=> select now();
              now             
-------------------------------
 2014-07-11 10:45:17.135025-05
(1 row)

What I needed to be using was statement_timestamp(), which ‘returns the start time of the current statement’

someuser=> select statement_timestamp();
              now             
-------------------------------
 2014-07-18 14:23:36.492831-05
(1 row)

And that is how you get a more up to date timestamp in your Vertica queries if you are in a long running database session.

–Proctor

Since I have to help fill the role of a database administrator as part of being on the data team at work, and want quick access to be able to run some queries when things like the above happen, instead of hoping I can establish a connection. ↩

Granting Users Select Permission in Vertica

At work we use Vertica for one of our databases. As part of the data team, it is our job to help monitor and maintain the data in the database, as well as the health of the database as well. One of the things I have been doing is to create database users for the different services that are using the database.

This helps us when we have long running queries, high load on the database, issues with the number of connections, and it would also allow us to add constraints around all of those in the future on a per service basis based off of the different user accounts, not to mention restrict the permissions to tables that are needed and allow some of the experimental tables to exist without causing confusion by having people see those tables before they have been stabilized.

First we create a user for each service with a username and a password; the quotes around the username signify to Vertica that the username is case sensitive.

CREATE USER "UserName"  IDENTIFIED BY 'password';

We then grant SELECT only permissions to the tables that are specific to that user. We do SELECT only permissions as this is a data warehouse style database, and only a few select services have permissions to update the database, while everyone else should be free to read from the tables they need, or are stable.

grant select on <tablename> to "UserName";

We also have some helper functions that we give all of the users, and as I still usually can’t remember the functions and their argument types, I have to do a lookup of the different functions each time, because when granting permissions on Vertica functions, you need to specify the function name along with the arguments.

This query finds me all of the user defined functions in the database.

select * from user_functions;

Once I have the functions and arguments, I can then grant EXECUTE permissions on the functions, by providing the function name and the argument types.

grant execute on function <function name with argument types> to "UserName";

So to grant execute permissions to the function foo that has a number of overloads, it might look like.

grant execute on function foo(Float, Float) to "UserName";
grant execute on function foo(Float, Integer) to "UserName";
grant execute on function foo(Float, Numeric) to "UserName";
grant execute on function foo(Integer, Integer) to "UserName";
grant execute on function foo(Integer, Numeric) to "UserName";
grant execute on function foo(Numeric, Integer) to "UserName";
grant execute on function foo(Numeric, Numeric) to "UserName";

I hope this helps someone the next time they find themselves having to help play db admin/owner on a Vertica database.

–Proctor

Insert Into… Select … Where …

About a month ago one of my teammates asked to confirm how to do a Insert Statement in SQL. He wanted a quick check about using an Insert-Values style statement, but instead of Insert-Values, I recommended doing a Insert-Select-Where. One of my other teammates asked why I would do it that way and I explained my reasoning to him, and decided it would make a good post.

Let us start with a schema for some_table. (NOTE: I will be using PostgreSQL for these examples, but I originally started off doing this with SQLServer, so the concepts translate across databases although syntax might change.)

proctor=# d some_table
               Table "public.some_table"
 Column  |            Type             |   Modifiers   
---------+-----------------------------+---------------
 name    | character varying(40)       | 
 id      | integer                     | 
 size    | integer                     | 
 modtime | timestamp without time zone | default now()

A number of people would be happy doing an Insert-Values statement to get the data in the table as shown below.

proctor=# INSERT INTO some_table(id, name, size) VALUES(1, 'a', 14);
INSERT 0 1
proctor=# select * from some_table;
 name | id | size |          modtime           
------+----+------+----------------------------
 a    |  1 |   14 | 2013-12-10 22:07:58.706064
(1 row)

That works and I have nothing against that, but I have found myself more fond of doing a Insert-Select-Where style statement.

This stems from doing database migrations on SQLServer for .NET development before there were any real tools to do it. We would have to do the migrations ourselves, later with a homegrown tool to help handle it, but even then our goal was to make our scripts “SQL Safe.” SQL Safe was our term for making any SQL that we ran idempotent, so if you ran a script, it would determine if it should run again. Schema modifications checked the schema to see if the needed modification already existed. Insert statements would check to see if the values to be inserted was already represented in the system.

It started with doing an IF check against the Insert-Values.

proctor=# DO
proctor-# $BODY$
proctor-# BEGIN
proctor-#   IF NOT EXISTS (SELECT 1 from some_table where name = 'a' and size = 14) THEN
proctor-#     INSERT INTO some_table(id, name, size) VALUES(1, 'a', 14);
proctor-#   END IF;
proctor-# END;
proctor-# $BODY$;
DO
proctor=# select * from some_table;
 name | id | size |          modtime           
------+----+------+----------------------------
 a    |  1 |   14 | 2013-12-10 22:07:58.706064
(1 row)

Which is not bad, but it is not great either. We moved to the Insert-Select-Where style, which looks like the following:

proctor=# INSERT INTO some_table(id, name, size)
proctor-# SELECT 1, 'a', 14
proctor-# WHERE NOT EXISTS (SELECT 1 from some_table where name = 'a' and size = 14);
INSERT 0 0
proctor=# select * from some_table;
 name | id | size |          modtime           
------+----+------+----------------------------
 a    |  1 |   14 | 2013-12-10 22:07:58.706064
(1 row)

What is nice about this is:

it makes the Insert idempotent;
more likely to be cross platform instead of syntax differences in how IF statements are handled, and
it becomes easy to craft the statement I need to create the values and run a test SELECT by just commenting out the INSERT INTO portion.

proctor=# --INSERT INTO some_table(id, name, size)
proctor-# SELECT 1, 'q', 37
proctor-# WHERE NOT EXISTS (SELECT 1 from some_table where name = 'q' and size = 37);
 ?column? | ?column? | ?column? 
----------+----------+----------
        1 | q        |       37
(1 row)

I would love to hear your feedback about this.

–Proctor

Using Entity Framework to run SQL queries

At work we are using Entity Framework, and we have some reports that we were using that were taking a long time to run. The culprit was the Select N+1 monster, which was rearing its ugly head, really 4 or 5 of them, as we were having the Select N+1 problem that many levels deep, practically loading the whole database.

In working to solve this I created some views in the database, with some SQL that was going to need to be run against those views from the codebase. This is because it may be more like a table function, but I still have to get with our DBA to find the gaps in my solution, but in the mean time I had a SQL statement that I need to run from the application, and did not want to get into pure ADO.NET and having to transform data records into the objects I was wanting to return. I found a solution from Craig Stuntz, and applied that to our code base.

I have my code starting from the DbContext, as opposed to just the ObjectContext, and have not done anything to think about SQL Injection attack in this example.

public class ColumnSummary
{
    public string ColumnName { get; set; }
    public bool IsNullable { get; set; }
    public int? MaxLength { get; set; }
}


public IEnumerable<ColumnSummary> GetColumnsForTable(string tableName)
{
    string sql = @"
select Column_Name as ColumnName, Is_Nullable as IsNullable, Character_Maximum_Length as MaxLength
from INFORMATION_SCHEMA.COLUMNS where TABLE_NAME = @tableName";

    var context = ((IObjectContextAdapter)_dbContext).ObjectContext;
    return context.ExecuteStoreQuery<ColumnSummary>(sql, new SqlParameter("@tableName", tableName));
}

This allows me to run SQL, and not have to worry about handling the mapping myself, but just let the Entity Framework data access layers handle that for me, as long as I give it something that matches the results set.

And if you need to do this in multiple places, I would recommend creating some sort of SQL Query object, which would take in the SQL statement you would like to run and the SQLParameters to pass to ExecuteStoreQuery, and let the SQL Query object hide the details of the how the query gets run, and have those details behind a walled garden.

–Proctor