Currently Executing Queries in Vertica

As mentioned in my last post – Timestamps in Vertica – we had a query which caused our cluster of servers to run at over 90% CPU utilization, when it is normally in the 10-20 percent range. This query was slowing down all of the other queries running at the time, to the point where even the query SELECT 1 FROM dual was taking 1200+ milliseconds.

In this case, the trick to finding the culprit was to find the queries that had been running for a long period of time.

SELECT
    *,
    (statement_timestamp() - start_timestamp) AS current_duration
FROM
    query_requests
WHERE
    is_executing = 't'
ORDER BY
    start_timestamp;

We look at the query_requests table, find those queries that are still executing, and sort them by their start time. I also added a current_duration column so we could easily see how long each query has been running.

With that we found a query that was the culprit, killed that job, and the database cluster recovered quickly.
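If you ever need to kill a runaway query from the database side, Vertica also has functions for that; a rough sketch, assuming you have pulled the session_id and statement_id of the offending query from the sessions table (the identifiers below are placeholders):

-- Interrupt just the offending statement.
SELECT INTERRUPT_STATEMENT('node01-12345:0x1a2b', 1);

-- Or close the whole session if the client is misbehaving.
SELECT CLOSE_SESSION('node01-12345:0x1a2b');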

Hope this can help someone else when they have some runaway queries they need to find.

–Proctor

Timestamps in Vertica

I was recently trying to find long running queries in Vertica, as we had something that was killing the performance of our cluster, and I got bit by this little bit of behavior around getting the “current” time in Vertica.

For some context, I had an open tmux session on a development server with one of the windows setup to be an open connection to our Vertica database [1].

The part about having that database session open in a tmux session is important, because the database session had been open for at least a week at the point in time I was trying to do some checks to find the queries that had been running the longest.

What bit me was that the current_timestamp and now() functions are tied ‘to the start of the current transaction […] to allow a single transaction to have a consistent notion of the “current” time […]’.

The way this bit me was that I was seeing queries that appeared to have been running for -7 days, because I was doing math on their start timestamps against current_timestamp.

someuser=> select current_timestamp;
           ?column?           
-------------------------------
 2014-07-11 10:45:17.135025-05
(1 row)

someuser=> select now();
              now             
-------------------------------
 2014-07-11 10:45:17.135025-05
(1 row)

What I needed to be using was statement_timestamp(), which ‘returns the start time of the current statement’.

someuser=> select statement_timestamp();
      statement_timestamp      
-------------------------------
 2014-07-18 14:23:36.492831-05
(1 row)
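If you want to see the difference for yourself inside a long-open session, you can select all three in a single statement; in a transaction that has been open for a while, the first two columns will lag behind the third:

SELECT
    current_timestamp     AS transaction_time,
    now()                 AS also_transaction_time,
    statement_timestamp() AS statement_time;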

And that is how you get a more up-to-date timestamp in your Vertica queries if you are in a long-running database session.

–Proctor


  1. I have to help fill the role of a database administrator as part of being on the data team at work, and I want quick access to run some queries when things like the above happen, instead of hoping I can establish a connection at that moment. 

Granting Users Select Permission in Vertica

At work we use Vertica for one of our databases. As part of the data team, it is our job to help monitor and maintain the data in the database, as well as the health of the database itself. One of the things I have been doing is creating database users for the different services that use the database.

This helps us when we have long running queries, high load on the database, or issues with the number of connections, and it would also allow us to add constraints around all of those in the future on a per-service basis, based off of the different user accounts. It also lets us restrict permissions to just the tables each service needs, and allows some of the experimental tables to exist without causing confusion by having people see those tables before they have stabilized.

First we create a user for each service with a username and a password; the quotes around the username signify to Vertica that the username is case sensitive.

CREATE USER "UserName"  IDENTIFIED BY 'password';

We then grant SELECT-only permissions to the tables that are specific to that user. We grant SELECT only as this is a data warehouse style database, and only a few select services have permission to update the database, while everyone else should be free to read from the tables they need, or at least from the ones that have stabilized.

grant select on <tablename> to "UserName";
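Depending on how your schemas are laid out, the user may also need USAGE on the schema containing those tables before the SELECT grant is of any use; a sketch, using a hypothetical reporting schema:

grant usage on schema reporting to "UserName";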

We also have some helper functions that we grant to all of the users. Since granting permissions on Vertica functions requires specifying the function name along with its argument types, and I still usually can’t remember those, I have to look up the different functions each time.

This query finds me all of the user defined functions in the database.

select * from user_functions;

Once I have the functions and arguments, I can then grant EXECUTE permissions on the functions, by providing the function name and the argument types.

grant execute on function <function name with argument types> to "UserName";

So to grant execute permissions on a function foo that has a number of overloads, it might look like this:

grant execute on function foo(Float, Float) to "UserName";
grant execute on function foo(Float, Integer) to "UserName";
grant execute on function foo(Float, Numeric) to "UserName";
grant execute on function foo(Integer, Integer) to "UserName";
grant execute on function foo(Integer, Numeric) to "UserName";
grant execute on function foo(Numeric, Integer) to "UserName";
grant execute on function foo(Numeric, Numeric) to "UserName";
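Since typing out every overload by hand gets tedious, you can also have the database generate the GRANT statements for you; a sketch, assuming the user_functions table exposes function_name and function_argument_type columns:

select 'grant execute on function ' || function_name || '(' || function_argument_type || ') to "UserName";'
from user_functions;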

I hope this helps someone the next time they find themselves having to help play db admin/owner on a Vertica database.

–Proctor

Stop Writing Code

Code: n. A system of symbols, letters, or words given certain arbitrary meanings, used for transmitting messages requiring secrecy or brevity.
American Heritage® Dictionary of the English Language

This is not what we should be writing.

I started thinking that we as an industry need a new term. The reason I say this is related to the Sapir-Whorf Hypothesis, which says that the language we use influences the way we experience the world around us. If we keep referring to what we write as code, we are going to keep writing software that ‘requires secrecy or brevity’, instead of writing software intended for other people to read. Abelson and Sussman nailed it back in 1985, in the preface to the first edition of Structure and Interpretation of Computer Programs, and I am sure they weren’t the first either.

First, we want to establish the idea that a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.
Structure and Interpretation of Computer Programs

Let’s take a moment and take a look at another term we use commonly.

program: Late Latin programma, public notice, from Greek programma, programmat-, from prographein, to write publicly : pro-, forth; see PRO-2 + graphein, to write; see gerbh- in Indo-European roots.
American Heritage® Dictionary of the English Language

That is not bad, as it refers to writing publicly, so maybe, if we start thinking in that sense of the word, we would be okay; but odds are, it has deviated too far from that original meaning in common thought to try to rescue it and bring it back. If we are going to come up with a better term, maybe we need to pick one that expresses that we are telling the story of the system; XP talks about the Metaphor of the system; Eric Evans in Domain Driven Design talks about the Ubiquitous Language; others talk about modeling the system; but the commonality is that we want what we write to be expressive and suggestive. We have a word for the type of writing they describe.

poem: n. A verbal composition designed to convey experiences, ideas, or emotions in a vivid and imaginative way, characterized by the use of language chosen for its sound and suggestive power and by the use of literary techniques such as meter, metaphor, and rhyme.
[French poème, from Old French, from Latin poēma, from Greek poiēma, from poiein, to create; see kwei-2 in Indo-European roots.]
American Heritage® Dictionary of the English Language

I am not saying that we should all start writing software poetry, but I offer it as an example of a word in our lexicon that may be a better fit. I suggest we think about what words would better describe the kind of software we wish we were reading, and then see how the words we choose change the way we write software. What if we tried writing software poetry, software novellas, or software as the literature of the system, and saw how that affects the way we write our software? Do we gain expressivity? Do we encourage others to read what we have written and get editorial reviews? Do we make sure to go back and read aloud what has been written to see if we stumble upon awkward phrasing? Do we take time to go back and try to rewrite what we have just written to make sure it is written well?

I have recently started reading On Writing Well by William Zinsser, and while he is talking about the word processor, I think it relates to writing software as well, since all too often we just move on and submit our writing as soon as we think it is working.

On one level the new torrent is good news. Any invention that reduces the fear of writing is up there with air-conditioning and the lightbulb. But, as always, there’s a catch. Nobody told all the new computer writers that the essence of writing is rewriting. Just because they’re writing fluently doesn’t mean they’re writing well.
On Writing Well

–Proctor

Migrating to a Git Deployment User

At work we have a couple of different user accounts that a number of older applications are deployed under. These different users each have full rights to the various repositories that run under that user account. For the sake of example, we will say the user’s account is alice.

As anybody who is familiar with making changes in a work environment knows, you can’t always just stop the world to make updates; you have to take baby steps to get to your end goal. Ideally we would like the app to be deployed under dan, a deployment user with “read-only” permissions to the git repository. (I say read-only, even though there is no such thing in git, but the point is we don’t want the dan user to be able to push back to the source-of-truth repository, and it shouldn’t really be making changes on the box to begin with.)

There are a couple of different git repositories that run under alice, but for this example we are going to be working to migrate the repository etudes_for_erlang [1] to be fetched as the deploy user dan.

I am assuming you have already done the work of setting up an access control list policy, either through server-side hooks or GitHub permissions, and we will be focusing on changing the way we pull the repositories down under alice so that she looks like dan.

The first step will be to create a new ssh key for “dan”, and get the new public key added as an authorized key for dan on the git server. We will refer to this key as dan_rsa.

As alice, edit her ssh config file found at ~alice/.ssh/config. We will be adding two new entries to that config. The first entry is to allow alice to connect to the remote server, in this case, github.com, as herself.

Host github.com
HostName github.com
IdentityFile ~/.ssh/id_rsa

This entry says that when we refer to github.com by itself, we connect to the actual host github.com using alice’s normal ssh key, id_rsa. This allows alice to fetch, push, and do all the other good stuff for those repositories under her account that we have not yet decided to convert over to the deployment dan account.

We also add another Host entry for working as dan.

Host deploy.github.com
HostName github.com
IdentityFile ~/.ssh/dan_rsa

In this case, we specify the host alias deploy.github.com. When we refer to deploy.github.com, we still connect to the host github.com, but we use the ssh key for dan, by specifying the identity file as dan_rsa.

At this point you should be able to ssh into both github.com and deploy.github.com successfully, and github should be identifying you as the correct user:

ssh -T git@github.com
# Attempt to SSH in to github
# Hi alice! You've successfully authenticated, but GitHub does not provide
# shell access.
ssh -T git@deploy.github.com
# Attempt to SSH in to github
# Hi dan! You've successfully authenticated, but GitHub does not provide
# shell access.

We then go into the etudes_for_erlang repository and issue the following commands:

git remote set-url origin git@deploy.github.com:dfw-erlang/etudes_for_erlang.git
git remote set-url --push origin "DO NOT PUSH!!!"

We set the origin url to connect to deploy.github.com instead of github.com, so that we will be connecting as dan.

The second git remote set-url command is a trick to set the push url to something invalid, in this case the string DO NOT PUSH!!!, so that when we try to push we get an error saying we could not connect to the remote repository “DO NOT PUSH!!!”. That error reminds us that we should not be pushing back to the source-of-truth repository.
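You can verify that both URLs took effect with git remote -v, which should show something like the following:

git remote -v
# origin  git@deploy.github.com:dfw-erlang/etudes_for_erlang.git (fetch)
# origin  DO NOT PUSH!!! (push)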

There you have it, the first steps towards migrating git repositories to be accessed as a deployment user.

If you would like to find out more about the tricks used in the ssh config file, make sure to check out the man page for ssh_config.

Hope you find this useful,
–Proctor


Footnotes:
[1] This was the resulting code of going through the book Études for Erlang as part of the DFW Erlang User Group.

Ten Episodes of Functional Geekery Live!

I just released the tenth episode of my podcast Functional Geekery.

I had been thinking about doing a podcast for a while, in return for all the information I get out of listening to other podcasts as part of my “Automobile University”, but could never come up with the niche. It occurred to me about 3am one morning when I was taking a shift to get our little one, Ada (yes, named after that Ada), who was about 5 months old at the time, back to sleep; the best ideas seem to come when you are least prepared to think about them.

I told my wife about my “crazy idea”, explained what my goal was, and got her support for this experiment, telling her I could do it in a very lean manner. I already had a headset with a microphone, so it would just be a domain, hosting, and a recording setup. My goal was to see if I could start a podcast for only about a $100 investment, all things totaled. I would shoot for at least 10 episodes, to amortize the cost to about $10 an episode.

I figured if nobody listened, but I could have 10 interesting conversations, the learning and exposure to ideas from those conversations would easily outweigh that initial investment, and the podcast would give me a good way to reach out to people I would love to talk to but probably wouldn’t have had the opportunity to talk to anytime soon.

I want to give a sincere heartfelt Thank You to all my guests so far, and everybody else I have reached out to so far to get initial conversations going about being a guest. Everyone has been much more receptive and open than I could have ever imagined. Everybody has been kind, and the worst I have gotten was some deferrals due to busy schedules, which I can appreciate. This has been even more of an honor, as most of the people I have reached out to had likely never heard of me when I sent my emails asking if they would do me the honor of being a guest on the podcast I was starting. Thank you all for your support and kindness, and if you ever have more things you would like to talk about, you always have an open invitation back.

I also want to thank everybody who has listened, and shared the podcast with others. I have gotten a much better reception and response than I realistically imagined. Thank you for your shares, (re)tweets, comments, and suggestions. If you have anything else you want to share I would love to hear from you. If you need to find the best way to contact me, just head over to Functional Geekery’s About page.

Don’t worry, I am not planning on going anywhere at this point. I have another recording “in the can”, and am working to line up some more great guests. I also have a large list of people I would love to talk to at some point, and would hate to end before I got to use the podcast as a reason to have an interesting conversation with them as well. 😉

As always, a giant Thank You goes out to David Belcher for the logo, who took my rough idea of a logo, and transformed it into something brilliant.

And an even bigger THANK YOU goes out to my wife, who has let me pursue this “crazy idea”.

Your host,
–Proctor

Creating a Ruby Rack Middleware for Application Version Id

If you are developing an application that uses the Ruby rack framework (and most Ruby web application frameworks and servers do), it is pretty simple to create your own rack middleware.

In this post I outline how simple it is to create a new middleware for your rack application. In fact, I did something very similar in one of the web services I worked on so that we could get the exact version of the code that our app was running for troubleshooting purposes. I suggested we add a version identifier to all responses so we could log the requests, the responses and their response codes, and have the version of the running code in every response as well. This would help us troubleshoot failures that were no longer failing, or that failed “intermittently” because the requests might be hitting one of two load balanced servers when someone had updated one server but not the other, such as during A-B testing.

This was actually a very easy thing to do by taking advantage of rack’s middleware capability. The code below was all that it took to create a new piece of middleware to give us version identifiers on each response.

module Middleware
  class VersionIdentifier
    def initialize(app, version_id)
      @app = app               # the next rack component in the chain
      @version_id = version_id # the identifier stamped on every response
    end

    def call(env)
      # Call through to the wrapped app, then add the version header to
      # whatever response it returns.
      @app.call(env).tap do |_status, headers, _body|
        headers["version-id"] = @version_id
      end
    end

    private

    attr_reader :version_id
  end
end

In the above source, we create an initialize method which takes app, the rack application (or next middleware) that this component is wrapping, and version_id, which is how we will identify the version of the application that is running.

The next step is to define the standard rack protocol method call(env). Since we are only concerned with the response, we call the next rack component by invoking call on the @app we got when the middleware was constructed. This could be another piece of middleware, or it could be the actual application, but we don’t care about the specifics, which is part of the beauty of the way rack middleware works. What we do care about is the return value of @app.call(env), since we are interested in modifying the response headers. We do this using the tap method available on all Ruby objects, adding a new entry to the headers hash with the key version-id and the @version_id value we got on creation.

That is all there is to creating a rack middleware component to add a version identifier to the headers for every response that is served by the rack web server.

To use this middleware, we first need to determine what will represent the version of our app. For this example, we will use the git SHA that represents the code we are running, since we are assuming we are behaving and there is no code running in production that is not checked into source control. We add this to our startup script/configuration file, e.g. config.ru.

git_sha = `git rev-parse HEAD`.chomp # chomp strips the trailing newline so it does not end up in the header

Once we have that settled, we just declare we are going to use our middleware, and we pass in the git_sha that we found above.

use Middleware::VersionIdentifier, git_sha
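Putting the pieces together, a minimal config.ru might look something like the following; the require_relative path and the tiny run app at the bottom are hypothetical stand-ins for your real application:

# config.ru -- a minimal sketch of wiring up the middleware
require_relative "middleware/version_identifier"

# Use the current git SHA as the version identifier; chomp strips the
# trailing newline from the shell output.
git_sha = `git rev-parse HEAD`.chomp

use Middleware::VersionIdentifier, git_sha

# Hypothetical placeholder app; your real application goes here.
run ->(env) { [200, { "Content-Length" => "7" }, ["Howdy!\n"]] }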

We start up our application and voilà, we have a version identifier getting returned on every response with the header of version-id.

$ curl -i "localhost:9292"
HTTP/1.1 200 OK
version-id: 7df6e26a3dd1d3d05130e054fbcb1b878d965767
Content-Length: 7

Howdy!
$

The full source code can be found at https://github.com/stevenproctor/versionid-middleware-example.

Letter to the FCC

Dear FCC Board (http://www.fcc.gov/leadership),

It is highly disconcerting to me that the FCC would think about reversing its stance on Net Neutrality.

As consumers we already pay outrageous bills for flaky service, and in most areas we usually have only one available “service” provider to choose from, creating a local monopoly. To allow them to add insult to injury by determining which traffic they wish to provide us, at whatever speed they wish, is outrageous.

I already pay for them to deliver my content at a given bandwidth, as do the companies on the other end of the line, such as the hosting provider which hosts my blog and extra storage. Allowing them to charge yet again for priority delivery for certain companies is wrong.

If this weren’t about cable companies and ISPs, this would easily be seen as a racket: you are renting a building from me for your business, but, by the way, I also own the streets that come in, so you have to pay me to keep the streets open so people can come to your business.

I quote this from the DOJ’s website (http://www.justice.gov/usao/eousa/foia_reading_room/usam/title9/110mcrm.htm#9-110.010): ‘The purpose of the RICO statute is “the elimination of the infiltration of organized crime and racketeering into legitimate organizations operating in interstate commerce.”’ If this isn’t an example of racketeering and interstate commerce, I don’t know what is.

While I understand that a number of the board were previously affiliated with the large cable companies and ISPs, you are now in public service, and should be watching out for the good of the citizens of this country.

Hoping that you do the right thing,
Steven Proctor

Downloading a File Using cURL

A coworker was trying to use cURL to get a file and was getting tired of having to redirect the output and save it to the same name as the remote file:

curl https://octodex.github.com/images/okal-eltocat.jpg > okal-eltocat.jpg

After searching through the man page for cURL I found that you can add the --remote-name flag, allowing you to have the command be just:

curl --remote-name https://octodex.github.com/images/okal-eltocat.jpg

Or, if you want to use the short version when manually typing it at the command line, outside of a script:

curl -O https://octodex.github.com/images/okal-eltocat.jpg
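One related flag worth knowing about: if the URL redirects, as download links often do, curl will save the redirect response instead of the actual file unless you also pass -L to follow redirects:

curl -LO https://octodex.github.com/images/okal-eltocat.jpg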

I figured I would write this up here for my future reference, and anybody else who might find this useful.

–Proctor