One of the things that is nice about Ruby is the ability to use it for scripting. Ruby makes it nice and easy to shell out to run standard *nix commands.
The problem is, the nice simple way that seems to be the most common, is also the least safe. I’m looking at you backticks.
#!/usr/env ruby result = `gerp -e 'some regex' foo` puts "<time to process result>" puts "All Good"
And when we run the above, we see that our script completed, even though we really bombed out trying to run
gerp instead of
sh: gerp: command not found <time to process result> All Good
Sure we get a warning
gerp: command not found, but the script still proceeds to plow ahead and do any other side effects that it is setup to do, although something went wrong earlier.
This has become one of the biggest thorns in my side at work.
Lucky the technical solution to this is straight forward, so I want to share it with readers of this blog so you can stop making the mistake of using backticks to shell out commands, and use a solution that does not cover up issues in production.
#!/usr/env ruby require 'open3' def execute_syscall(cmd) results, error, status = Open3.capture3(cmd) raise error unless status.success? results end result = execute_syscall "gerp -e 'some regex' foo" puts "<time to process result>" puts "All Good"
First we require
open3. Open3 allows us to be able to capture the results, the error stream, and the status. This way, we can check if the status of the command was anything other than success, and if not, we raise the error we get from STDERR.
Now when we run it, the script stops in its tracks. Not only that, but our script returns a failure error code as well.
> ruby script_demo.rb script_demo.rb:7:in `execute_syscall': sh: gerp: command not found (RuntimeError) from script_demo.rb:12:in `<main>' > echo $? 1
That way we can know that something went wrong in our program, especially when it is setup as a cron job or as some other unsupervised task.
Hope this can save you some headaches and frustration on your end as well.
It is that time of year again. The time when the Global Day of Coderetreat date has been announced.
This year, the Global Day of Coderetreat is November 15th of 2014.
And with that, I am pleased to announce the Dallas/Fort Worth area’s participation in Global Day of Coderetreat.
For those who are unfamiliar with Coderetreat, it is a day of practice where developers of all levels and languages get together to engage in deliberate practice. The day is spent in multiple 45 minute sessions, working on the problem of Conway’s Game Of Life. During the sessions the goal is to pair program and use TDD to drive to a solution, while focusing on The Four Rules of Simple Design. At the end of each session, everybody deletes their work, and then has the opportunity to share an insight that they had during the previous session. As the day goes on, constraints get added to the sessions, with the goal to make you think about solving the problem in a different way.
I encourage anybody in the Dallas/Fort Worth area to sign up while space is available, and if you are located somewhere else, check the events listing on the Coderetreat site, and see if there is an event in your area. If not, I highly encourage you to get one setup for your area.
As mentioned in my last post – Timestamps in Vertica – we had a query which caused our cluster of servers to run at over 90% CPU utilization, when the normally in 10-20 percent range. This query was slowing down all of the other queries that were running at the time, even to the point where the query
SELECT 1 FROM dual was taking 1200+ milliseconds.
In this case, the trick to finding the culprit was to find the queries that had been running for a long period of time.
SELECT *, (start_timestamp - statement_timestamp()) AS current_duration FROM query_requests WHERE is_executing = 't' ORDER BY start_timestamp;
We look at the
query_requests table, and find those queries that are still executing, and sort them by their start time. I also added in a column
current_duration so we could easily determine how long the queries have been running.
With that we found a query that was the culprit, killed that job, and the database cluster recovered quickly.
Hope this can help someone else when they have some runaway queries they need to find.
I was recently trying to find long running queries in Vertica, as we had something that was killing our performance of our cluster, and I got bit by this little bit of behavior around getting the “current” time in Vertica.
The part about having that database session open in a tmux session is important, because the database session had been open for at least a week at the point in time I was trying to do some checks to find the queries that had been running the longest.
What bit me was that current_timestamp, and now() functions are tied ‘to the start of the current transaction [...] to allow a single transaction to have a consistent notion of the “current” time [...]‘.
The way this bit me, was I was seeing queries that were running for -7 days, because I was doing math on the timestamp against the current_timestamp.
someuser => select current_timestamp; ?column? ------------------------------- 2014-07-11 10:45:17.135025-05 (1 row) someuser=> select now(); now ------------------------------- 2014-07-11 10:45:17.135025-05 (1 row)
What I needed to be using was statement_timestamp(), which ‘returns the start time of the current statement’
someuser=> select statement_timestamp(); now ------------------------------- 2014-07-18 14:23:36.492831-05 (1 row)
And that is how you get a more up to date timestamp in your Vertica queries if you are in a long running database session.
- Since I have to help fill the role of a database administrator as part of being on the data team at work, and want quick access to be able to run some queries when things like the above happen, instead of hoping I can establish a connection. ↩
At work we use Vertica for one of our databases. As part of the data team, it is our job to help monitor and maintain the data in the database, as well as the health of the database as well. One of the things I have been doing is to create database users for the different services that are using the database.
This helps us when we have long running queries, high load on the database, issues with the number of connections, and it would also allow us to add constraints around all of those in the future on a per service basis based off of the different user accounts, not to mention restrict the permissions to tables that are needed and allow some of the experimental tables to exist without causing confusion by having people see those tables before they have been stabilized.
First we create a user for each service with a username and a password; the quotes around the username signify to Vertica that the username is case sensitive.
CREATE USER "UserName" IDENTIFIED BY 'password';
We then grant
SELECT only permissions to the tables that are specific to that user. We do
SELECT only permissions as this is a data warehouse style database, and only a few select services have permissions to update the database, while everyone else should be free to read from the tables they need, or are stable.
grant select on <tablename> to "UserName";
We also have some helper functions that we give all of the users, and as I still usually can’t remember the functions and their argument types, I have to do a lookup of the different functions each time, because when granting permissions on Vertica functions, you need to specify the function name along with the arguments.
This query finds me all of the user defined functions in the database.
select * from user_functions;
Once I have the functions and arguments, I can then grant
EXECUTE permissions on the functions, by providing the function name and the argument types.
grant execute on function <function name with argument types> to "UserName";
So to grant execute permissions to the function
foo that has a number of overloads, it might look like.
grant execute on function foo(Float, Float) to "UserName"; grant execute on function foo(Float, Integer) to "UserName"; grant execute on function foo(Float, Numeric) to "UserName"; grant execute on function foo(Integer, Integer) to "UserName"; grant execute on function foo(Integer, Numeric) to "UserName"; grant execute on function foo(Numeric, Integer) to "UserName"; grant execute on function foo(Numeric, Numeric) to "UserName";
I hope this helps someone the next time they find themselves having to help play db admin/owner on a Vertica database.
Code: n. A system of symbols, letters, or words given certain arbitrary meanings, used for transmitting messages requiring secrecy or brevity.
-American Heritage® Dictionary of the English Language
This is not what we should be writing.
I started thinking that we as an industry need a new term. The reason I say this is related to the Sapir-Whorf Hypothesis, which says that the language we use influences the way we experience the world around us. If we keep referring to what we write as code, we are going to keep writing software that ‘requires secrecy or brevity’, instead of writing software intended for other people to read. Ableson and Sussman nailed it back in 1985, in the preface to the first edition of Structure and Interpretation of Computer Programs, and I am sure they weren’t the first either.
First, we want to establish the idea that a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.
-Structure and Interpretation of Computer Programs
Let’s take a moment and take a look at another term we use commonly.
program: Late Latin programma, public notice, from Greek programma, programmat-, from prographein, to write publicly : pro-, forth; see PRO-2 + graphein, to write; see gerbh- in Indo-European roots.
-American Heritage® Dictionary of the English Language
That is not bad, as it refers to writing publicly, so maybe, if we start thinking in that sense of the word, we would be okay; but odds are, it has deviated too far from that original meaning in common thought to try to rescue it and bring it back towards that original meaning. If we are going to come up with a better term, maybe we need to pick a term that expresses that we are telling the story of the system; XP talks about Metaphor of the system; and Eric Evans in Domain Driven Design talks about Ubiquitous Language; others talk about modeling the system; but the commonality is that we want what we write to expressive and suggestive. We have a word for the type of writing they describe.
poem: n. A verbal composition designed to convey experiences, ideas, or emotions in a vivid and imaginative way, characterized by the use of language chosen for its sound and suggestive power and by the use of literary techniques such as meter, metaphor, and rhyme.
[French poème, from Old French, from Latin poēma, from Greek poiēma, from poiein, to create; see kwei-2 in Indo-European roots.]
-American Heritage® Dictionary of the English Language
I am not saying that we should all start writing software poetry, but as an example of a word in our lexicon that may have a better fit. I suggest we think about what words would better describe the kind of software we wish we were reading, and then, see how the words we choose change the way we write software. What if we tried writing software poetry, software novellas, or software as the literature of the system, and see how that affects the way we write our software. Do we gain expressivity? Do we encourage others to read what we have written and get editorial reviews? Do we make sure to go back and read aloud what has been written to see if we stumble upon awkward phrasing? Do we take time to go back and try to rewrite what we have just written to make sure it is written well?
I have recently started reading On Writing Well by William Zinsser, and while he is talking about the word processor, I think it relates to writing software as well, since all too often we just move on and submit our writing as soon as we think it is working.
On one level the new torrent is good news. Any invention that reduces the fear of writing is up there with air-conditioning and the lightbulb. But, as always, there’s a catch. Nobody told all the new computer writers that the essence of writing is rewriting. Just because they’re writing fluently doesn’t mean they’re writing well.
-On Writing Well
- If it doesn’t look like a duck, or doesn’t quack like a duck, doing the equivalent of adding the necessary methods on the top-most Object class so everything will behave like a duck.
At work we have a couple of different user account that a number of older applications are deployed under. These different users each have full rights to the various repositories that run under that user account. For the sake of example, we will say the user’s account is alice.
As anybody who is familiar with making changes in a work environment, you can’t always just stop the world to make updates, but have to take baby steps, to get to your end goal. Ideally we would like the app to be deployed under dan, a deployment user with “read-only” permissions to the git repository. (I say read-only, even though there is no such thing in git, but the point is we don’t want the dan user to be able to push back to the source-of-truth repository, and shouldn’t really be making changes on the box to begin with.)
There are a couple of different git repositories that run under alice, but for this example we are going to be working to migrate the repository etudes_for_erlang  to be fetched as the deploy user dan.
I am assuming you have already done the work of setting up access control list policy, either through server-side hooks, or github permissions, and we will be focusing on changing the way we pull the repositories down under alice, to look like she is dan.
First step will be to create a new ssh key for “dan”, and get the new public ssh key added as an authorized key for dan on the git server. We will refer to the public key as
As alice, edit her ssh config file found at
~alice/.ssh/config. We will be adding two new entries to that config. The first entry is to allow alice to connect to the remote server, in this case, github.com, as herself.
Host github.com HostName github.com IdentityFile ~/.ssh/id_rsa
We use alice’s normal ssh key,
id_rsa, and just connect to github.com as normal. We specify that when we refer to
github.com by itself, we are going to use alice’s id_rsa key to connect to the actual host
github.com. This allows alice to fetch, push, and all the other good stuff for those repositories under her that we are not yet deciding to convert over to the deployment dan account.
We also add another Host entry for working as dan.
Host deploy.github.com HostName github.com IdentityFile ~/.ssh/dan_rsa
In this case, we specify that the host is
deploy.github.com. What this does is that when we refer to that
deploy.github.com we want to connect to
github.com but use the ssh key for dan, by specifying the identity file as
At this point you should be able to ssh into both github.com and deploy.github.com successfully, and github should be identifying you as the correct user:
ssh -T firstname.lastname@example.org # Attempt to SSH in to github # Hi alice! You've successfully authenticated, but GitHub does not provide # shell access. ssh -T email@example.com # Attempt to SSH in to github # Hi dan! You've successfully authenticated, but GitHub does not provide # shell access.
We then go into the etudes_for_erlang repository and issue the following commands:
git remote set-url origin firstname.lastname@example.org:dfw-erlang/etudes_for_erlang.git git remote set-url --push origin "DO NOT PUSH!!!"
We set the origin url to connect to
deploy.github.com instead of
github.com, so that we will be connecting as dan.
git remote set-url command is a trick to set the push url to something invalid, in this case the string
DO NOT PUSH!!!, so that when we try to push we get an error saying we could not connect to the remote repository “DO NOT PUSH!!!”, and that helps to tell us that we should not be pushing back to the source repository.
There you have it, the first steps towards migrating git repositories to be accessed as a deployment user.
If you would like to find out more about the tricks used in the ssh config file, make sure to check out the man page for ssh_config.
Hope you find this useful,
I just released the tenth episode of my podcast Functional Geekery.
I had been thinking about doing a podcast for a while, in return of all the information I get out of listening to other podcasts as part of my “Automobile University”, but could never come up with the niche. It occurred to me about 3am in the morning when I was taking a shift to get our little one, Ada (yes, after that Ada) who was about 5 months old at the time, back to sleep; the best ideas to come when you are least prepared to think about them.
I told my wife about my “crazy idea”, and explained to her what my goal was, and got her support for this experiment I was wanting to do, and told her I could do this in a very lean manner. I had a headset with microphone already, and told her it would just be a domain, hosting, and recording setup. My goal was to see if I couldn’t start a podcast for only about $100 investment with all things totaled. I would shoot to see if I could get at least 10 episodes done, to amortize the cost to be about $10 an episode.
I figured if nobody listened, but I could have 10 interesting conversations, the learning and exposure to ideas from those conversations would easily outweigh that initial investment, and the podcast would give me a good way to reach out to people I would love to talk to but probably wouldn’t have had the opportunity to talk to anytime soon.
I want to give a sincere heartfelt Thank You to all my guests so far, and everybody else I have reached out to so far to get initial conversations going about being a guest. Everyone has been much more receptive and open than I could have ever imagined. Everybody has been kind, and the worst I have gotten was some deferrals due to busy schedules, which I can appreciate. This is been even more honoring, as most of the people I have reached out to had likely never heard of me when I sent my emails to them asking if they would do me the honor of being a guest on the podcast I was starting. Thank you all for your support, and kindness, and if you ever have more things you would like to talk about, all of you always have an open invitation back.
I also want to thank everybody who has listened, and shared the podcast with others. I have gotten much better reception and response that I realistically imagined. Thank you for your shares, (re)tweets, comments, and suggestions. If you have anything else you want to share I would love to hear from you. If you need to find the best way to contact me, just head over to Functional Geekery’s About page.
Don’t worry, I am not planning on going anywhere at this point. I have another recording “in the can”, and am working to line up some more great guests. I also have a large list of people I would love to talk to at some point, and would hate to end before I got to use the podcast as a reason to be able to have a interesting conversation with them as well.
As always, a giant Thank You goes out to David Belcher for the logo, who took my rough idea of a logo, and transformed it into something brilliant.
And an even bigger THANK YOU goes out to my wife, who has let me pursue this “crazy idea”.
If you are developing an application that uses the Ruby rack framework, and most Ruby application frameworks and servers do, it is pretty simple to create your own rack middleware.
In this post I outline how simple it is go create a new middleware for your rack application. In fact, I did something very similar in one of the web services I worked on so that we could get the exact version of the code that our app was running against for troubleshooting purposes. I suggested we add a version identifier to all responses so we could log the requests, the responses and their response codes, and have the version of the running code in every response as well. This would help us troubleshoot failures that are not failing anymore, or failed “intermittently” because they might be hitting one of two load balanced servers in case someone updated one server, but not the other, such as for A-B testing.
This was actually a very easy thing to do by taking advantage of rack’s middleware capability. The code below was all that it took to create a new piece of middleware to give us version identifiers on each response.
module Middleware class VersionIdentifier def initialize(app, version_id) @app = app @version_id = version_id end def call(env) @app.call(env).tap do |_status, headers, _body| headers["version-id"] = @version_id end end private attr_reader :version_id end end
In the above source, we create a
initialize method which takes
app, which is the rack application that this middleware component is wrapping, and
version_id, which is how we will identify the version of the application is running.
The next step to define the standard rack protocol of
call(env). In this, since we are only concerned about the response, we call the next rack component by invoking
@app which we got when the middleware was constructed. This could be another piece of middleware, or it could be the actual application, but we don’t care about specifics, which is part of the beauty of the way middleware for rack works. What we do care about is the return of the call to
@app.call(env), since we are interested in modifying the response headers. We do this with using the tap method on all Ruby objects, and adding a new entry to the header hash with the key
version-id and the value that we got on creation for
That is all there is to creating a rack middleware component to add a version identifier to the headers for every response that is served by the rack web server.
To use this middleware, we first need to determine what the will represent the version of our app. For this example, we will use the git SHA that represents the code we are running against, since we are assuming we are behaving and there is no code running in production that is not checked into source control. We add this to our startup script/configuration file, e.g.
git_sha = `git rev-parse HEAD`
Once we have that settled, we just declare we are going to
use our middleware, and we pass in the git_sha that we found above.
use Middleware::VersionIdentifier, git_sha
We start up our application and voilà, we have a version identifier getting returned on every response with the header of
$ curl -i "localhost:9292" HTTP/1.1 200 OK version-id: 7df6e26a3dd1d3d05130e054fbcb1b878d965767 Content-Length: 7 Howdy! $
The full source code can be found at https://github.com/stevenproctor/versionid-middleware-example.