Category Archives: Unix

Check Your Git Commits for the Year

It’s that time of the year, when you hit the last few weeks of the year and wonder, what did I manage to do this year?

This may be on your own accord of reflecting on the past year, it may be because you have your annual performance review, or maybe even wanting some ammo to use for negotiating a new raise or promotion.

Either way, if you use Git, you can use that journal log of your work to help trigger memories of what you did for the past year.

We will take a look at how to build this up to be generic, so that it can be run any time of any year, across any of your Git branches.

First, we want to find commits that we were the author of. As we can have our author name be different across Git repos we want to look at who we are according to that repo based on our Git config settings.

git config --get user.name
# Proctor

We will put that in a Bash function for nice naming and ease of use later on.

function my_git_user_name() {
  git config --get user.name
}

We also want to know what year it is, so we can look at commits for this year.

We will use the date command to get the current year.

date +'%Y'
# 2015

Again, we will create a Bash function for that as well.

function this_year() {
  date +'%Y'
}

We also want to know what last year was, so we can know the beginning of this year. We bust out bc for this to do some calculation at the command line. We take the current year – 1 and pass that to bc to get last year.

echo "$(this_year)-1" | bc
# 2014

Wrap that in a function.

function last_year() {
  echo "$(this_year)-1" | bc
}

And now we can get the end of last year, being December 31st.

echo "$(last_year)-12-31"
# 2014-12-31

And of course, we put that into another function.

function end_of_last_year() {
  echo "$(last_year)-12-31"
}

And now we can use both end_of_last_year and my_git_user_name to find the Git commits I was the author of since the beginning of the year.

git log --author="$(my_git_user_name)" --after="$(end_of_last_year)" origin/master

Note that this checks against ‘origin/masterso if you call (one of) your canonical remote(s) something other thanorigin` you will need to update this, but this will show all those items that have made it into master that you have worked on.

And because of convenience, we will put this in a function, so we can call in nice and easy.

function my_commits_for_this_past_year()
{
  git log --author="$(my_git_user_name)" --after="$(end_of_last_year)" origin/master
}

And to call it we just need to type “ at the command line.

Having these functions, it also allows us to add it to our Git aliases or .bash_profile so we can have easy access to call it from anywhere.

### What did I do in git this past year?

function this_year() {
  date +'%Y'
}

function last_year() {
  echo "$(this_year)-1" | bc
}

function end_of_last_year() {
  echo "$(last_year)-12-31"
}

function my_git_user_name() {
  git config --get user.name
}

function my_commits_for_this_past_year()
{
  git log --author="$(my_git_user_name)" --after="$(end_of_last_year)" origin/master
}

This makes it nice and easy to filter your Git commits and trace through your history on a project, and refresh your memory of what you have actually touched, instead of misremembering what year it was done, or forgetting about that small little fix that wound up having a big impact.

–Proctor

A List of Questions to Address Before Creating a Microservice

Last Friday, I got a meeting invite at work to discuss creating a new microservice as part of our application.

Whether it is Unix style programs, Domain Driven Design bounded contexts, classes and objects that adhere to the Single Responsibility Principle, I have been a supporter of small focused applications that have a single job to do, and do it well. This is one of the things that has appealed to me about Erlang as I have been digging ever deeper into it.

I will be first to step forward and promote the idea of microservices, but I will also be the first to come across like I don’t support them.

These questions are to make sure that proper thought is given to the implications of creating a microservice architecture, so that we don’t shoot ourselves in the foot and become the case study of why microservices are just a bunch of hot air, instead of being a case study for why and how it can work.

These questions are likely applicable to any new application, and not just microservices, and are inspired by the 8 Fallacies of Distributed Computing, the book Enterprise Integration Patterns, Domain Driven Design, my learning path with Erlang, and too many more to be named.

In no particular order at this point, but the general order at which they came into my head, here are the questions we should ask ourselves to help determine if a microservice is a good idea.

What other information do this service need from other parts of the system? Is this truly a vertical slice of a domain?
What other outside systems do we depend on for this service?
What happens if one of the services dependencies is unavailable?
How do we know if this service is running? Generating errors?
Which parts of the system will be consuming the service?
How to we abort without taking down the consumers of this service?
What does the size of the request look like?
What does the size of the response look like?
What is the latency of this service?
- What is the latency of just returning a 200 OK with a hardcoded return value.
- What is the expected latency of processing a full request?
What is the expected SLA of the service?
- How do we expect to meet that SLA?
- What is the SLA for uptime?
- What is the SLA for response time? Average response time? 95th percentile? 99th percentile? Worst case?
What is our default response to return if we are about to break the SLA?
Are we expecting this service to be exposed to the outside world? Live within an isolated network?
Do we need authentication?
Who would be authorized to consume this service?
How are we expecting to manage access to this service?
Do we need to encrypt the data exchange?
What internal storage/persistance mechanism(s) do we need as part of this service to keep it isolated?
How many Requests per Second are we expecting this service to need to serve?
How do we expect this service to be deployed? What deployment dependencies are we expecting to need?
How frequently do we expect this service to need to be updated after deployment?
How many instances of this service do we think we will need to have running?
How do we coordinate information exchange between multiple instances of the service?
What is the expected time between a change notification and a consistent view of the system?
If any one instance of the service in a cluster fails, do the rest fail?
- How do we keep the other instances from failing?
- How does an instance of the service catch back up to the latest state once it has recovered?
If part of the service cluster fails, can we safely and automatically restart that part of the cluster?
How many failures in a time period do we allow before escalating a larger issue?
- What is that time period?
- How do we escalate issues?
How do we expect these larger issues to be addressed?
What does it take to start the service from an empty slate?
What does it take to stop the service?
Can we have multiple versions of the service deployed and serving requests at the same time?
How do we know what instance of the service served a request?
What is the strategy to resolve the service endpoint from a blank
What is the expected communication medium/protocol/payload we expect to be using to communicate with this service?
- Message bus channel subscriptions? HTTP requests? REST “proper” with Hypermedia? “Dumb” REST? JSON payload? XML payload? Protobuff payloads?
How do we expect load to be distributed between any instances of this service?
When making a request to an outside service, what do we do when awaiting a response?
- Block? Start processing another request? Do something that is not I/O based?
How are we expecting to manage versioning of the APIs that this service is expect to provide?
Does this service need to respond to incoming calls/notifications?
If this service does need to respond, it is expected to be synchronous, “appear” synchronous, or be completely asynchronous style of response?
If asynchronous responses are expected, how does the service get the information it needs to know to where to send the response to?
How do we expect to trace a flow between work and the requests and responses that triggered that work? Is there a way to trace causality?
What is the minimum infrastructure/frameworks that is needed to provide the service?
- Is this a service? Microservice? Additional monolithic application?
What is the problem domain (bounded context) of this service?
- How do we know when we are adding features that should belong in other services?
How many requests are we expecting are needed to complete a business use case?
- Is there any way to shrink that number? Can requests be combined?

This is by no means a complete list of questions we should be asking ourselves, but the start of a conversation to understand the scope of what it takes for a new service to be created and deployed. These are my brain dump of questions that help a team know if they know how to swim, and how deep the water is, before diving head first into the sea of microservices.

Let me know what other questions you think are missing.

–Proctor

Migrating to a Git Deployment User

At work we have a couple of different user account that a number of older applications are deployed under. These different users each have full rights to the various repositories that run under that user account. For the sake of example, we will say the user’s account is alice.

As anybody who is familiar with making changes in a work environment, you can’t always just stop the world to make updates, but have to take baby steps, to get to your end goal. Ideally we would like the app to be deployed under dan, a deployment user with “read-only” permissions to the git repository. (I say read-only, even though there is no such thing in git, but the point is we don’t want the dan user to be able to push back to the source-of-truth repository, and shouldn’t really be making changes on the box to begin with.)

There are a couple of different git repositories that run under alice, but for this example we are going to be working to migrate the repository etudes_for_erlang [1] to be fetched as the deploy user dan.

I am assuming you have already done the work of setting up access control list policy, either through server-side hooks, or github permissions, and we will be focusing on changing the way we pull the repositories down under alice, to look like she is dan.

First step will be to create a new ssh key for “dan”, and get the new public ssh key added as an authorized key for dan on the git server. We will refer to the public key as dan_rsa.

As alice, edit her ssh config file found at ~alice/.ssh/config. We will be adding two new entries to that config. The first entry is to allow alice to connect to the remote server, in this case, github.com, as herself.

Host github.com
HostName github.com
IdentityFile ~/.ssh/id_rsa

We use alice’s normal ssh key, id_rsa, and just connect to github.com as normal. We specify that when we refer to github.com by itself, we are going to use alice’s id_rsa key to connect to the actual host github.com. This allows alice to fetch, push, and all the other good stuff for those repositories under her that we are not yet deciding to convert over to the deployment dan account.

We also add another Host entry for working as dan.

Host deploy.github.com
HostName github.com
IdentityFile ~/.ssh/dan_rsa

In this case, we specify that the host is deploy.github.com. What this does is that when we refer to that deploy.github.com we want to connect to github.com but use the ssh key for dan, by specifying the identity file as dan_rsa.

At this point you should be able to ssh into both github.com and deploy.github.com successfully, and github should be identifying you as the correct user:

ssh -T git@github.com
# Attempt to SSH in to github
# Hi alice! You've successfully authenticated, but GitHub does not provide
# shell access.
ssh -T git@deploy.github.com
# Attempt to SSH in to github
# Hi dan! You've successfully authenticated, but GitHub does not provide
# shell access.

We then go into the etudes_for_erlang repository and issue the following commands:

git remote set-url origin git@deploy.github.com:dfw-erlang/etudes_for_erlang.git
git remote set-url --push origin "DO NOT PUSH!!!"

We set the origin url to connect to deploy.github.com instead of github.com, so that we will be connecting as dan.

The second git remote set-url command is a trick to set the push url to something invalid, in this case the string DO NOT PUSH!!!, so that when we try to push we get an error saying we could not connect to the remote repository “DO NOT PUSH!!!”, and that helps to tell us that we should not be pushing back to the source repository.

There you have it, the first steps towards migrating git repositories to be accessed as a deployment user.

If you would like to find out more about the tricks used in the ssh config file, make sure to check out the man page for ssh_config.

Hope you find this useful,
–Proctor

Footnotes:
[1] This was the resulting code of going through the book Études for Erlang as part of the DFW Erlang User Group. back

cronolog and STDERR

At work we use cronolog for automatic rotation of log files for a number of processes since those processes just write to STDOUT and STDERR instead of using a proper logging library. Unfortunately, that means when running the script/program we have to redirect STDERR to STDOUT, and then pipe the results to cronolog, since cronolog reads from STDIN. The result looks something along the lines of the following:

ruby main.rb &2>1 | cronolog /logs/main.log /logs/main-%Y-%m-%d.log

The problem with this is if that errors are few and far between, as one hopes they should be, then it might be really tricky to find the errors amongst the other logging. Ideally, I thought it would be nice to have STDOUT go to one log file, and STDERR get written to a err file for the process.

After some digging into From Bash to Z Shell I found something about process substitution in the Bash shell. After a little experimentation and tweaking, I came up with the following:

ruby main.rb 
     > >(/usr/sbin/cronolog /logs/main.log /logs/main-%Y-%m-%d.log) 
     2> >(/usr/sbin/cronolog /logs/main.err /logs/main-%Y-%m-%d.err)

This allows me to use cronolog with both the STDOUT and STDERR streams. By using cronolog in the process substitution, it allows the output streams to be treated as input streams to cronolog, where as before I had to combine them into one stream and then pipe the single stream to cronolog as in the first example.

Hope this can help someone else, and save some hours of digging.

–Proctor

Log File parsing with Futures in Clojure

As the follow up to my post Running Clojure shell scripts in *nix enviornments, here is how I implemented an example using futures to parse lines read in from standard in as if the input was piped from a tail and writing out the result of parsing the line to standard out.

First due to wanting to run this a script from the command line I add this a the first line of the script:

 
!/usr/bin/env lein exec

As well, I will also be wanting to use the join function from the clojure.string namespace.

 
(use '[clojure.string :only (join)])

When dealing with futures I knew I would need an agent to adapt standard out.

(def out (agent *out*))

I also wanted to separate each line by a new line so I created a function writeln. The function takes a Java Writer and calls write and flush on each line passed in to the function:

(defn writeln [^java.io.Writer w line]
  (doto w
    (.write (str line "n"))
    .flush))

Next I have my function to analyze the line, as well as sending the result of that function to the agent via the send-off function.

(defn analyze-line [line]
  (str line "   " (join "  " (map #(join ":" %) (sort-by val > (frequencies line))))))

(defn process-line [line]
  (send-off out writeln (analyze-line line)))

The analyze-line function is just some sample code to return a string of the line and the frequencies of each character in the line passed in. The process-line function takes a line and calls send-off to the agent out for the function writeln with the results of calling the function analyze-line.

With all of these functions defined I now need to just loop continuously and process lines that are not empty, and call process-line for each line as a future.

(loop []
  (let [line (read-line)]
    (when line
      (future (process-line line)))
      (recur)))

Running Clojure shell scripts in *nix environments

I was recently trying to create a basic piece of Clojure code to play with “real-time” log file parsing by playing with futures. The longer term goal of the experiment is to be able to tail -f a log file pipe that into my Clojure log parser as input.

As I wasn’t sure exactly what I would need to be doing, I wanted an easy way to run some code quickly without having to rebuild the jars through Leiningen every time I wanted to try something, in a manner similar to the way I am thinking I will be using it if the experiment succeeds.

I created a file test_input with the following lines:

1 hello
2 test
3 abacus
4 qwerty
5 what
6 dvorak

With this in place, my goal was to be able to run something like cat test_file | parser_concept. After a bit of searching I found the lein-exec plugin for Leiningen, and after very minor setup I was able to start iterating with input piped in from elsewhere.

The first step was to open my profiles.clj file in my ~/.lein directory. I made sure lein-exec was specified in my user plugins as so:

{:user {:plugins [[lein-exec "0.2.1"]
                  ;other plugins for lein
                 ]}}

With this in place I just put the following line at the top of my script.clj file:

#!/usr/bin/env lein exec

I then changed the permissions of script.clj file to make it executable, I was able to run the following and have my code run against the input.

cat test_input | ./script.clj

I will be posting a follow up entry outlining my next step of experimenting with “processing” each line read in as a future.