Lumberjack – lumberjack.nginx (version 0.1.0)

As I posted last time, lumberjack is my start of a log line analyzer/visualizer project in Clojure. This write up will cover the version 0.1.0 lumberjack.nginx namespace.

As this is a version 0.1.0, and to get it out, I am parsing Nginx log lines that take the following format, as all the log lines that I have been needing to parse match it.

173.252.110.27 - - [18/Mar/2013:15:20:10 -0500] "PUT /logon" 404 1178 "http://shop.github.com/products/octopint-set-of-2" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"

The function nginx-logs takes a sequence of Nginx log filenames to convert to a hash representing a log line by calling process-logfile on each one.

(defn nginx-logs [filenames]
  (mapcat process-logfile filenames))

The function process-logfile takes a single filename and gets the lines from the file using slurp, and then maps over each of the lines using the function parse-line.

(defn- logfile-lines [filename]
  (string/split-lines (slurp filename)))

(defn process-logfile [filename]
    (map parse-line (logfile-lines filename)))

At this point, this is sufficient for what I am needing, but have created an issue on the Github project to address large log files, and the ability to lazily read in the lines so the whole log file does not have to reside in memory.

The function parse-line, holds a regex, and does a match of each line against the pattern. It takes each part of the match and associates to a hash using the different parts of the log entry as a vector of the keywords that represent each part of the regex. This is done by reducing against an empty hash and taking the index of the part into match, the result of re-find.

(def parts [:original
            :ip
            :timestamp
            :request-method
            :request-uri
            :status-code
            :response-size
            :referrer])

(defn parse-line [line]
  (let [parsed-line {}
        pattern #"(d{1,3}.d{1,3}.d{1,3}.d{1,3})? - - [(.*)] "(w+) ([^"]*)" (d{3}) (d+) "([^"]*)".*"
        match (re-find pattern line)]
    (reduce (fn [memo [idx part]]
                (assoc memo part (nth match idx)))
            parsed-line (map-indexed vector parts))))

Looking at this again a few days later, I went and created and issue to pull out the definition of pattern into a different definition, outside of the let, and even the parse-line function. I also want to go back and clean up the parsed-line from the let statement as it does not need to be declared inside the let, but can just pass the empty hash to the reduce. This was setup there before I refactored to a reduce, and was just associating keys one at a time to the index of matched as I was adding parts of the log entry.

Any comments on this are welcome, and I will be posting details on the other files soon as well.

Thanks,
–Proctor

1 thought on “Lumberjack – lumberjack.nginx (version 0.1.0)

  1. Mayank

    I recommend you either also update in your github repo or link back to these tuts here.
    Else people who land up to your repo will not have detailed explanations.
    Cheers!

Comments are closed.