Here is my solution to Problem 14 of Project Euler. As always, my progress you can tracked on GitHub at https://github.com/stevenproctor/project-euler-clojure.

Problem 14 of Project Euler is to find the starting number with the longest Collatz sequence, summarized from the problem page as:

The following iterative sequence is defined for the set of positive integers:
n -> n/2 (n is even)
n -> 3n + 1 (n is odd)
[...]
Which starting number, under one million, produces the longest chain?

The first function I needed was a function to get the next number in the Collatz sequence.

(defn next-collatz [n]
(cond (<= n 1) nil
(even? n) (/ n 2)
(odd? n) (+ (* 3 n) 1)))

This function mimics the above definition of the sequence, with the distinction of the check to see if the number is less than, or equal to, 1. While not strictly necessary, when I was generating numbers from the REPL, I noticed that the sequence never really terminates, but ends in a repeating sequence of `4 -> 2 -> 1 -> 4 -> 2 -> 1 ->...`

. This can be seen by running the following at the REPL.

(take 15 (iterate next-collatz 8))

Wich results in the sequence `(8 4 2 1 4 2 1 4 2 1 4 2 1 4 2)`

. I used the condition `(<= n 1)`

as a guard condition to as a way to be able to terminate the sequence and make it finite. This could probably be removed at this point, as I did not even use the `nil`

return as a check, except that it would serve as a reminder of the function when playing at the REPL.

The next function I defined was a function to get the count of the numbers in the sequence. I defined it this way in the hopes of trying to memoize the values, but I kept getting an OutOfMemoryException when I tried to use it so I took the memoization away.

(defn collatz-count-for [n]
{:pre [(pos? n)]}
(if (= n 1) 1
(inc (collatz-count-for (next-collatz n)))))

In this function, I decided I would add a precondition to the function requiring that the parameter to the function be positive. This way I could avoid having to try and generate a Collatz sequence for negative numbers, as they are outside the scope of this problem. The function first checks if the number to generate the sequence for is 1, and if so, the count of the sequence returns 1. For all other positive numbers, we get the length of the sequence for the next Collatz number in the sequence, and increment that value, and return that as the Collatz count.

As I was writing this up, I decided I would take another stab at this, and did some searching to try and find a way to change the memory allocation for the project in the REPL. I found the following post How to Set JVM Memory for Clojure REPL in Emacs, and added the following to my project.clj file.

:jvm-opts ["-Xmx768M"]

When I ran with the following memoized version of the `collatz-count-for`

, it now runs much quicker than it did before. Previously it was a rough mental counting of about 10-15 seconds, and now the first time is runs was 2-3 seconds, and sub second on runs after that even.

(def collatz-count-for (memoize
(fn [n]
{:pre [(pos? n)]}
(if (= n 1) 1
(inc (collatz-count-for (next-collatz n)))))))

The changes needed to memoize the function `collatz-count-for`

is that the `defn`

is now a `def`

, which assigns the identifier of `collatz-count-for`

to the result of passing a now nameless function, with the previous function body, to the function `memoize`

. This allows the function body to remain the same, and only the declaration of the function has to now change.

With those pieces to the problem in place, we can now find the number that generates the longest Collatz sequence.

(defn problem14
([] (problem14 1000000))
([n] (first (reduce
(fn [memo x]
(if (> (second x) (second memo)) x memo))
(map #(vector % (collatz-count-for %)) (range 1 (inc n)))))))

I started by mapping over the numbers from 1 to 1000000, to create vectors representing the pair of the number and the sequence length for that number. An example of which results in the following sequence (shown to the first 10 numbers):

([1 1] [2 2] [3 8] [4 3] [5 6] [6 9] [7 17] [8 4] [9 20] [10 7] ...)

I then feed that sequence into `reduce`

, with a function to find the pair with the largest sequence length by calling `second`

on the vector, resulting in the pair of number and sequence length with the largest sequence length. And having found that vector, I use first to find what the number with the largest sequence length is.

This seems like it might be generic enough (idiomatic pattern) that I was missing something that was already defined to do the map and reduce on a structure like this, but I could not find anything. If anybody knows of anything I would love to see how that might be made more expressive.

–Proctor