Working through the Project Euler problems, and with the comment from Arcane Sentiment about filtering based off of a set of primes, I was trying to program the Sieve of Eratosthenes. I also realized that the Sieve becomes a set based problem, so I went about trying to solve it using sets.
My first approach was simply using the difference function given in clojure.set.
(defn primes-under [n] (loop [sieve (set (cons 2 (range 3 n 2))) f 3] (if (> (square f) n) sieve (recur (difference sieve (range (square f) n f)) (inc f)))))
I start with the set of numbers starting by cons-ing 2 onto the range from 3 to n, skiping every other number, to get the odd numbers up to n.
The new sieve becomes the the difference between the current sieve, and the multiples of the factor f. I use (range (square f) n f) as a subset of the factors of f up to the number n. I start the range at the square of the factor, as any multiples below the square of the factor would already have been sieved out from the factors below it. So for multiples of 5, I can ignore (10, 15, 20) as they are multiples of 2 and 3, the other primes below 5, and as a result I only need to get multiples of 5 starting at 25.
When timing this solution by using:
(time (dorun (primes-under 1000000)))
The timing for the primes under 1,000,000 is around 2.1 seconds, and anywhere from 6 to 10 seconds for the primes under 2,000,000.
I decided to try my hand at solving this using a transient set. After many missteps in using a transient set, due to not having used them before, I came up with the following solution:
(defn primes-under2 [n] (let [sieve (transient (set (cons 2 (range 3 n 2))))] (loop [s sieve f 3] (cond (> (square f) n) (persistent! s) :else (recur (reduce disj! s (range (square f) n f)) (inc f))))))
I define sieve as a transient set, and instead of using difference, I call reduce on the sieve using the function disj! for the ranges of numbers to filter out. I need to use disj! to remove that number from the sieve, as this is now a transient set I am working against.
The timing of this solution is now down to 1.1 seconds for primes under 1,000,000, and primarily in the 3 second range for the primes under 2,000,000.
I am still not quite satisfied with this solution, and would love suggestions as to how to avoid those numbers which have been filtered out of the sieve already. As I just do a (inc f), I would still get the numbers 4, 6, 8, 9, 25, etc, that are multiples of other primes, but I could not figure out how to skip these numbers. I tried the functions contains?, get, superset?, subset?, difference, but these were not giving me the result when dealing with the transient set.
If someone could tell me if I have missed anything I should try for filtering out non-prime numbers as factors, I would love to get your suggestions to try it out, and I will post a follow up with the results.