Ruby Tuesday – composing filter

As mentioned in the last Ruby Tuesday on using compose and map, I ended with saying that we would take a look at how we would take advantage of composition for our filter.

As a refresher, here is our Sequence module.

module Sequence

  def self.my_map(f, items)
    do_map = lambda do |accumulator, item|
      accumulator.dup << f.call(item)
    end

    my_reduce([], do_map, items)
  end

  def self.my_filter(predicate, items)
    do_filter = lambda do |accumulator, item|
      if (predicate.call(item))
        accumulator.dup << item
      else
        accumulator
      end
    end

    my_reduce([], do_filter, items)
  end

  def self.my_reduce(initial, operation, items)
    return nil unless items.any?{|item| true}

    accumulator = initial
    for item in items do
      accumulator = operation.call(accumulator, item)
    end
    accumulator
  end

  def self.my_compose(functions, initial)
    apply_fn = ->(accum, fn) { fn.(accum) }
    my_reduce(initial, apply_fn, functions)
  end

  @@map = method(:my_map).curry
  @@filter = method(:my_filter).curry
  @@reduce = method(:my_reduce).curry
  @@compose = method(:my_compose).curry

  def self.map
    @@map
  end

  def self.filter
    @@filter
  end

  def self.reduce
    @@reduce
  end

  def self.compose
    @@compose
  end
end

So lets start taking a look at how we would be able to compose our filter lambdas together by seeing if we can find all numbers that are multiples of three, and then multiples of five.

multiple_of_three = ->(x) { x % 3 == 0}
# => #<Proc:0x007faeb28926b0@(pry):1 (lambda)>
multiple_of_five = ->(x) { x % 5 == 0}
# => #<Proc:0x007faeb3866c58@(pry):2 (lambda)>
Sequence.compose.([Sequence.filter.(multiple_of_three),
                   Sequence.filter.(multiple_of_five)]).((1..100))
=> [15, 30, 45, 60, 75, 90]

The catch again is that we are traversing the first list for all 100 elements to find those items that are multiples of 3, and then filtering the filtered list to find only those that are multiples of 5.

What if we could do this in one pass, and compose the predicates together?

And since we want to test this to make sure we get a good result, we will compose those predicates together and just test our composed function with the number 15.

three_and_five = Sequence.compose.([multiple_of_three, multiple_of_five])
=> #<Proc:0x007faeb4905d20 (lambda)>
three_and_five.(15)
# NoMethodError: undefined method `%' for true:TrueClass
# from (pry):2:in `block in __pry__'

It fails with a NoMethodError telling us that we can’t use the method % on an object of type TrueClass.

Don’t we want to get back a true instead of an error though?

If we rollback the compose into it’s nested function call version, it starts to become clear where the error is coming from.

multiple_of_five.(multiple_of_three.(15))
# NoMethodError: undefined method `%' for true:TrueClass
# from (pry):2:in `block in __pry__'

So what is happening is that we get back true from calling multiple_of_three with 15, and we then pass that true into multiple_of_five, when what we wanted to do was to pass in the 15 and make sure it was a multiple of 5 as well.

What we really want is to evaluate every lambda with the value of 15, and then then see if all of the predicate functions succeed in their checks.

So we will start with a very naive, and “un-curried” version, to prove out our concept.

Our first pass will be to map over each predicate and invoke it with the value we want to test resulting in a list of booleans if it succeeded or failed. We then reduce all of those items together via an and operation to get an overall success status.

def my_all_succeed_naive(predicates, value)
  check_results = Sequence.map.(->(f) {f.(value)}, predicates)
  Sequence.reduce.(true, ->(accum, item) {accum && item}, check_results)
end

my_all_succeed_naive([multiple_of_three, multiple_of_five], 15)
# => true
my_all_succeed_naive([multiple_of_three, multiple_of_five], 14)
# => false
my_all_succeed_naive([multiple_of_three, multiple_of_five], 5)
# => false

This looks like this works, as we get 15 is a multiple of 3 and a multiple of 5, but we still check if the item is a multiple of 5, even if our multiple of 3 check failed.

Could we do better, and short circuit our evaluation if we get a false?

Let’s try.

def my_all_succeed(predicates, value)
  for predicate in predicates do
    return false unless predicate.(value)
  end
  true
end

So let’s take a look at the difference between the two, just to make sure we are on the right track.

First we will create a “long running predicate check” to add to our chain.

[22] pry(main)> pass_after = ->(x, value) { sleep(x); true }.curry
=> #<Proc:0x007faeb310d498 (lambda)>

Then we will time it using Benchmark#measure (and don’t forget to require 'benchmark' first though). First with a success, and then with a failure against the first predicate.

Benchmark.measure do
  my_all_succeed([multiple_of_three, pass_after.(3), multiple_of_five], 15)
end
# => #<Benchmark::Tms:0x007faeb28f24c0
#  @cstime=0.0,
#  @cutime=0.0,
#  @label="",
#  @real=3.0046792929642834,
#  @stime=0.0,
#  @total=0.0,
#  @utime=0.0>


Benchmark.measure do
  my_all_succeed([multiple_of_three, pass_after.(3), multiple_of_five], 14)
end
# => #<Benchmark::Tms:0x007faeb6018c88
#  @cstime=0.0,
#  @cutime=0.0,
#  @label="",
#  @real=2.5073997676372528e-05,
#  @stime=0.0,
#  @total=0.0,
#  @utime=0.0>

We can see that it takes three seconds to run if we succeed, but only a fraction of a second if we fail on the first check. And just to make sure our earlier assumptions were correct, we will benchmark the same predicate list against the naive version with the value of 14, so it will fail on the first check of a multiple of three.

Benchmark.measure do
  my_all_succeed_naive([multiple_of_three, pass_after.(3), multiple_of_five], 14)
end
# => #<Benchmark::Tms:0x007faeb487d218
#  @cstime=0.0,
#  @cutime=0.0,
#  @label="",
#  @real=3.0028793679666705,
#  @stime=0.0,
#  @total=0.0,
#  @utime=0.0>

And it does indeed take just over 3 seconds to complete.

So let’s add this to our Sequence module, and get it to be able to be curried.

So what if we did that for just a single string, not even in an list of some sort?

def self.my_all_succeed(predicates, value)
  for predicate in predicates do
    return false unless predicate.(value)
  end
  true
end

@@all_succeed = method(:my_all_succeed).curry

def self.all_succeed
  @@all_succeed
end

And we check that we can use it off our Sequence module partially applied.

Sequence.all_succeed.([multiple_of_three, multiple_of_five]).(14)
# => false
Sequence.all_succeed.([multiple_of_three, multiple_of_five]).(15)
# => true

So now we can get back to how we would use this with filter by using our new Sequence::all_succeed.

three_and_five_multiple = Sequence.all_succeed.([multiple_of_three, multiple_of_five])
# => #<Proc:0x007faeb28685e0 (lambda)>
Sequence.filter.(three_and_five_multiple).((1..100))
# => [15, 30, 45, 60, 75, 90]

And there we go, we have now composed our predicates into one function which we can then pass to Sequence::filter and only have to walk through the list once.

As we have seen how to compose predicates together for an “and” style composition, next week we will look at building an “or” style composition of predicates.

–Proctor

Erlang Thursday – ETS selects, continuations, and concurrent inserts

At the end of last week’s Erlang Thursday, I said we would continue looking at the behavior of the select functions in the ets module.

So before we do any experimentation, we setup our test ETS tables, and this time we will also create a table of type ordered_set.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.52.0>
TestOrderedSetTable = ets:new(ordered_set_table, [public, ordered_set]).
% 16402
TestTable = ets:new(ets_table, [public]).
% 20499
ets:give_away(TestTable, SomeProcess, []).
% true
ets:give_away(TestOrderedSetTable, SomeProcess, []).
% true

Next we will load our test ETS table with some dummy data, leaving some gaps in the sequence, allowing us to fill those gaps in later.

[[ets:insert(TestTable, {X, X}) || X <- lists:seq(1, 30, 2)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true]]
[[ets:insert(TestOrderedSetTable, {X, X}) || X <- lists:seq(1, 30, 2)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true]]

We then do a select to get all of the records from the table so we can see how the results are ordered for the different table types.

ets:select(TestTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}]).
% [{15,15},
%  {25,25},
%  {13,13},
%  {21,21},
%  {11,11},
%  {1,1},
%  {23,23},
%  {7,7},
%  {3,3},
%  {9,9},
%  {19,19},
%  {29,29},
%  {27,27},
%  {17,17},
%  {5,5}]
ets:select(TestOrderedSetTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}]).
% [{1,1},
%  {3,3},
%  {5,5},
%  {7,7},
%  {9,9},
%  {11,11},
%  {13,13},
%  {15,15},
%  {17,17},
%  {19,19},
%  {21,21},
%  {23,23},
%  {25,25},
%  {27,27},
%  {29,29}]

The ets module also has a function ets:select_reverse, so let’s take a quick stop and see what that does for our ETS tables.

ets:select_reverse(TestTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}]).
% [{15,15},
%  {25,25},
%  {13,13},
%  {21,21},
%  {11,11},
%  {1,1},
%  {23,23},
%  {7,7},
%  {3,3},
%  {9,9},
%  {19,19},
%  {29,29},
%  {27,27},
%  {17,17},
%  {5,5}]
ets:select_reverse(TestOrderedSetTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}]).
% [{29,29},
%  {27,27},
%  {25,25},
%  {23,23},
%  {21,21},
%  {19,19},
%  {17,17},
%  {15,15},
%  {13,13},
%  {11,11},
%  {9,9},
%  {7,7},
%  {5,5},
%  {3,3},
%  {1,1}]

If we look at the results of ets:select/2 and ets:select_reverse/2, we see that for TestTable we get the same result, and for TestOrderedSetTable we get the results in a reverse order, which is what the documentation for ets:select_reverse/2 states. Which makes sense if you think about it,

With that brief diversion out of the way, lets run our same match_spec()s from above, but limit the results to 5 records so we get a continuation back.

{Result, Continuation} = ets:select(TestTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}], 5).
% {[{19,19},{29,29},{27,27},{17,17},{5,5}],
% {20499,214,5,<<>>,[],0}}
{OrdSetResult, OrdSetContinuation} = ets:select(TestOrderedSetTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}], 5).
% {[{1,1},{3,3},{5,5},{7,7},{9,9}],{16402,9,[],5,<<>>,[],0,0}}

And with those continuations, we will see what the next results we would fetch would be.

ets:select(Continuation).
% {[{1,1},{23,23},{7,7},{3,3},{9,9}],{20499,111,5,<<>>,[],0}}
ets:select(OrdSetContinuation).
% {[{11,11},{13,13},{15,15},{17,17},{19,19}],
%  {16402,19,[],5,<<>>,[],0,0}}

Remember those “gaps” we left in our sequence of numbers we used to create tuples?

Time to “fill in” those gaps of the sequence to see what happens if we fetch with our existing continuation as data gets populated concurrently.

[[ets:insert(TestOrderedSetTable, {X, X}) || X <- lists:seq(2, 30, 2)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true]]
[[ets:insert(TestTable, {X, X}) || X <- lists:seq(2, 30, 2)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true]]

Now we re-run our ets:select/1 functions with the same continuations as before.

ets:select(Continuation).
% {[{12,12},{7,7},{3,3},{10,10},{9,9}],
%  {20499,224,5,<<>>,[],0}}
ets:select(OrdSetContinuation).
% {[{10,10},{11,11},{12,12},{13,13},{14,14}],
%  {16402,14,[],5,<<>>,[],0,0}}

If we compare that to before we can see the we now have even number items in the list. For our TestTable if we look above at the Continuation value itself, we ahve the continuation point as 214, since that is the only thing that has changed between that continuation and the resulting continuations from calling ets:select(Continuation).. So with just a number it is hard to infer just how we might expect the continuation to change.

The OrdSetContinuation on the other hand, has a 9 as its second element in the tuple, after the ETS table id of 16402. This also happens to be the key of the last tuple in the result set, which matches up with the 19 and 14 in the other continuations. So in the case of the ordered set, we can infer that as part of the continuation for an ETS table of type ordered_set, the continuation tells us the specific key of the last record that was returned, and we continue from that record regardless of any concurrent inserts that may have taken place.

Next time we will take a look at ets:is_compiled_ms/1 and how match specs might play in with continuations based off reading the documentation about ets:is_compiled_ms/1.

–Proctor

Ruby Tuesday – compose and map

As mentioned last week, now that we have our compose function we will take a look at some of the properties of we get when using our map and compose functions together.

Here is our Sequence class as we left it with compose added to it.

module Sequence

  def self.my_map(f, items)
    do_map = lambda do |accumulator, item|
      accumulator.dup << f.call(item)
    end

    my_reduce([], do_map, items)
  end

  def self.my_filter(predicate, items)
    do_filter = lambda do |accumulator, item|
      if (predicate.call(item))
        accumulator.dup << item
      else
        accumulator
      end
    end

    my_reduce([], do_filter, items)
  end

  def self.my_reduce(initial, operation, items)
    return nil unless items.any?{|item| true}

    accumulator = initial
    for item in items do
      accumulator = operation.call(accumulator, item)
    end
    accumulator
  end

  def self.my_compose(functions, initial)
    apply_fn = ->(accum, fn) { fn.(accum) }
    my_reduce(initial, apply_fn, functions)
  end

  @@map = method(:my_map).curry
  @@filter = method(:my_filter).curry
  @@reduce = method(:my_reduce).curry
  @@compose = method(:my_compose).curry

  def self.map
    @@map
  end

  def self.filter
    @@filter
  end

  def self.reduce
    @@reduce
  end

  def self.compose
    @@compose
  end
end

Like last week we have a list of names, and our goal is to get the list of “first” names back and have them be capitalized.

names = ["jane doe", "john doe", "arthur dent", "lori lemaris", "DIANA PRINCE"]
# => ["jane doe", "john doe", "arthur dent", "lori lemaris", "DIANA PRINCE"]

So we start with our base functions that we are going to use in our map calls.

split = ->(delimiter, str) {str.split(delimiter)}.curry
# => #<Proc:0x007ffdfa203a00 (lambda)>
whitespace_split = split.(" ")
# => #<Proc:0x007ffdfa8391e0 (lambda)>
first = ->(xs){ xs[0] }.curry
# => #<Proc:0x007ffdfa2a26a0 (lambda)>
capitalize = ->(s) { s.capitalize }
# => #<Proc:0x007ffdfa210ed0@(pry):13 (lambda)>

And we create our different instances of map calls, which are partially applied map functions with the appropriate lambda passed to them.

name_parts = Sequence.map.(whitespace_split)
# => #<Proc:0x007ffdfa1a0248 (lambda)>
firsts = Sequence.map.(first)
# => #<Proc:0x007ffdfa169568 (lambda)>
capitalize_all = Sequence.map.(capitalize)
# => #<Proc:0x007ffdfa86a1f0 (lambda)>

And as we saw last week, we can nest the function calls together,

capitalize_all.(firsts.(name_parts.(names)))
# => ["Jane", "John", "Arthur", "Lori", "Diana"]

or we can use compose to create a “pipeline” of function calls.

capitalized_first_names = Sequence.compose.([name_parts, firsts, capitalize_all])
# => #<Proc:0x007ffdfa1c1c18 (lambda)>
capitalized_first_names.(names)
# => ["Jane", "John", "Arthur", "Lori", "Diana"]

Here’s where things can start to get interesting.

In our capitalized first names example, we go through the list once per transformation we want to apply.

First we transform the list of names into a list of split names, which we transform into a list of only the first items from the source list, and then finally we transform that into a list of the capitalized names.

That could be a lot of processing if we had more transformations, and/or a much longer list. This seems like a lot of work.

Let’s look at it from near the other extreme; the case of if we only had one name in our list.

capitalized_first_names.(['tARa cHAsE'])
=> ["Tara"]

In this case, the fact that we have a list at all is almost incidental.

For a list of only one item, we split the string, get the first element, and capitalize that value.

So what if we did that for just a single string, not even in an list of some sort?

capitalize_first_name = Sequence.compose.([whitespace_split, first, capitalize])
# => #<Proc:0x007ffdfa121100 (lambda)>
capitalize_first_name.("tARa cHAsE")
# => "Tara"

We can compose each of these individual operations together to get a function that will transform a name into the result we want.

And since we have a “transformation” function, we can pass that function to our map function for a given list of names.

Sequence.map.(capitalize_first_name, names)
# => ["Jane", "John", "Arthur", "Lori", "Diana"]

Lo and behold, we get the same results as above for when we composed our map function calls.

Sequence.compose.([Sequence.map.(whitespace_split),
                   Sequence.map.(first),
                   Sequence.map.(capitalize)]
                 ).(names)
# => ["Jane", "John", "Arthur", "Lori", "Diana"]
Sequence.map.(Sequence.compose.([whitespace_split,
                                 first,
                                 capitalize])
             ).(names)
# => ["Jane", "John", "Arthur", "Lori", "Diana"]

This leads us to the property that the composition of the map of function f and the map of function g is equivalent to the map of the composition of functions f and g.

\big(map\ f \circ map\ g\big)\ list = map\big( f \circ g \big) list

Where the circle (\circ) symbol represents the function compose expressed using mathematical notation.

Because of this, we can now only traverse the sequence via map once and do the composed transformation functions on each item as we encounter it without having to go revisit it again.

Next time we will take a look at how we can do the same kind of operation on filter, as we can’t just pipeline the results of one filter on through to another, since filter returns a boolean value which is not what we would want to feed through to the next filter call.

–Proctor

Erlang Thursday – Using ETS select with a limit

In last week’s Erlang Thursday we continued to explore ets:select/2 and seeing its use when combined with using ets:fun2ms to generate the match_spec()s.

This week we will take a look at the other versions of select that the ets module provides.

Yet again we will do our new playground ETS table setup, so if we crash our shell session we don’t loose the table.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.52.0>
TestTable = ets:new(ets_table, [public]).
% 16402
ets:give_away(TestTable, SomeProcess, []).
% true

Next we will load our test ETS table with a bunch of test “products”. For ease of example, we will just use a number for the product id, and a random price ending in .99.

[[ets:insert(TestTable, {ProductId, random:uniform(100) + 0.99})
  || ProductId <- lists:seq(1, 10000) ]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true|...]]

We will create a match_spec() to find items in their twenties (and we will go ahead and round 19.99 up to 20 just because).

ProductsInTheTwenties = ets:fun2ms(fun({Product, Price})
                                     when Price >= 19.99 andalso Price < 30
                                     -> {Product, Price}
                                   end).
% [{{'$1','$2'},
%   [{'andalso',{'>=','$2',19.99},{'<','$2',30}}],
%   [{{'$1','$2'}}]}]

And if we use ets:select/2 against our table with this match spec, we get all of the results back in one query as we saw previously.

ets:select(TestTable, ProductsInTheTwenties).
% [{4351,29.99},
%  {635,19.99},
%  {6005,20.99},
%  {3742,27.99},
%  {5956,29.99},
%  {3753,28.99},
%  {6653,25.99},
%  {5151,28.99},
%  {2693,27.99},
%  {4253,21.99},
%  {7636,23.99},
%  {1935,19.99},
%  {9044,22.99},
%  {7797,22.99},
%  {2147,23.99},
%  {2574,26.99},
%  {7575,29.99},
%  {2130,28.99},
%  {4908,27.99},
%  {2218,22.99},
%  {9848,21.99},
%  {7632,26.99},
%  {3562,21.99},
%  {3130,27.99},
%  {575,26.99},
%  {4622,28.99},
%  {5678,25.99},
%  {4022,...},
%  {...}|...]

But the ets module also gives us a way to limit the results if we would like, using ets:select/3 and giving a limit of the number of results to return at a time.

So let’s use ets:select/3 and give a limit of 10 and see what happens.

ets:select(TestTable, ProductsInTheTwenties, 10).
% {[{9027,27.99},
%   {7347,29.99},
%   {7282,20.99},
%   {9386,24.99},
%   {5415,25.99},
%   {4032,29.99},
%   {8105,25.99},
%   {4634,24.99},
%   {1275,20.99},
%   {234,20.99}],
%  {16402,576,10,<<>>,[],0}}

We get a tuple back instead of a list of results. The first item in the tuple is a list of our first ten results we specified, the second is some bizarre looking tuple, which if we look at the documentation for ets:select/3 represents something referred to as a continuation.

So we run our query again, and bind the results this time.

{Results, Continuation} = ets:select(TestTable, ProductsInTheTwenties, 10).
% {[{9027,27.99},
%   {7347,29.99},
%   {7282,20.99},
%   {9386,24.99},
%   {5415,25.99},
%   {4032,29.99},
%   {8105,25.99},
%   {4634,24.99},
%   {1275,20.99},
%   {234,20.99}],
%  {16402,576,10,<<>>,[],0}}

So we have this continuation, but what is it and what does it mean for us to have it.

In short, it can be thought of as an immutable bookmark. It represents not only what page we are in for our query results, but also the book we are reading (our query).

This allows us to quickly pick up where we previously left off in our results set by passing the continuation to ets:select/1.

ets:select(Continuation).
% {[{2533,24.99},
%   {1357,22.99},
%   {564,21.99},
%   {9086,22.99},
%   {5265,25.99},
%   {4030,22.99},
%   {2802,25.99},
%   {8254,27.99},
%   {7088,26.99},
%   {3062,27.99}],
%  {16402,960,10,<<>>,[{6792,29.99},{9295,29.99}],2}}

And because it is our special immutable bookmark, every time we use that bookmark it takes us to the same starting point in the same book, and we only read the same number of maximum pages as originally set as our limit.

So no matter how many times we call ets:select/1 with our same continuation, we will get the same results each time.

ets:select(Continuation).
% {[{2533,24.99},
%   {1357,22.99},
%   {564,21.99},
%   {9086,22.99},
%   {5265,25.99},
%   {4030,22.99},
%   {2802,25.99},
%   {8254,27.99},
%   {7088,26.99},
%   {3062,27.99}],
%  {16402,960,10,<<>>,[{6792,29.99},{9295,29.99}],2}}
ets:select(Continuation).
% {[{2533,24.99},
%   {1357,22.99},
%   {564,21.99},
%   {9086,22.99},
%   {5265,25.99},
%   {4030,22.99},
%   {2802,25.99},
%   {8254,27.99},
%   {7088,26.99},
%   {3062,27.99}],
%  {16402,960,10,<<>>,[{6792,29.99},{9295,29.99}],2}}
ets:select(Continuation).
% {[{2533,24.99},
%   {1357,22.99},
%   {564,21.99},
%   {9086,22.99},
%   {5265,25.99},
%   {4030,22.99},
%   {2802,25.99},
%   {8254,27.99},
%   {7088,26.99},
%   {3062,27.99}],
%  {16402,960,10,<<>>,[{6792,29.99},{9295,29.99}],2}}

And if we look at the resulting tuple, we see that we get a different tuple for our next continuation.

{SecondResults, SecondContinuation} = ets:select(Continuation).
% {[{2533,24.99},
%   {1357,22.99},
%   {564,21.99},
%   {9086,22.99},
%   {5265,25.99},
%   {4030,22.99},
%   {2802,25.99},
%   {8254,27.99},
%   {7088,26.99},
%   {3062,27.99}],
%  {16402,960,10,<<>>,[{6792,29.99},{9295,29.99}],2}}

And we can pick up that new continuation, and use that in our next call to ets:select/1 to get the next set of results, with another continuation.

ets:select(SecondContinuation).
% {[{8569,19.99},
%   {1805,28.99},
%   {6819,23.99},
%   {9313,28.99},
%   {9527,27.99},
%   {1737,29.99},
%   {700,26.99},
%   {142,25.99},
%   {6792,29.99},
%   {9295,29.99}],
%  {16402,513,10,<<>>,[],0}}

And if we have a query for which we have exhausted our results set, we get an '$end_of_table' atom.

ets:select(TestTable, [{{'$1', '$2'}, [{'<', '$2', 0}], ['$$']}], 10).
% '$end_of_table'

The ability to specify a limit and have a continuation is also available via on match with ets:match/3 and ets:match/1, and match_object via ets:match_object/3 and ets:match_object/1.

Next week, we will continue looking at the various select functions in ets as we look into their behavior with and ordered set, will look at select vs select_reverse, and play with and see how continuations work if we get some new entries inserted in the results when using a continuation.
–Proctor

Ruby Tuesday – Refactoring towards compose

We have seen filter, map, reduce, partial application, and updating the former functions to be able to take advantage of partial application, so how much further can we go?

In this case, we will take a look at how we can chain the functions together to build up bigger building blocks out of their smaller components.

First as a reminder, here is our Sequence class that we have built up over the past few posts.

module Sequence

  def self.my_map(f, items)
    do_map = lambda do |accumulator, item|
      accumulator.dup << f.call(item)
    end

    my_reduce([], do_map, items)
  end

  def self.my_filter(predicate, items)
    do_filter = lambda do |accumulator, item|
      if (predicate.call(item))
        accumulator.dup << item
      else
        accumulator
      end
    end

    my_reduce([], do_filter, items)
  end

  def self.my_reduce(initial, operation, items)
    return nil unless items.any?{|item| true}

    accumulator = initial
    for item in items do
      accumulator = operation.call(accumulator, item)
    end
    accumulator
  end

  @@map = method(:my_map).curry
  @@filter = method(:my_filter).curry
  @@reduce = method(:my_reduce).curry

  def self.map
    @@map
  end

  def self.filter
    @@filter
  end

  def self.reduce
    @@reduce
  end
end

Suppose we want to get a list of capitalized “first” names from a list of names, and we have a bunch of smaller functions that handle the different part of the transformation process that we can reuse.

It might look like the following.

names = ["jane doe", "john doe", "arthur dent", "lori lemaris"]

name_parts_map = Sequence.map.(->(name) {name.split})
# => #<Proc:0x007fea82200638 (lambda)>
first_map = Sequence.map.(->(xs) {xs[0]})
#=> #<Proc:0x007fea82842780 (lambda)>
capitalize_map = Sequence.map.(->(s) {s.capitalize})
# => #<Proc:0x007fea82a05ce8 (lambda)>
initials_map = Sequence.map.(->(strings) {first_map.(strings)})
# => #<Proc:0x007fea82188638 (lambda)>

capitalize_map.(first_map.(name_parts_map.(names)))
# => ["Jane", "John", "Arthur", "Lori"]

And if we wanted to get a list of the initials as a list themselves, we might have something like this.

initials_map.(name_parts_map.(names))
# => [["j", "d"], ["j", "d"], ["a", "d"], ["l", "l"]]

And maybe somewhere else, we need to do some mathematical operations, like transform numbers into one less than their square.

square_map = Sequence.map.(->(i) {i*i})
# => #<Proc:0x007fea82183ef8 (lambda)>
dec_map = Sequence.map.(->(i) {i-1})
# => #<Proc:0x007fea82a37018 (lambda)>
dec_map.(square_map.([1,2,3,4,5]))
# => [0, 3, 8, 15, 24]

And yet another place, we have a calculation to turn Fahrenheit into Celsius.

minus_32 = ->(x) {x-32}
# => #<Proc:0x007fea8298d4c8@(pry):36 (lambda)>
minus_32_map = Sequence.map.(minus_32)
# => #<Proc:0x007fea82955488 (lambda)>
five_ninths = ->(x) {x*5/9}
# => #<Proc:0x007fea83836330@(pry):38 (lambda)>
five_ninths_map = Sequence.map.(five_ninths)
# => #<Proc:0x007fea821429f8 (lambda)>
five_ninths_map.(minus_32_map.([0, 32, 100, 212]))
# => [-18, 0, 37, 100]

Setting aside for the moment, all of the Procs making up other Procs if this is still foreign to you, there is a pattern here that we have been doing in all of these examples to compose a larger function out of a number of smaller functions. The pattern is that we are taking the return result of calling one function with a value, and feeding that into the next function, lather, rinse, and repeat.

Does this sound familiar?

This is our reduce.

We can use reduce to define a function compose that will take a list of functions as our items, and an initial value for our functions, and our function to apply will be to call the function in item against our accumulated value.

That’s a bit of a mouthful, so let’s look at it as code, and we will revisit that statement.

  def self.my_compose(functions, initial)
    apply_fn = ->(accum, fn) { fn.(accum) }
    my_reduce(initial, apply_fn, functions)
  end

  @@compose = method(:my_compose).curry

  def self.compose
    @@compose
  end

Now that we have the code to reference, let’s go back and inspect against what was described above.

First we create a lambda apply_fn, which will be our “reducing” function to apply to the accumulator and each item in the list, which in the case of my_compose is a list of functions to call.

apply_fn like all our “reducing” functions so far takes in an accumulator value, the result of the composed function calls so far, and the current item, which is the function to call. The result for the new accumulator value is the result of applying the function with the accumulator as its argument.

We were able to build yet another function out of our reduce, but this time we operated on a list of functions as our values.

Let that sink in for a while.

So let’s see how we use that.

We will start with creating a composed function to map the Fahrenheit to Celsius conversion and see what different temperatures are in Celsius, including the past few days of highs and lows here at DFW airport.

f_to_c_map = Sequence.compose.([minus_32_map, five_ninths_map])
# => #<Proc:0x007fea82a1f9b8 (lambda)>
f_to_c_map.([0, 32, 100, 212])
# => [-18, 0, 37, 100]
dfw_highs_in_celsius = f_to_c_map.([66, 46, 55, 48, 64, 68])
# => [18, 7, 12, 8, 17, 20]
dfw_lows_in_celsius = f_to_c_map.([35, 27, 29, 35, 45, 40])
# => [1, -3, -2, 1, 7, 4]

And if we take a look at the initials above and compose the map calls together, we get the following.

get_initials_map = Sequence.compose.([name_parts_map, initials_map])
# => #<Proc:0x007fea82108ff0 (lambda)>
get_initials_map.(names)
# => [["j", "d"], ["j", "d"], ["a", "d"], ["l", "l"]]

Doing the same for our capitalized first names we get:

capitalized_first_names_map = Sequence.compose.([name_parts_map, first_map, capitalize_map])
# => #<Proc:0x007fea821d1108 (lambda)>
capitalized_first_names_map.(names)
# => ["Jane", "John", "Arthur", "Lori"]

By having our compose function, we are able to be more explicit that capitalized_first_names_map is, along with the rest of the examples, just a composition of smaller functions that have been assembled together in an data transformation pipeline.

They don’t have any other logic other than being the result of chaining the other functions together to get some intended behavior.

Not only that, but we can now reuse our capitalized_first_names_map mapping function against other lists of names nicely, since we have it able to be partially applied as well.

capitalized_first_names_map.(["bob cratchit", "pete ross", "diana prince", "tara chase"])
# => ["Bob", "Pete", "Diana", "Tara"]

Even better is that compose can work on any function (Proc or lambda) that takes a single argument.

Such as a Fahrenheit to Celsius function that operates against a single value instead of a list.

f_to_c = Sequence.compose.([minus_32, five_ninths])
# => #<Proc:0x007fea8294c4c8 (lambda)>
f_to_c.(212)
# => 100
f_to_c.(32)
# => 0
Sequence.map.(f_to_c, [0, 32, 100, 212])
# => [-18, 0, 37, 100]

Next week well will look at some other properties of our functions and show how compose can potentially help us in those cases as well.

–Proctor

Erlang Thursday – ETS, match_specs, and functions

In last week’s Erlang Thursday I concluded with showing how we can take advantage of using ets:select but take advantage of making our queries more expressive.

First we will need a new ETS table, so we start with a new public “Products” table, and do our standard of creating a new process and giving ownership of the table away.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.52.0>
Products = ets:new(products, [public]).
% 16402
ets:give_away(Products, SomeProcess, []).
% true

Next we will load our “Products” into the table.

In our case, we are just creating a “product” with the “name” as a binary and a “price in CWC” (Common World Currency) as an integer.

[[ ets:insert(Products, {integer_to_binary(X), X}) || X <- lists:seq(1, 100) ]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true|...]]

As we saw last week, we can manually build up a list of tuples into the match_spec() to run our query, say for items less than 10 CWCs.

ets:select(Products, [{{'$1', '$2'}, [{'<', '$2', 10}], ['$1']}]).
% [<<"8">>,<<"6">>,<<"5">>,<<"3">>,<<"7">>,<<"1">>,<<"4">>,
%  <<"9">>,<<"2">>]

We can also find those item names that are more than 10 CWCs and less than 25 CWCS.

ets:select(Products, [{{'$1', '$2'}, [{'>', '$2', 10}, {'<', '$2', 25}], ['$1']}]).
% [<<"11">>,<<"15">>,<<"23">>,<<"20">>,<<"21">>,<<"14">>,
%  <<"12">>,<<"13">>,<<"16">>,<<"19">>,<<"17">>,<<"18">>,
%  <<"22">>,<<"24">>]

But this isn’t necessarily clear, as we are using numerical values for the items in the tuple, and lists of tuples with lists of tuples inside them.

Enter ets:fun2ms/1 to the rescue.

ets:fun2ms/1 will take a function and will turn that function into a match_spec().

This allows us to write a function that takes a tuple of Product and Cost, and will return the Product if the Cost is less than 10.

ets:fun2ms(fun({Product, Cost}) when Cost < 10 -> Product end).
% [{{'$1','$2'},[{'<','$2',10}],['$1']}]

We can also have compound checks in our guard clauses on the functions we pass to ets:fun2ms/1, such as and clauses,

Between_25_And_35_CWC = ets:fun2ms(fun({Product, Cost}) when Cost > 25, Cost < 35 -> Product end).
% [{{'$1','$2'},[{'>','$2',25},{'<','$2',35}],['$1']}]
ets:select(Products, Between_25_And_35_CWC).
% [<<"30">>,<<"33">>,<<"32">>,<<"29">>,<<"28">>,<<"26">>,
%  <<"34">>,<<"27">>,<<"31">>]

as well as or style clauses.

While this is useful it does have its limitations, as this parse transform on the function, so you can’t use everything that you would be able to with a normal function.

ets:fun2ms(fun({Product, Cost}) when Cost > 90 -> lists:reverse(binary:bin_to_list(Product)) end).
# Error: Unknown error code {122,lists,reverse}
# {error,transform_error}

But then again, the results part of the match_spec(), doesn’t support advanced functionality anyways.

ets:select(Products, [{{'$1', '$2'}, [{'<', 90, '$2'}], [binary:bin_to_list('$1')]}]).
# ** exception error: bad argument
#      in function  binary:bin_to_list/1
#         called as binary:bin_to_list('$1')

But even with its limitations, ets:fun2ms/1 still does a good job to help make our ETS queries more expressive. Not only can we reference a function with expressive variable names over just $X, as well as give guard clauses instead of just guard tuples, but we can also use those variable names in our results as well, and do the basic formatting as part of the function.

And make sure to check back in next week as we continue with looking at the different versions of ets:select.

–Proctor

Ruby Tuesday – Partial Application of map, filter, and reduce

Now that we have covered how to get to a basic implementation of map, filter, and reduce in Ruby, as well as how to take advantage of Method#curry, we are going to see how we can get some extra power from our code by combining their use.

In the Ruby versions of Enumerables map, reduce, and select, it operates against a specific object, such as an array of users.

class User
  def initialize(name:, active:)
    @name = name
    @active = active
  end

  def active?
    @active
  end

  def name
    @name
  end
end

users = [User.new(name: "johnny b. goode", active: true),
         User.new(name: "jasmine", active: true),
         User.new(name: "peter piper", active: false),
         User.new(name: "mary", active: true),
         User.new(name: "elizabeth", active: true),
         User.new(name: "jennifer", active: false)]


users.map{|user| user.name}
# => ["johnny b. goode", "jasmine", "peter piper", "mary", "elizabeth", "jennifer"]

users.select{|user| user.active?}
# => [#<User:0x007fa37a13eb68 @active=true, @name="johnny b. goode">,
#  #<User:0x007fa37a13eaa0 @active=true, @name="jasmine">,
#  #<User:0x007fa37a13e910 @active=true, @name="mary">,
#  #<User:0x007fa37a13e848 @active=true, @name="elizabeth">]

If we want the names of a different collection, we need to call map (and use the same block) directly on that collection as well; like a if we had a collection of active Users.

users.select{|user| user.active?}.map{|user| user.name}
=> ["johnny b. goode", "jasmine", "mary", "elizabeth"]

This can be made a little more generic by having the methods get_user_names and get_active_users defined on User, but this still leaves us a bit shallow, so let’s see what else we can do base of what we have seen so far.

We will try it with our versions of map, filter, and reduce, and see how we can distill some of this logic, and raise the level of abstraction hight to make it more generic.

module Sequence
  def self.my_map(f, items)
    do_map = lambda do |accumulator, item|
      accumulator.dup << f.call(item)
    end

    my_reduce([], do_map, items)
  end

  def self.my_filter(predicate, items)
    do_filter = lambda do |accumulator, item|
      if (predicate.call(item))
        accumulator.dup << item
      else
        accumulator
      end
    end

    my_reduce([], do_filter, items)
  end

  def self.my_reduce(initial, operation, items)
    return nil unless items.any?{|item| true}

    accumulator = initial
    for item in items do
      accumulator = operation.call(accumulator, item)
    end
    accumulator
  end
end

And we look at how we call it using our Sequence module defined above.

Sequence.my_map(->(user) {user.name}, users)
# => ["johnny b. goode", "jasmine", "peter piper", "mary", "elizabeth", "jennifer"]
Sequence.my_filter(->(user) {user.active?}, users)
# => [#<User:0x007fa37a13eb68 @active=true, @name="johnny b. goode">,
#  #<User:0x007fa37a13eaa0 @active=true, @name="jasmine">,
#  #<User:0x007fa37a13e910 @active=true, @name="mary">,
#  #<User:0x007fa37a13e848 @active=true, @name="elizabeth">]

Granted, at this point, this is not a great improvement, if any, on it’s own.

BUT….

Since we take the collection to operate on as the last argument to our methods, we can combine our versions of my_map, my_filter, my_reduce with partial application to get a Proc that will do a specific operation against any Enumerable.

Let’s see how this would work.

First, we will update our Sequence module to have a map, filter, and reduce that can be partially applied.

module Sequence

  def self.my_map(f, items)
    do_map = lambda do |accumulator, item|
      accumulator.dup << f.call(item)
    end

    my_reduce([], do_map, items)
  end

  def self.my_filter(predicate, items)
    do_filter = lambda do |accumulator, item|
      if (predicate.call(item))
        accumulator.dup << item
      else
        accumulator
      end
    end

    my_reduce([], do_filter, items)
  end

  def self.my_reduce(initial, operation, items)
    return nil unless items.any?{|item| true}

    accumulator = initial
    for item in items do
      accumulator = operation.call(accumulator, item)
    end
    accumulator
  end

  @@map = method(:my_map).curry
  @@filter = method(:my_filter).curry
  @@reduce = method(:my_reduce).curry

  def self.map
    @@map
  end

  def self.filter
    @@filter
  end

  def self.reduce
    @@reduce
  end
end

Next with the ability to partially applied versions of our map, filter, and reduce, we can now save these off to variables that we can invoke later and just pass in the Users enumerable we wish to operate against.

names = Sequence.map.(->(user) {user.name})
# => #<Proc:0x007f9c53155990 (lambda)>
names.(users)
# => ["johnny b. goode", "jasmine", "peter piper", "mary", "elizabeth", "jennifer"]
get_active = Sequence.filter.(->(user) {user.active?})
# => #<Proc:0x007f9c531af3f0 (lambda)>
get_active.(users)
# => [#<User:0x007f9c5224e960 @active=true, @name="johnny b. goode">,
#  #<User:0x007f9c5224e8c0 @active=true, @name="jasmine">,
#  #<User:0x007f9c5224e780 @active=true, @name="mary">,
#  #<User:0x007f9c5224e6e0 @active=true, @name="elizabeth">]

Or if we want to, we can use the Symbol#to_proc so we don’t have to define our lambda for checking if an item is active?.

get_active = Sequence.filter.(:active?.to_proc)
=> #<Proc:0x007f9c531e62d8 (lambda)>
get_active.(users)
=> [#<User:0x007f9c5224e960 @active=true, @name="johnny b. goode">,
 #<User:0x007f9c5224e8c0 @active=true, @name="jasmine">,
 #<User:0x007f9c5224e780 @active=true, @name="mary">,
 #<User:0x007f9c5224e6e0 @active=true, @name="elizabeth">]

And now that we have our partial applied functions, we can also chain our calls together to get the names of active Users.

names.(get_active.(users))
=> ["johnny b. goode", "jasmine", "mary", "elizabeth"]

Not only that, but say we have some collections of Product objects,

class Product
  attr_reader :id, :name, :brand

  def initialize(id:, name:, active:, brand:)
    @id = id
    @name = name
    @active = active
    @brand = brand
  end

  def active?
    @active
  end
end

products = [Product.new(id: 0, name: "Prefect", active: false, brand: "Ford"),
            Product.new(id: 7, name: "SICP", active: true, brand: "MIT Press"),
            Product.new(id: 16, name: "HTDP", active: true, brand: "MIT Press"),
            Product.new(id: 17, name: "MRI", active: true, brand: "Ruby"),
            Product.new(id: 42, name: "HHGTTG", active: true, brand: "HHGTTG"),
            Product.new(id: 53, name: "Windows 3.1", active: false, brand: "Microsoft")]

Session objects,

class Session
  def initialize(name:, duration:)
    @name = name
    @duration = duration
  end

  def name
    @name
  end

  def active?
    @duration < 15
  end
end

sessions = [Session.new(name: "session A", duration: 3),
            Session.new(name: "session A", duration: 30),
            Session.new(name: "session A", duration: 17),
            Session.new(name: "session A", duration: 9),
            Session.new(name: "session A", duration: 1)]

and even SalesLead objects.

class SalesLead
  def initialize(lead:, active:)
    @lead = lead
    @active = active
  end

  def active?
    @active && @lead.active?
  end

  def name
    @lead.name
  end
end

leads = [SalesLead.new(lead: User.new(name: "lead 1", active: true), active: true),
         SalesLead.new(lead: User.new(name: "lead 2", active: true), active: false),
         SalesLead.new(lead: User.new(name: "lead 3", active: false), active: true),
         SalesLead.new(lead: User.new(name: "lead 4", active: true), active: true),
         SalesLead.new(lead: User.new(name: "lead 5", active: true), active: true),
         SalesLead.new(lead: User.new(name: "lead 6", active: false), active: false)]

And because our Product class has the methods name and active?, we can use our names and get_active variables that hold our partially applied Procs against the list of Products,

names.(products)
# => ["Prefect", "SICP", "HTDP", "MRI", "HHGTTG", "Windows 3.1"]
get_active.(products)
# => [#<Product:0x007f9c530b7dd0 @active=true, @brand="MIT Press", @id=7, @name="SICP">,
#  #<Product:0x007f9c530b7ce0 @active=true, @brand="MIT Press", @id=16, @name="HTDP">,
#  #<Product:0x007f9c530b7bf0 @active=true, @brand="Ruby", @id=17, @name="MRI">,
#  #<Product:0x007f9c530b7ad8 @active=true, @brand="HHGTTG", @id=42, @name="HHGTTG">]
names.(get_active.(products))
# => ["SICP", "HTDP", "MRI", "HHGTTG"]

Sessions,

names.(sessions)
# => ["session A", "session A", "session A", "session A", "session A"]
get_active.(sessions)
# => [#<Session:0x007f9c52153560 @duration=3, @name="session A">,
#  #<Session:0x007f9c52152f98 @duration=9, @name="session A">,
#  #<Session:0x007f9c52152ca0 @duration=1, @name="session A">]
names.(get_active.(sessions))
# => ["session A", "session A", "session A"]

and SalesLeads,

names.(leads)
# => ["lead 1", "lead 2", "lead 3", "lead 4", "lead 5", "lead 6"]
get_active.(leads)
# => [#<SalesLead:0x007f9c5212a020
#   @active=true,
#   @lead=#<User:0x007f9c5212a160 @active=true, @name="lead 1">>,
#  #<SalesLead:0x007f9c521292d8
#   @active=true,
#   @lead=#<User:0x007f9c52129468 @active=true, @name="lead 4">>,
#  #<SalesLead:0x007f9c52128e00
#   @active=true,
#   @lead=#<User:0x007f9c52129080 @active=true, @name="lead 5">>]
names.(get_active.(leads))
# => ["lead 1", "lead 4", "lead 5"]

With this in mind, we will update the definition of our names and get_active to show that it is not just “users” it operates against, but any item.

names = Sequence.map.(->(x) {x.name})
# => #<Proc:0x007f9c53107f38 (lambda)>
get_active = Sequence.filter.(->(x) {x.active?})
# => #<Proc:0x007f9c530bfcd8 (lambda)>

So with this, we have now been able to take our map and select from Ruby’s Enumerable class that worked on a specific collection only, without being redefined and moved into a method to live on User somewhere, we now have a Proc that is recognized to be applicable to anything that accepts an item of that form, and these can be defined and used anywhere.

–Proctor

Erlang Thursday – More ETS data matching (and querying)

In today’s Erlang Thursday we continue from last week in looking at getting data from ETS.

To refresh, we have a module markov_words, and for this week we have added a new function markov_words:create_word_triples/1.

-module(markov_words).

-export([create_word_pairs/1,
         create_word_triples/1]).

-spec create_word_pairs(string()) -> list({string(), string()}).
create_word_pairs(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_pairs([], Words).

-spec create_word_triples(string()) -> list({string(), string(), string()}).
create_word_triples(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_triples(Words, []).


create_word_pairs(WordPairs, [_Word|[]]) ->
    WordPairs;
create_word_pairs(WordPairs, [Word|Words]) ->
    [Following|_] = Words,
    UpdatedWordPairs = [{Word, Following} | WordPairs],
    create_word_pairs(UpdatedWordPairs, Words).


create_word_triples([_Word, _SecondWord | []], WordTriples) ->
    WordTriples;
create_word_triples([FirstWord | Words], WordTriples) ->
    [SecondWord, Following | _] = Words,
    UpdatedWordTriples = [{FirstWord, SecondWord, Following} | WordTriples],
    create_word_triples(Words, UpdatedWordTriples).

The excuse for having this new function is that it would allow us to get more refined Markov chains by picking the probability of the next word by having the state be the compound key of the last two words seen.

With our module updated and defined, we get back to our Erlang shell to test things out, by compiling our module and loading up our intro text into a variable.

c(markov_words).
% {ok,markov_words}

ToTC = "It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of Light, it was the season of Darkness,
it was the spring of hope, it was the winter of despair,
we had everything before us, we had nothing before us,
we were all going direct to Heaven,
we were all going direct the other way--in short,
the period was so far like the present period,
that some of its noisiest authorities insisted on its
being received, for good or for evil, in the superlative
degree of comparison only.

There were a king with a large jaw and a queen with a
plain face, on the throne of England; there were a king
with a large jaw and a queen with a fair face,
on the throne of France. In both countries it was
clearer than crystal to the lords of the State preserves
of loaves and fishes, that things in general were
settled for ever.".

We create our fresh ETS table for this week, create a new process to own it, and give it away (in case we type something wrong and cause the current session of the shell to crash).

MarkovWords = ets:new(markov_word_tuples, [public, duplicate_bag]).
% 16402
Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.58.0>
ets:give_away(MarkovWords, SomeProcess, []).
% true

This week, in addition to adding our word pair tuples to ETS, we will also add in our new word triple tuples to ETS in the same table.

[[ ets:insert(MarkovWords, WordPair) || WordPair <- markov_words:create_word_pairs(ToTC)]].

[[ ets:insert(MarkovWords, WordTriple) || WordTriple <- markov_words:create_word_triples(ToTC)]].

Since we have both word pairs and word triples in the same ETS table, we can see that with ets:match_object/2, we can specify a match_pattern() for only the two tuples

ets:match_object(MarkovWords, {"of", '$1'}).
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"}]

or a match_pattern() that will only match the three tuples.

ets:match_object(MarkovWords, {"of", '$1', '$2'}).
% [{"of","loaves","and"},
%  {"of","the","State"},
%  {"of","France.","In"},
%  {"of","England;","there"},
%  {"of","comparison","only."},
%  {"of","its","noisiest"},
%  {"of","despair,","we"},
%  {"of","hope,","it"},
%  {"of","Darkness,","it"},
%  {"of","Light,","it"},
%  {"of","incredulity,","it"},
%  {"of","belief,","it"},
%  {"of","foolishness,","it"},
%  {"of","wisdom,","it"},
%  {"of","times,","it"},
%  {"of","times,","it"}]

Where as if we use the ets:lookup/2 with the key, we get all items with the key, regardless of the tuple size.

ets:lookup(MarkovWords, "of").
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"},
%  {"of","loaves","and"},
%  {"of","the","State"},
%  {"of","France.","In"},
%  {"of","England;","there"},
%  {"of","comparison","only."},
%  {"of","its","noisiest"},
%  {"of","despair,","we"},
%  {"of","hope,","it"},
%  {"of","Darkness,","it"},
%  {"of","Light,",[...]},
%  {"of",[...],...},
%  {[...],...},
%  {...}|...]

And unlike ets:lookup/2, with ets:match_object/2 we can match on any tuple element, and not just the key.

ets:match_object(MarkovWords, {'$1', "the", '$2'}).
% [{"on","the","throne"},
%  {"on","the","throne"},
%  {"direct","the","other"},
%  {"short,","the","period"},
%  {"like","the","present"},
%  {"of","the","State"},
%  {"to","the","lords"},
%  {"in","the","superlative"},
%  {"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the","season"},
%  {"was","the","season"},
%  {"was","the","epoch"},
%  {"was","the","epoch"},
%  {"was","the","age"},
%  {"was","the","age"},
%  {"was","the","worst"},
%  {"was","the","best"}]

And like ets:match_object/2, ets:match/2 can match based off the tuple itself as well.

ets:match(MarkovWords, {"was", "the", '$1'}).
% [["winter"],
%  ["spring"],
%  ["season"],
%  ["season"],
%  ["epoch"],
%  ["epoch"],
%  ["age"],
%  ["age"],
%  ["worst"],
%  ["best"]]

But sometimes we might want finer grain control over how the results are given back to us, such as a single list of items instead of a nested list of strings. Or maybe we even have some criteria that we want to hold true as part of our selections on the data.

Enter ets:select/2.

ets:select/2 takes the table as its first argument, and a match_spec() as its second argument.

The match_spec() is a list of three-tuples, where the first element is the match pattern, second element is a list of guard clause tuples, and the last element is the result is a term representation of the result for each match.

If we want to call ets:select/2 and have it align with ets:match/2 our call looks like the following.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], [['$1']]}]).
% [["winter"],
%  ["spring"],
%  ["season"],
%  ["season"],
%  ["epoch"],
%  ["epoch"],
%  ["age"],
%  ["age"],
%  ["worst"],
%  ["best"]]

The second argument is a list of match_spec()s, of which there is only one which consists of:
1). a match_pattern() of {"was", "the", '$1'}, which is the same thing we gave to ets:match/2
2). [], and empty list of guard condition tuples, and
3). [[‘$1’]] for the result term, which is the list of terms we want the result formatted as, in this case we want each result to be in its own list.

If we just wanted to get the word themselves as a list, we can update the result term part of the match_spec() to be ['$1'] instead.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], ['$1']}]).
% ["winter","spring","season","season","epoch","epoch","age",
%  "age","worst","best"]

If we wanted something that looked more like a ets:match_object/2 result set we can use the result term of '$_', which signifies the whole object.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], ['$_']}]).
% [{"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the","season"},
%  {"was","the","season"},
%  {"was","the","epoch"},
%  {"was","the","epoch"},
%  {"was","the","age"},
%  {"was","the","age"},
%  {"was","the","worst"},
%  {"was","the","best"}]

And if we wanted to only match on one of the items, and capture the other items in the tuple, we can use the result of '$$' which returns all of the match variable in a list, ordered by variable number as opposed to position in the match_pattern().

ets:select(MarkovWords, [{{"was", '$1', '$2'}, [], ['$$']}]).
% [["clearer","than"],
%  ["so","far"],
%  ["the","winter"],
%  ["the","spring"],
%  ["the","season"],
%  ["the","season"],
%  ["the","epoch"],
%  ["the","epoch"],
%  ["the","age"],
%  ["the","age"],
%  ["the","worst"],
%  ["the","best"]]

ets:select(MarkovWords, [{{"was", '$2', '$1'}, [], ['$$']}]).
% [["than","clearer"],
%  ["far","so"],
%  ["winter","the"],
%  ["spring","the"],
%  ["season","the"],
%  ["season","the"],
%  ["epoch","the"],
%  ["epoch","the"],
%  ["age","the"],
%  ["age","the"],
%  ["worst","the"],
%  ["best","the"]]

With ets:select/2 we also get the ability to specify multiple match_spec()s. This allows us to find all word triple word triples that have either "of" or "the" as the middle word.

ets:select(MarkovWords, [{{'$1', "the", '$2'}, [], ['$_']}, {{'$1', "of", '$2'}, [], ['$_']}]).
% [{"some","of","its"},
%  {"on","the","throne"},
%  {"on","the","throne"},
%  {"direct","the","other"},
%  {"preserves","of","loaves"},
%  {"throne","of","France."},
%  {"throne","of","England;"},
%  {"worst","of","times,"},
%  {"short,","the","period"},
%  {"winter","of","despair,"},
%  {"degree","of","comparison"},
%  {"epoch","of","incredulity,"},
%  {"epoch","of","belief,"},
%  {"spring","of","hope,"},
%  {"like","the","present"},
%  {"of","the","State"},
%  {"age","of","foolishness,"},
%  {"age","of","wisdom,"},
%  {"best","of","times,"},
%  {"season","of","Darkness,"},
%  {"season","of","Light,"},
%  {"to","the","lords"},
%  {"in","the","superlative"},
%  {"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the",[...]},
%  {"was",[...],...},
%  {[...],...},
%  {...}|...]

And with guard clauses, we can find third item in the three-tuples that start with "was", that comes later in the dictionary than the word in the second position of the tuple.

ets:select(MarkovWords, [{{"was", '$1', '$2'}, [{'<', '$1', '$2'}], ['$2']}]).
% ["than","winter","worst"]

So with this week’s post we have seen other ways of using ets:match/2 and ets:match_object/2, and what they can get over using just a ets:lookup/2 for a key, as well as being able to take advantage of even more powerful querying by using ets:select/2.

Next week, we will look at more ways to use ets:select/2, and how we can use some other ets module functions to help create queries that can be easier to deconstruct at a quicker glance.

–Proctor

Ruby Tuesday – Partial Application

As we continue with the theme we have been pursuing in the last couple of posts, we take a brief pit stop and look at partial application before we move on to our next method we want to define.

Partial application is the ability to provide only a subset of arguments to a function or method, and return a new function/method to be called later that will keep the original context.

We will start our examples with the two methods double and triple.

def double(y)
  2 * y
end

def triple(y)
  3 * y
end

double(3)
# => 6

triple(5)
# => 15

We can think of these methods as defined in terms of a more generic function multiply that we call with a hard coded value.

def multiply(x, y)
  x * y
end

def double(y)
  multiply(2, y)
end

def triple(y)
  multiply(3, y)
end

What partial application allows us to do is to define double and triple in terms of multiply, by calling it with the first argument only, the 2 for the method double, and saving the resulting function to be invoked later.

To do this in Ruby we can use Method#curry, or Proc#curry.

Method#curry will return a new Proc that can then be invoked with only a subset of its arguments.

method(:multiply).curry
# => #<Proc:0x007fc02b225950 (lambda)>

So for our double and triple functionality, we can make those be variables which hold the resulting Proc of passing in their value to multiply, and invoke them by only passing in the value we want to double, or triple.

double = method(:multiply).curry.(2)
# => #<Proc:0x007fc02b1eeea0 (lambda)>
double.(8)
# => 16

triple = method(:multiply).curry.(3)
# => #<Proc:0x007fc02b186260 (lambda)>
triple.(17)
# => 51

At this point you might be wondering what this gets you, as the examples of double, triple, and multiply might seem a bit simplistic at best, and maybe even contrived.

I would agree; it is a simple example, but mainly to show as an example of what partial application is, and now we will take a look at our filter, map, and reduce from the previous posts and update them to show some of the power of partial application.

As a reminder this is the map, filter, and reduce as defined previously.

def map(items, do_map)
  reduce([], items, lambda do |accumulator, item|
    accumulator.dup << do_map.call(item)
  end)
end

def filter(items, predicate)
  reduce([], items, lambda do |accumulator, item|
    if (predicate.call(item))
      accumulator.dup << item
    else
      accumulator
    end
  end)
end

def reduce(initial, items, operation)
  return nil if items.empty?

  accumulator = initial
  for item in items do
    accumulator = operation.call(accumulator, item)
  end
  accumulator
end

We will update these definitions to be able to take the items collection last, as that is the most general argument.

def map(f, items)
  do_map = lambda do |accumulator, item|
    accumulator.dup << f.call(item)
  end

  reduce([], do_map, items)
end

def filter(predicate, items)
  do_filter = lambda do |accumulator, item|
    if (predicate.call(item))
      accumulator.dup << item
    else
      accumulator
    end
  end

  reduce([], do_filter, items)
end

def reduce(initial, operation, items)
  return nil unless items.any?{|item| true}

  accumulator = initial
  for item in items do
    accumulator = operation.call(accumulator, item)
  end
  accumulator
end

You might be wondering why I said that the collections of items is the most general argument.

That was because, if we want to sum a sequence of numbers, double a sequence of numbers, or even pick out the even numbers, we can do that against a number of different collections of items.

reduce(0, :+.to_proc, (1..5))
# => 15
reduce(0, :+.to_proc, [2, 4, 6, 8])
# => 20

map(double, (2..7))
# => [4, 6, 8, 10, 12, 14]
map(double, [1, 2, 3, 5, 8, 13])
# => [2, 4, 6, 10, 16, 26]

filter(lambda{|x| x.even?}, (5..10))
# => [6, 8, 10]
filter(lambda{|x| x.even?}, (0..100).step(10))
# => [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

And because we now have the generic items last, we can use Method#curry to build up Procs to call that represent the parts that are the same and give those Procs descriptive names.

concat = method(:reduce).curry.("", :+.to_proc)
# => #<Proc:0x007fc02b207f68 (lambda)>

concat.(("a".."d"))
# => "abcd"
concat.(["alpha", "beta", "gamma", "delta"])
# => "alphabetagammadelta"


sum = method(:reduce).curry.(0, :+.to_proc)
# => #<Proc:0x007fc02b25c450 (lambda)>

sum.((1..10))
# => 55
sum.((2..20).step(2))
# => 110


evens_only = method(:filter).curry.(lambda{|x| x.even?})
# => #<Proc:0x007fc02b155688 (lambda)>

evens_only.([1, 2, 3, 5, 8, 13, 21])
# => [2, 8]
evens_only.([1, 2, 4, 7, 11])
# => [2, 4]


doubles = method(:map).curry.(double)
# => #<Proc:0x007fc02b0c0308 (lambda)>

doubles.((1..10))
# => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
doubles.([2, 4, 6, 8])
# => [4, 8, 12, 16]

And in the last example of doubles we did a curry of map, and used the partially applied function double as the argument to be partially applied to map.

So with partial application, we can start to abstract common behavior out into Procs that can operate against more generic data.

One last example would be filtering items against those that are active. With partial application, we can take filter and create a partially applied version that is given a Proc that will check to see if an item is active?.

active_items = method(:filter).curry.(lambda{|item| item.active?})
# => #<Proc:0x007fc02b207f68 (lambda)>

With this active_items Proc, we can then use that against any collection of objects as long as they support the method active?, e.g. Users, Orders, Sessions, Blog Posts, etc.

active_users = active_items.(users)

active_blog_posts = active_items.(blog_posts)

active_sessions = active_items.(sessions)

active_orders = active_items.(orders)

As you can hopefully start to see, we can start to get some very small, focused, and powerful functions that are nicely abstracted to work against a broader range of input.

Next week, we will take the new versions of map, filter, and reduce, along with partial application, to show how we can take these smaller pieces of code and reuse and assemble them together to get more advanced behaviors.

–Proctor

Erlang Thursday – ETS data matching

Today’s Erlang Thursday moves on from the introduction to ETS, and starts using it to store some data and do some retrieval of data in ETS.

First we need some data to have in ETS, so we will fall back to one of my goto problems, Markov Chains.

For those unfamiliar with what a Markov Chain is, it is a state machine that transitions to the next state based off a probability instead of a specific input. The common example that people are familiar with in “everyday use” is predictive typing on smart phones, where the next word or letter is offered up as a prediction, and the predicted words are chosen by the historical likelihood that the words predicted follows the current word.

The first thing we will do is to create a module that given a string of text, will return a list of tuples representing a word and the word that follows.

-module(markov_words).

-export([create_word_pairs/1]).

-spec create_word_pairs(string()) -> list({string(), string()}).
create_word_pairs(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_pairs([], Words).

create_word_pairs(WordPairs, [_Word|[]]) ->
    WordPairs;
create_word_pairs(WordPairs, [Word|Words]) ->
    [Following|_] = Words,
    UpdatedWordPairs = [{Word, Following} | WordPairs],
    create_word_pairs(UpdatedWordPairs, Words).

The above code takes a string of text and splits that text into “words” based off using the space, tab, and newline characters as a word boundary. With that list of “words”, we then create a list of word to following word tuples, which is what we will be inserting into our ETS table.

Time to fire up the Erlang shell and start experimenting.

First we need to compile our module, and then we will create a variable to hold our text we want to use to prime our Markov Chain.

c(markov_words).
% {ok,markov_words}
ToTC = "It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of Light, it was the season of Darkness,
it was the spring of hope, it was the winter of despair,
we had everything before us, we had nothing before us,
we were all going direct to Heaven,
we were all going direct the other way--in short,
the period was so far like the present period,
that some of its noisiest authorities insisted on its
being received, for good or for evil, in the superlative
degree of comparison only.

There were a king with a large jaw and a queen with a
plain face, on the throne of England; there were a king
with a large jaw and a queen with a fair face,
on the throne of France. In both countries it was
clearer than crystal to the lords of the State preserves
of loaves and fishes, that things in general were
settled for ever.".

We will create a new process to give our ETS table away to, just in case we bomb out the shell.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.60.0>

And we now create an ETS table that will store data for us to use as part of our Markov Chain generation.

WordPairs = ets:new(word_pairs, [public, duplicate_bag]).
% 20498
ets:give_away(WordPairs, SomeProcess, []).
% true

We make the table public in this case, since we want our shell process, which is no longer the owner, to be able to add items to the table, and we make the table type a duplicate bag.

The reason for the duplicate_bag, is that for demonstration reasons, we want to be able to have multiple entries with the same key, as we are likely to see any word multiple times, and that some sets of word pairs are more common, so we want to be able to capture (and retain) those “duplicates”.

And for ease of population from inside the shell, we will use a list comprehension to add each word pair tuple we create from the text into our ETS table by calling ets:insert/2.

[[ ets:insert(WordPairs, WordPair) || WordPair <- markov_words:create_word_pairs(ToTC)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true|...]]

Now we should have some data in our ETS table, it is time to see how we can access our data. For this we turn to ets:match/2. ets:match/2 takes a table to query, and a Pattern.

The pattern is made up an Erlang term to be matched against; _, which matches anything and doesn’t bind; or pattern variables, which take the form of $N where N is any positive integer. The return result of ets:match/2 is a list containing the list of values in the pattern variables in order of variable name order.

So with this knowledge, we can try to find the word pairs to find the words that follow "of". If we were doing a pattern match it would look like {"of", Following}, but using ETS, we need to use a pattern variable which would make our spec {"of", '$1'}.

So lets run that against our ETS table.

ets:match(WordPairs, {"of", '$1'}).
% [["loaves"],
%  ["the"],
%  ["France."],
%  ["England;"],
%  ["comparison"],
%  ["its"],
%  ["despair,"],
%  ["hope,"],
%  ["Darkness,"],
%  ["Light,"],
%  ["incredulity,"],
%  ["belief,"],
%  ["foolishness,"],
%  ["wisdom,"],
%  ["times,"],
%  ["times,"]]

And there we go, we can see the results is a list of the list of variable matches, in this case, just what '$1' matched to.

For fun and exploration, let’s confirm what we would get if we look for the words that follow "it" in our Tale of Two Cities intro text.

ets:match(WordPairs, {"it", '$1'}).
% [["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"]]

Just a bunch of "was" which is right for the first two paragraphs of the book.

And we will double check and look for any words that follow "Scrooge".

ets:match(WordPairs, {"Scrooge", '$1'}).
% []

And if we wanted to get the whole tuple back, we could use ets:match_object/2, which will return the whole object that satisfies the match

ets:match_object(WordPairs, {"of", '$1'}).
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"}]

or, in this case we could use ets:lookup/2 which returns all of the items for which the key matches.

ets:lookup(WordPairs, "of").
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"}]

So to take a brief detour away from the Markov Chain example, why might we want to use either ets:lookup/2 or ets:match_object/2 versus the other? To answer that with an example, let’s add another entry into our WordPairs table, that is a three-tuple.

To start with, we will insert 100_000 items into our ETS tables and see what the resulting memory size becomes. We will insert a new tuple of {X, X}, for all numbers from 1 to 100_000.

ets:insert(WordPairs, {"of", "times,", "it"}).
% true

If we do a ets:lookup/2 we get all items with the specified key.

ets:lookup(WordPairs, "of").
[{"of","loaves"},
 {"of","the"},
 {"of","France."},
 {"of","England;"},
 {"of","comparison"},
 {"of","its"},
 {"of","despair,"},
 {"of","hope,"},
 {"of","Darkness,"},
 {"of","Light,"},
 {"of","incredulity,"},
 {"of","belief,"},
 {"of","foolishness,"},
 {"of","wisdom,"},
 {"of","times,"},
 {"of","times,"},
 {"of","times,","it"}]

But if we use ets:match_object/2, and use a two-tuple because we only want the word pairs, we don’t get the item that is a three-tuple in the results.

ets:match_object(WordPairs, {"of", '_'}).
[{"of","loaves"},
 {"of","the"},
 {"of","France."},
 {"of","England;"},
 {"of","comparison"},
 {"of","its"},
 {"of","despair,"},
 {"of","hope,"},
 {"of","Darkness,"},
 {"of","Light,"},
 {"of","incredulity,"},
 {"of","belief,"},
 {"of","foolishness,"},
 {"of","wisdom,"},
 {"of","times,"},
 {"of","times,"}]

Back to the Markov Chain scenario, we can start to see how we can get some text that follows the Markov Chain rules.

We get the match of potential words to choose from for a given word, and we pick a uniformly random result from the list of following words.

PotentialChoices = ets:match(WordPairs, {"of", '$1'}).
[NextWord] = lists:nth(random:uniform(length(PotentialChoices)), PotentialChoices).

We can write a function that will repeat these steps until we get to our termination case. Some examples of a termination state could be a word that doesn’t have a word that follows it; we get to a certain number of words picked to make up our text; or we get to a certain total length, say to fit in a SMS or Tweet.

With that, we have started to scratch the surface of putting some “real” data into ETS, and doing matching against the data for some given pattern. Next week we will continue looking at this example with other ways to get data out of our ETS tables into something that might be nicer to consume.

–Proctor