Ruby Tuesday – Partial Application of map, filter, and reduce

Now that we have covered how to get to a basic implementation of map, filter, and reduce in Ruby, as well as how to take advantage of Method#curry, we are going to see how we can get some extra power from our code by combining their use.

In the Ruby versions of Enumerables map, reduce, and select, it operates against a specific object, such as an array of users.

class User
  def initialize(name:, active:)
    @name = name
    @active = active
  end

  def active?
    @active
  end

  def name
    @name
  end
end

users = [User.new(name: "johnny b. goode", active: true),
         User.new(name: "jasmine", active: true),
         User.new(name: "peter piper", active: false),
         User.new(name: "mary", active: true),
         User.new(name: "elizabeth", active: true),
         User.new(name: "jennifer", active: false)]


users.map{|user| user.name}
# => ["johnny b. goode", "jasmine", "peter piper", "mary", "elizabeth", "jennifer"]

users.select{|user| user.active?}
# => [#<User:0x007fa37a13eb68 @active=true, @name="johnny b. goode">,
#  #<User:0x007fa37a13eaa0 @active=true, @name="jasmine">,
#  #<User:0x007fa37a13e910 @active=true, @name="mary">,
#  #<User:0x007fa37a13e848 @active=true, @name="elizabeth">]

If we want the names of a different collection, we need to call map (and use the same block) directly on that collection as well; like a if we had a collection of active Users.

users.select{|user| user.active?}.map{|user| user.name}
=> ["johnny b. goode", "jasmine", "mary", "elizabeth"]

This can be made a little more generic by having the methods get_user_names and get_active_users defined on User, but this still leaves us a bit shallow, so let’s see what else we can do base of what we have seen so far.

We will try it with our versions of map, filter, and reduce, and see how we can distill some of this logic, and raise the level of abstraction hight to make it more generic.

module Sequence
  def self.my_map(f, items)
    do_map = lambda do |accumulator, item|
      accumulator.dup << f.call(item)
    end

    my_reduce([], do_map, items)
  end

  def self.my_filter(predicate, items)
    do_filter = lambda do |accumulator, item|
      if (predicate.call(item))
        accumulator.dup << item
      else
        accumulator
      end
    end

    my_reduce([], do_filter, items)
  end

  def self.my_reduce(initial, operation, items)
    return nil unless items.any?{|item| true}

    accumulator = initial
    for item in items do
      accumulator = operation.call(accumulator, item)
    end
    accumulator
  end
end

And we look at how we call it using our Sequence module defined above.

Sequence.my_map(->(user) {user.name}, users)
# => ["johnny b. goode", "jasmine", "peter piper", "mary", "elizabeth", "jennifer"]
Sequence.my_filter(->(user) {user.active?}, users)
# => [#<User:0x007fa37a13eb68 @active=true, @name="johnny b. goode">,
#  #<User:0x007fa37a13eaa0 @active=true, @name="jasmine">,
#  #<User:0x007fa37a13e910 @active=true, @name="mary">,
#  #<User:0x007fa37a13e848 @active=true, @name="elizabeth">]

Granted, at this point, this is not a great improvement, if any, on it’s own.

BUT….

Since we take the collection to operate on as the last argument to our methods, we can combine our versions of my_map, my_filter, my_reduce with partial application to get a Proc that will do a specific operation against any Enumerable.

Let’s see how this would work.

First, we will update our Sequence module to have a map, filter, and reduce that can be partially applied.

module Sequence

  def self.my_map(f, items)
    do_map = lambda do |accumulator, item|
      accumulator.dup << f.call(item)
    end

    my_reduce([], do_map, items)
  end

  def self.my_filter(predicate, items)
    do_filter = lambda do |accumulator, item|
      if (predicate.call(item))
        accumulator.dup << item
      else
        accumulator
      end
    end

    my_reduce([], do_filter, items)
  end

  def self.my_reduce(initial, operation, items)
    return nil unless items.any?{|item| true}

    accumulator = initial
    for item in items do
      accumulator = operation.call(accumulator, item)
    end
    accumulator
  end

  @@map = method(:my_map).curry
  @@filter = method(:my_filter).curry
  @@reduce = method(:my_reduce).curry

  def self.map
    @@map
  end

  def self.filter
    @@filter
  end

  def self.reduce
    @@reduce
  end
end

Next with the ability to partially applied versions of our map, filter, and reduce, we can now save these off to variables that we can invoke later and just pass in the Users enumerable we wish to operate against.

names = Sequence.map.(->(user) {user.name})
# => #<Proc:0x007f9c53155990 (lambda)>
names.(users)
# => ["johnny b. goode", "jasmine", "peter piper", "mary", "elizabeth", "jennifer"]
get_active = Sequence.filter.(->(user) {user.active?})
# => #<Proc:0x007f9c531af3f0 (lambda)>
get_active.(users)
# => [#<User:0x007f9c5224e960 @active=true, @name="johnny b. goode">,
#  #<User:0x007f9c5224e8c0 @active=true, @name="jasmine">,
#  #<User:0x007f9c5224e780 @active=true, @name="mary">,
#  #<User:0x007f9c5224e6e0 @active=true, @name="elizabeth">]

Or if we want to, we can use the Symbol#to_proc so we don’t have to define our lambda for checking if an item is active?.

get_active = Sequence.filter.(:active?.to_proc)
=> #<Proc:0x007f9c531e62d8 (lambda)>
get_active.(users)
=> [#<User:0x007f9c5224e960 @active=true, @name="johnny b. goode">,
 #<User:0x007f9c5224e8c0 @active=true, @name="jasmine">,
 #<User:0x007f9c5224e780 @active=true, @name="mary">,
 #<User:0x007f9c5224e6e0 @active=true, @name="elizabeth">]

And now that we have our partial applied functions, we can also chain our calls together to get the names of active Users.

names.(get_active.(users))
=> ["johnny b. goode", "jasmine", "mary", "elizabeth"]

Not only that, but say we have some collections of Product objects,

class Product
  attr_reader :id, :name, :brand

  def initialize(id:, name:, active:, brand:)
    @id = id
    @name = name
    @active = active
    @brand = brand
  end

  def active?
    @active
  end
end

products = [Product.new(id: 0, name: "Prefect", active: false, brand: "Ford"),
            Product.new(id: 7, name: "SICP", active: true, brand: "MIT Press"),
            Product.new(id: 16, name: "HTDP", active: true, brand: "MIT Press"),
            Product.new(id: 17, name: "MRI", active: true, brand: "Ruby"),
            Product.new(id: 42, name: "HHGTTG", active: true, brand: "HHGTTG"),
            Product.new(id: 53, name: "Windows 3.1", active: false, brand: "Microsoft")]

Session objects,

class Session
  def initialize(name:, duration:)
    @name = name
    @duration = duration
  end

  def name
    @name
  end

  def active?
    @duration < 15
  end
end

sessions = [Session.new(name: "session A", duration: 3),
            Session.new(name: "session A", duration: 30),
            Session.new(name: "session A", duration: 17),
            Session.new(name: "session A", duration: 9),
            Session.new(name: "session A", duration: 1)]

and even SalesLead objects.

class SalesLead
  def initialize(lead:, active:)
    @lead = lead
    @active = active
  end

  def active?
    @active && @lead.active?
  end

  def name
    @lead.name
  end
end

leads = [SalesLead.new(lead: User.new(name: "lead 1", active: true), active: true),
         SalesLead.new(lead: User.new(name: "lead 2", active: true), active: false),
         SalesLead.new(lead: User.new(name: "lead 3", active: false), active: true),
         SalesLead.new(lead: User.new(name: "lead 4", active: true), active: true),
         SalesLead.new(lead: User.new(name: "lead 5", active: true), active: true),
         SalesLead.new(lead: User.new(name: "lead 6", active: false), active: false)]

And because our Product class has the methods name and active?, we can use our names and get_active variables that hold our partially applied Procs against the list of Products,

names.(products)
# => ["Prefect", "SICP", "HTDP", "MRI", "HHGTTG", "Windows 3.1"]
get_active.(products)
# => [#<Product:0x007f9c530b7dd0 @active=true, @brand="MIT Press", @id=7, @name="SICP">,
#  #<Product:0x007f9c530b7ce0 @active=true, @brand="MIT Press", @id=16, @name="HTDP">,
#  #<Product:0x007f9c530b7bf0 @active=true, @brand="Ruby", @id=17, @name="MRI">,
#  #<Product:0x007f9c530b7ad8 @active=true, @brand="HHGTTG", @id=42, @name="HHGTTG">]
names.(get_active.(products))
# => ["SICP", "HTDP", "MRI", "HHGTTG"]

Sessions,

names.(sessions)
# => ["session A", "session A", "session A", "session A", "session A"]
get_active.(sessions)
# => [#<Session:0x007f9c52153560 @duration=3, @name="session A">,
#  #<Session:0x007f9c52152f98 @duration=9, @name="session A">,
#  #<Session:0x007f9c52152ca0 @duration=1, @name="session A">]
names.(get_active.(sessions))
# => ["session A", "session A", "session A"]

and SalesLeads,

names.(leads)
# => ["lead 1", "lead 2", "lead 3", "lead 4", "lead 5", "lead 6"]
get_active.(leads)
# => [#<SalesLead:0x007f9c5212a020
#   @active=true,
#   @lead=#<User:0x007f9c5212a160 @active=true, @name="lead 1">>,
#  #<SalesLead:0x007f9c521292d8
#   @active=true,
#   @lead=#<User:0x007f9c52129468 @active=true, @name="lead 4">>,
#  #<SalesLead:0x007f9c52128e00
#   @active=true,
#   @lead=#<User:0x007f9c52129080 @active=true, @name="lead 5">>]
names.(get_active.(leads))
# => ["lead 1", "lead 4", "lead 5"]

With this in mind, we will update the definition of our names and get_active to show that it is not just “users” it operates against, but any item.

names = Sequence.map.(->(x) {x.name})
# => #<Proc:0x007f9c53107f38 (lambda)>
get_active = Sequence.filter.(->(x) {x.active?})
# => #<Proc:0x007f9c530bfcd8 (lambda)>

So with this, we have now been able to take our map and select from Ruby’s Enumerable class that worked on a specific collection only, without being redefined and moved into a method to live on User somewhere, we now have a Proc that is recognized to be applicable to anything that accepts an item of that form, and these can be defined and used anywhere.

–Proctor

Erlang Thursday – More ETS data matching (and querying)

In today’s Erlang Thursday we continue from last week in looking at getting data from ETS.

To refresh, we have a module markov_words, and for this week we have added a new function markov_words:create_word_triples/1.

-module(markov_words).

-export([create_word_pairs/1,
         create_word_triples/1]).

-spec create_word_pairs(string()) -> list({string(), string()}).
create_word_pairs(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_pairs([], Words).

-spec create_word_triples(string()) -> list({string(), string(), string()}).
create_word_triples(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_triples(Words, []).


create_word_pairs(WordPairs, [_Word|[]]) ->
    WordPairs;
create_word_pairs(WordPairs, [Word|Words]) ->
    [Following|_] = Words,
    UpdatedWordPairs = [{Word, Following} | WordPairs],
    create_word_pairs(UpdatedWordPairs, Words).


create_word_triples([_Word, _SecondWord | []], WordTriples) ->
    WordTriples;
create_word_triples([FirstWord | Words], WordTriples) ->
    [SecondWord, Following | _] = Words,
    UpdatedWordTriples = [{FirstWord, SecondWord, Following} | WordTriples],
    create_word_triples(Words, UpdatedWordTriples).

The excuse for having this new function is that it would allow us to get more refined Markov chains by picking the probability of the next word by having the state be the compound key of the last two words seen.

With our module updated and defined, we get back to our Erlang shell to test things out, by compiling our module and loading up our intro text into a variable.

c(markov_words).
% {ok,markov_words}

ToTC = "It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of Light, it was the season of Darkness,
it was the spring of hope, it was the winter of despair,
we had everything before us, we had nothing before us,
we were all going direct to Heaven,
we were all going direct the other way--in short,
the period was so far like the present period,
that some of its noisiest authorities insisted on its
being received, for good or for evil, in the superlative
degree of comparison only.

There were a king with a large jaw and a queen with a
plain face, on the throne of England; there were a king
with a large jaw and a queen with a fair face,
on the throne of France. In both countries it was
clearer than crystal to the lords of the State preserves
of loaves and fishes, that things in general were
settled for ever.".

We create our fresh ETS table for this week, create a new process to own it, and give it away (in case we type something wrong and cause the current session of the shell to crash).

MarkovWords = ets:new(markov_word_tuples, [public, duplicate_bag]).
% 16402
Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.58.0>
ets:give_away(MarkovWords, SomeProcess, []).
% true

This week, in addition to adding our word pair tuples to ETS, we will also add in our new word triple tuples to ETS in the same table.

[[ ets:insert(MarkovWords, WordPair) || WordPair <- markov_words:create_word_pairs(ToTC)]].

[[ ets:insert(MarkovWords, WordTriple) || WordTriple <- markov_words:create_word_triples(ToTC)]].

Since we have both word pairs and word triples in the same ETS table, we can see that with ets:match_object/2, we can specify a match_pattern() for only the two tuples

ets:match_object(MarkovWords, {"of", '$1'}).
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"}]

or a match_pattern() that will only match the three tuples.

ets:match_object(MarkovWords, {"of", '$1', '$2'}).
% [{"of","loaves","and"},
%  {"of","the","State"},
%  {"of","France.","In"},
%  {"of","England;","there"},
%  {"of","comparison","only."},
%  {"of","its","noisiest"},
%  {"of","despair,","we"},
%  {"of","hope,","it"},
%  {"of","Darkness,","it"},
%  {"of","Light,","it"},
%  {"of","incredulity,","it"},
%  {"of","belief,","it"},
%  {"of","foolishness,","it"},
%  {"of","wisdom,","it"},
%  {"of","times,","it"},
%  {"of","times,","it"}]

Where as if we use the ets:lookup/2 with the key, we get all items with the key, regardless of the tuple size.

ets:lookup(MarkovWords, "of").
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"},
%  {"of","loaves","and"},
%  {"of","the","State"},
%  {"of","France.","In"},
%  {"of","England;","there"},
%  {"of","comparison","only."},
%  {"of","its","noisiest"},
%  {"of","despair,","we"},
%  {"of","hope,","it"},
%  {"of","Darkness,","it"},
%  {"of","Light,",[...]},
%  {"of",[...],...},
%  {[...],...},
%  {...}|...]

And unlike ets:lookup/2, with ets:match_object/2 we can match on any tuple element, and not just the key.

ets:match_object(MarkovWords, {'$1', "the", '$2'}).
% [{"on","the","throne"},
%  {"on","the","throne"},
%  {"direct","the","other"},
%  {"short,","the","period"},
%  {"like","the","present"},
%  {"of","the","State"},
%  {"to","the","lords"},
%  {"in","the","superlative"},
%  {"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the","season"},
%  {"was","the","season"},
%  {"was","the","epoch"},
%  {"was","the","epoch"},
%  {"was","the","age"},
%  {"was","the","age"},
%  {"was","the","worst"},
%  {"was","the","best"}]

And like ets:match_object/2, ets:match/2 can match based off the tuple itself as well.

ets:match(MarkovWords, {"was", "the", '$1'}).
% [["winter"],
%  ["spring"],
%  ["season"],
%  ["season"],
%  ["epoch"],
%  ["epoch"],
%  ["age"],
%  ["age"],
%  ["worst"],
%  ["best"]]

But sometimes we might want finer grain control over how the results are given back to us, such as a single list of items instead of a nested list of strings. Or maybe we even have some criteria that we want to hold true as part of our selections on the data.

Enter ets:select/2.

ets:select/2 takes the table as its first argument, and a match_spec() as its second argument.

The match_spec() is a list of three-tuples, where the first element is the match pattern, second element is a list of guard clause tuples, and the last element is the result is a term representation of the result for each match.

If we want to call ets:select/2 and have it align with ets:match/2 our call looks like the following.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], [['$1']]}]).
% [["winter"],
%  ["spring"],
%  ["season"],
%  ["season"],
%  ["epoch"],
%  ["epoch"],
%  ["age"],
%  ["age"],
%  ["worst"],
%  ["best"]]

The second argument is a list of match_spec()s, of which there is only one which consists of:
1). a match_pattern() of {"was", "the", '$1'}, which is the same thing we gave to ets:match/2
2). [], and empty list of guard condition tuples, and
3). [[‘$1’]] for the result term, which is the list of terms we want the result formatted as, in this case we want each result to be in its own list.

If we just wanted to get the word themselves as a list, we can update the result term part of the match_spec() to be ['$1'] instead.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], ['$1']}]).
% ["winter","spring","season","season","epoch","epoch","age",
%  "age","worst","best"]

If we wanted something that looked more like a ets:match_object/2 result set we can use the result term of '$_', which signifies the whole object.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], ['$_']}]).
% [{"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the","season"},
%  {"was","the","season"},
%  {"was","the","epoch"},
%  {"was","the","epoch"},
%  {"was","the","age"},
%  {"was","the","age"},
%  {"was","the","worst"},
%  {"was","the","best"}]

And if we wanted to only match on one of the items, and capture the other items in the tuple, we can use the result of '$$' which returns all of the match variable in a list, ordered by variable number as opposed to position in the match_pattern().

ets:select(MarkovWords, [{{"was", '$1', '$2'}, [], ['$$']}]).
% [["clearer","than"],
%  ["so","far"],
%  ["the","winter"],
%  ["the","spring"],
%  ["the","season"],
%  ["the","season"],
%  ["the","epoch"],
%  ["the","epoch"],
%  ["the","age"],
%  ["the","age"],
%  ["the","worst"],
%  ["the","best"]]

ets:select(MarkovWords, [{{"was", '$2', '$1'}, [], ['$$']}]).
% [["than","clearer"],
%  ["far","so"],
%  ["winter","the"],
%  ["spring","the"],
%  ["season","the"],
%  ["season","the"],
%  ["epoch","the"],
%  ["epoch","the"],
%  ["age","the"],
%  ["age","the"],
%  ["worst","the"],
%  ["best","the"]]

With ets:select/2 we also get the ability to specify multiple match_spec()s. This allows us to find all word triple word triples that have either "of" or "the" as the middle word.

ets:select(MarkovWords, [{{'$1', "the", '$2'}, [], ['$_']}, {{'$1', "of", '$2'}, [], ['$_']}]).
% [{"some","of","its"},
%  {"on","the","throne"},
%  {"on","the","throne"},
%  {"direct","the","other"},
%  {"preserves","of","loaves"},
%  {"throne","of","France."},
%  {"throne","of","England;"},
%  {"worst","of","times,"},
%  {"short,","the","period"},
%  {"winter","of","despair,"},
%  {"degree","of","comparison"},
%  {"epoch","of","incredulity,"},
%  {"epoch","of","belief,"},
%  {"spring","of","hope,"},
%  {"like","the","present"},
%  {"of","the","State"},
%  {"age","of","foolishness,"},
%  {"age","of","wisdom,"},
%  {"best","of","times,"},
%  {"season","of","Darkness,"},
%  {"season","of","Light,"},
%  {"to","the","lords"},
%  {"in","the","superlative"},
%  {"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the",[...]},
%  {"was",[...],...},
%  {[...],...},
%  {...}|...]

And with guard clauses, we can find third item in the three-tuples that start with "was", that comes later in the dictionary than the word in the second position of the tuple.

ets:select(MarkovWords, [{{"was", '$1', '$2'}, [{'<', '$1', '$2'}], ['$2']}]).
% ["than","winter","worst"]

So with this week’s post we have seen other ways of using ets:match/2 and ets:match_object/2, and what they can get over using just a ets:lookup/2 for a key, as well as being able to take advantage of even more powerful querying by using ets:select/2.

Next week, we will look at more ways to use ets:select/2, and how we can use some other ets module functions to help create queries that can be easier to deconstruct at a quicker glance.

–Proctor

Ruby Tuesday – Partial Application

As we continue with the theme we have been pursuing in the last couple of posts, we take a brief pit stop and look at partial application before we move on to our next method we want to define.

Partial application is the ability to provide only a subset of arguments to a function or method, and return a new function/method to be called later that will keep the original context.

We will start our examples with the two methods double and triple.

def double(y)
  2 * y
end

def triple(y)
  3 * y
end

double(3)
# => 6

triple(5)
# => 15

We can think of these methods as defined in terms of a more generic function multiply that we call with a hard coded value.

def multiply(x, y)
  x * y
end

def double(y)
  multiply(2, y)
end

def triple(y)
  multiply(3, y)
end

What partial application allows us to do is to define double and triple in terms of multiply, by calling it with the first argument only, the 2 for the method double, and saving the resulting function to be invoked later.

To do this in Ruby we can use Method#curry, or Proc#curry.

Method#curry will return a new Proc that can then be invoked with only a subset of its arguments.

method(:multiply).curry
# => #<Proc:0x007fc02b225950 (lambda)>

So for our double and triple functionality, we can make those be variables which hold the resulting Proc of passing in their value to multiply, and invoke them by only passing in the value we want to double, or triple.

double = method(:multiply).curry.(2)
# => #<Proc:0x007fc02b1eeea0 (lambda)>
double.(8)
# => 16

triple = method(:multiply).curry.(3)
# => #<Proc:0x007fc02b186260 (lambda)>
triple.(17)
# => 51

At this point you might be wondering what this gets you, as the examples of double, triple, and multiply might seem a bit simplistic at best, and maybe even contrived.

I would agree; it is a simple example, but mainly to show as an example of what partial application is, and now we will take a look at our filter, map, and reduce from the previous posts and update them to show some of the power of partial application.

As a reminder this is the map, filter, and reduce as defined previously.

def map(items, do_map)
  reduce([], items, lambda do |accumulator, item|
    accumulator.dup << do_map.call(item)
  end)
end

def filter(items, predicate)
  reduce([], items, lambda do |accumulator, item|
    if (predicate.call(item))
      accumulator.dup << item
    else
      accumulator
    end
  end)
end

def reduce(initial, items, operation)
  return nil if items.empty?

  accumulator = initial
  for item in items do
    accumulator = operation.call(accumulator, item)
  end
  accumulator
end

We will update these definitions to be able to take the items collection last, as that is the most general argument.

def map(f, items)
  do_map = lambda do |accumulator, item|
    accumulator.dup << f.call(item)
  end

  reduce([], do_map, items)
end

def filter(predicate, items)
  do_filter = lambda do |accumulator, item|
    if (predicate.call(item))
      accumulator.dup << item
    else
      accumulator
    end
  end

  reduce([], do_filter, items)
end

def reduce(initial, operation, items)
  return nil unless items.any?{|item| true}

  accumulator = initial
  for item in items do
    accumulator = operation.call(accumulator, item)
  end
  accumulator
end

You might be wondering why I said that the collections of items is the most general argument.

That was because, if we want to sum a sequence of numbers, double a sequence of numbers, or even pick out the even numbers, we can do that against a number of different collections of items.

reduce(0, :+.to_proc, (1..5))
# => 15
reduce(0, :+.to_proc, [2, 4, 6, 8])
# => 20

map(double, (2..7))
# => [4, 6, 8, 10, 12, 14]
map(double, [1, 2, 3, 5, 8, 13])
# => [2, 4, 6, 10, 16, 26]

filter(lambda{|x| x.even?}, (5..10))
# => [6, 8, 10]
filter(lambda{|x| x.even?}, (0..100).step(10))
# => [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

And because we now have the generic items last, we can use Method#curry to build up Procs to call that represent the parts that are the same and give those Procs descriptive names.

concat = method(:reduce).curry.("", :+.to_proc)
# => #<Proc:0x007fc02b207f68 (lambda)>

concat.(("a".."d"))
# => "abcd"
concat.(["alpha", "beta", "gamma", "delta"])
# => "alphabetagammadelta"


sum = method(:reduce).curry.(0, :+.to_proc)
# => #<Proc:0x007fc02b25c450 (lambda)>

sum.((1..10))
# => 55
sum.((2..20).step(2))
# => 110


evens_only = method(:filter).curry.(lambda{|x| x.even?})
# => #<Proc:0x007fc02b155688 (lambda)>

evens_only.([1, 2, 3, 5, 8, 13, 21])
# => [2, 8]
evens_only.([1, 2, 4, 7, 11])
# => [2, 4]


doubles = method(:map).curry.(double)
# => #<Proc:0x007fc02b0c0308 (lambda)>

doubles.((1..10))
# => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
doubles.([2, 4, 6, 8])
# => [4, 8, 12, 16]

And in the last example of doubles we did a curry of map, and used the partially applied function double as the argument to be partially applied to map.

So with partial application, we can start to abstract common behavior out into Procs that can operate against more generic data.

One last example would be filtering items against those that are active. With partial application, we can take filter and create a partially applied version that is given a Proc that will check to see if an item is active?.

active_items = method(:filter).curry.(lambda{|item| item.active?})
# => #<Proc:0x007fc02b207f68 (lambda)>

With this active_items Proc, we can then use that against any collection of objects as long as they support the method active?, e.g. Users, Orders, Sessions, Blog Posts, etc.

active_users = active_items.(users)

active_blog_posts = active_items.(blog_posts)

active_sessions = active_items.(sessions)

active_orders = active_items.(orders)

As you can hopefully start to see, we can start to get some very small, focused, and powerful functions that are nicely abstracted to work against a broader range of input.

Next week, we will take the new versions of map, filter, and reduce, along with partial application, to show how we can take these smaller pieces of code and reuse and assemble them together to get more advanced behaviors.

–Proctor

Erlang Thursday – ETS data matching

Today’s Erlang Thursday moves on from the introduction to ETS, and starts using it to store some data and do some retrieval of data in ETS.

First we need some data to have in ETS, so we will fall back to one of my goto problems, Markov Chains.

For those unfamiliar with what a Markov Chain is, it is a state machine that transitions to the next state based off a probability instead of a specific input. The common example that people are familiar with in “everyday use” is predictive typing on smart phones, where the next word or letter is offered up as a prediction, and the predicted words are chosen by the historical likelihood that the words predicted follows the current word.

The first thing we will do is to create a module that given a string of text, will return a list of tuples representing a word and the word that follows.

-module(markov_words).

-export([create_word_pairs/1]).

-spec create_word_pairs(string()) -> list({string(), string()}).
create_word_pairs(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_pairs([], Words).

create_word_pairs(WordPairs, [_Word|[]]) ->
    WordPairs;
create_word_pairs(WordPairs, [Word|Words]) ->
    [Following|_] = Words,
    UpdatedWordPairs = [{Word, Following} | WordPairs],
    create_word_pairs(UpdatedWordPairs, Words).

The above code takes a string of text and splits that text into “words” based off using the space, tab, and newline characters as a word boundary. With that list of “words”, we then create a list of word to following word tuples, which is what we will be inserting into our ETS table.

Time to fire up the Erlang shell and start experimenting.

First we need to compile our module, and then we will create a variable to hold our text we want to use to prime our Markov Chain.

c(markov_words).
% {ok,markov_words}
ToTC = "It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of Light, it was the season of Darkness,
it was the spring of hope, it was the winter of despair,
we had everything before us, we had nothing before us,
we were all going direct to Heaven,
we were all going direct the other way--in short,
the period was so far like the present period,
that some of its noisiest authorities insisted on its
being received, for good or for evil, in the superlative
degree of comparison only.

There were a king with a large jaw and a queen with a
plain face, on the throne of England; there were a king
with a large jaw and a queen with a fair face,
on the throne of France. In both countries it was
clearer than crystal to the lords of the State preserves
of loaves and fishes, that things in general were
settled for ever.".

We will create a new process to give our ETS table away to, just in case we bomb out the shell.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.60.0>

And we now create an ETS table that will store data for us to use as part of our Markov Chain generation.

WordPairs = ets:new(word_pairs, [public, duplicate_bag]).
% 20498
ets:give_away(WordPairs, SomeProcess, []).
% true

We make the table public in this case, since we want our shell process, which is no longer the owner, to be able to add items to the table, and we make the table type a duplicate bag.

The reason for the duplicate_bag, is that for demonstration reasons, we want to be able to have multiple entries with the same key, as we are likely to see any word multiple times, and that some sets of word pairs are more common, so we want to be able to capture (and retain) those “duplicates”.

And for ease of population from inside the shell, we will use a list comprehension to add each word pair tuple we create from the text into our ETS table by calling ets:insert/2.

[[ ets:insert(WordPairs, WordPair) || WordPair <- markov_words:create_word_pairs(ToTC)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true|...]]

Now we should have some data in our ETS table, it is time to see how we can access our data. For this we turn to ets:match/2. ets:match/2 takes a table to query, and a Pattern.

The pattern is made up an Erlang term to be matched against; _, which matches anything and doesn’t bind; or pattern variables, which take the form of $N where N is any positive integer. The return result of ets:match/2 is a list containing the list of values in the pattern variables in order of variable name order.

So with this knowledge, we can try to find the word pairs to find the words that follow "of". If we were doing a pattern match it would look like {"of", Following}, but using ETS, we need to use a pattern variable which would make our spec {"of", '$1'}.

So lets run that against our ETS table.

ets:match(WordPairs, {"of", '$1'}).
% [["loaves"],
%  ["the"],
%  ["France."],
%  ["England;"],
%  ["comparison"],
%  ["its"],
%  ["despair,"],
%  ["hope,"],
%  ["Darkness,"],
%  ["Light,"],
%  ["incredulity,"],
%  ["belief,"],
%  ["foolishness,"],
%  ["wisdom,"],
%  ["times,"],
%  ["times,"]]

And there we go, we can see the results is a list of the list of variable matches, in this case, just what '$1' matched to.

For fun and exploration, let’s confirm what we would get if we look for the words that follow "it" in our Tale of Two Cities intro text.

ets:match(WordPairs, {"it", '$1'}).
% [["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"]]

Just a bunch of "was" which is right for the first two paragraphs of the book.

And we will double check and look for any words that follow "Scrooge".

ets:match(WordPairs, {"Scrooge", '$1'}).
% []

And if we wanted to get the whole tuple back, we could use ets:match_object/2, which will return the whole object that satisfies the match

ets:match_object(WordPairs, {"of", '$1'}).
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"}]

or, in this case we could use ets:lookup/2 which returns all of the items for which the key matches.

ets:lookup(WordPairs, "of").
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"}]

So to take a brief detour away from the Markov Chain example, why might we want to use either ets:lookup/2 or ets:match_object/2 versus the other? To answer that with an example, let’s add another entry into our WordPairs table, that is a three-tuple.

To start with, we will insert 100_000 items into our ETS tables and see what the resulting memory size becomes. We will insert a new tuple of {X, X}, for all numbers from 1 to 100_000.

ets:insert(WordPairs, {"of", "times,", "it"}).
% true

If we do a ets:lookup/2 we get all items with the specified key.

ets:lookup(WordPairs, "of").
[{"of","loaves"},
 {"of","the"},
 {"of","France."},
 {"of","England;"},
 {"of","comparison"},
 {"of","its"},
 {"of","despair,"},
 {"of","hope,"},
 {"of","Darkness,"},
 {"of","Light,"},
 {"of","incredulity,"},
 {"of","belief,"},
 {"of","foolishness,"},
 {"of","wisdom,"},
 {"of","times,"},
 {"of","times,"},
 {"of","times,","it"}]

But if we use ets:match_object/2, and use a two-tuple because we only want the word pairs, we don’t get the item that is a three-tuple in the results.

ets:match_object(WordPairs, {"of", '_'}).
[{"of","loaves"},
 {"of","the"},
 {"of","France."},
 {"of","England;"},
 {"of","comparison"},
 {"of","its"},
 {"of","despair,"},
 {"of","hope,"},
 {"of","Darkness,"},
 {"of","Light,"},
 {"of","incredulity,"},
 {"of","belief,"},
 {"of","foolishness,"},
 {"of","wisdom,"},
 {"of","times,"},
 {"of","times,"}]

Back to the Markov Chain scenario, we can start to see how we can get some text that follows the Markov Chain rules.

We get the match of potential words to choose from for a given word, and we pick a uniformly random result from the list of following words.

PotentialChoices = ets:match(WordPairs, {"of", '$1'}).
[NextWord] = lists:nth(random:uniform(length(PotentialChoices)), PotentialChoices).

We can write a function that will repeat these steps until we get to our termination case. Some examples of a termination state could be a word that doesn’t have a word that follows it; we get to a certain number of words picked to make up our text; or we get to a certain total length, say to fit in a SMS or Tweet.

With that, we have started to scratch the surface of putting some “real” data into ETS, and doing matching against the data for some given pattern. Next week we will continue looking at this example with other ways to get data out of our ETS tables into something that might be nicer to consume.

–Proctor

Ruby Tuesday – Refactoring Towards Creating reduce

As we continue with the theme we have been pursuing in the last couple of posts, we will look at refactoring to reduce, and will take a look at how we can use this with what we have built on from previous posts.

Again for reference, here is our setup data of a User object

require 'date'

class User
  attr_reader :name, :date_of_birth, :date_of_death, :languages_created

  def initialize(name:,
                 is_active:,
                 date_of_birth: nil,
                 date_of_death: nil,
                 languages_created: [])
    @name = name
    @is_active = is_active
    @date_of_birth = date_of_birth
    @date_of_death = date_of_death
    @languages_created = languages_created
  end

  def active?
    @is_active
  end

  def to_s
    inspect
  end
end

and our list of users.

alan_kay = User.new(name: "Alan Kay",
                    is_active: true,
                    date_of_birth: Date.new(1940, 5, 17),
                    languages_created: ["Smalltalk", "Squeak"])
john_mccarthy = User.new(name: "John McCarthy",
                         is_active: true,
                         date_of_birth: Date.new(1927, 9, 4),
                         date_of_death: Date.new(2011, 10, 24),
                         languages_created: ["Lisp"])
robert_virding = User.new(name: "Robert Virding",
                          is_active: true,
                          languages_created: ["Erlang", "LFE"])
dennis_ritchie = User.new(name: "Dennis Ritchie",
                          is_active: true,
                          date_of_birth: Date.new(1941, 9, 9),
                          date_of_death: Date.new(2011, 10, 12),
                          languages_created: ["C"])
james_gosling = User.new(name: "James Gosling",
                         is_active: true,
                         date_of_birth: Date.new(1955, 5, 19),
                         languages_created: ["Java"])
matz = User.new(name: "Yukihiro Matsumoto",
                is_active: true,
                date_of_birth: Date.new(1965, 4, 14),
                languages_created: ["Ruby"])
nobody = User.new(name: "",
                  is_active: false)

users = [alan_kay, john_mccarthy, robert_virding,
         dennis_ritchie, james_gosling, matz, nobody]

In our theoretical code base we have some code that will find the oldest language creator.

def oldest_language_creator(users)
  oldest = nil
  for user in users do
    next unless user.date_of_death.nil?
    next if user.date_of_birth.nil?
    if (oldest.nil? || oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

oldest_language_creator(users).name
# => "Alan Kay"

That is pretty nasty, so let’s see if we can clean it up some and see what happens.

First, inside the for we have both and if and unless, so let’s refactor the unless to be an if.

def oldest_language_creator(users)
  oldest = nil
  for user in users do
    next if (not user.date_of_death.nil?)
    next if user.date_of_birth.nil?
    if (oldest.nil? || oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

pry(main)> oldest_language_creator(users).name
# => "Alan Kay"

And while we are at it, we will refactor out the conditions in the ifs to give them clarifying names.

def alive?(user)
  user.date_of_death.nil?
end

def has_known_birthday?(user)
  not user.date_of_birth.nil?
end

def oldest_language_creator(users)
  oldest = nil
  for user in users do
    next if not alive?(user)
    next if not has_known_birthday?(user)
    if (oldest.nil? || oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

oldest_language_creator(users).name
# => "Alan Kay"

Still works, and that we now have multiple if in our for loop, we can think back to a couple of posts ago, and realize we have a couple of filters happening in our list, and then some logic around who has the earliest birth date.

So let’s refactor out the filters and see what our method starts to look like.

def oldest_language_creator(users)
  alive_users = filter(users, lambda{|user| alive?(user)})
  with_birthdays = filter(alive_users,
                          lambda{|user| has_known_birthday?(user)})

  oldest = nil
  for user in with_birthdays do
    if (oldest.nil? || oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

oldest_language_creator(users).name
#=> "Alan Kay"

This next refactoring might be a bit of a jump, but for me, I am not too fond with starting out with a nil and having to check that every time, since it will be fixed on the first time around, so let’s clean that up.

def oldest_language_creator(users)
  alive_users = filter(users, lambda{|user| alive?(user)})
  with_birthdays = filter(alive_users, 
                          lambda{|user| has_known_birthday?(user)})

  oldest, *rest = with_birthdays
  for user in rest do
    if (oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

Let’s refactor out our for loop into another method, so we can look at it on its own.

def user_with_earliest_birthday(users)
  oldest, *rest = users
  for user in rest do
    if (oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

def oldest_language_creator(users)
  alive_users = filter(users, lambda{|user| alive?(user)})
  with_birthdays = filter(alive_users,
                          lambda{|user| has_known_birthday?(user)})
  user_with_earliest_birthday(with_birthdays)
end

Now we have a pattern here, and it has been present in our filter and map as well, if you can see it, so let’s see if we can identify it.

def user_with_earliest_birthday(users)
  oldest, *rest = users
  for user in rest do
    if (oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

def filter(items, predicate)
  matching = []
  for item in items do
    if (predicate.call(item))
      matching << item
    end
  end
  matching
end

def map(items, do_map)
  results = []
  for item in items do
    results << do_map.call(item)
  end
  results
end

If you haven’t detangled it, the pattern is:
1. We have some initial value,
2. and for every item in a list we do some operation against that item value and the current accumulated value, which results in a new value
3. we return the accumulated value.

With our user_with_earliest_birthday method, the initial accumlated value is the first user, the operation is a compare against that with each item to find the oldest user so far, and we return the oldest user.

With filter, the initial accumulated value is an empty Array, the operation is an append if some critera is met, and the return is the accumlated array for those items that meet that criteria.

With map, the initial accumulated value is an empty Array, the operation is an append of the result of a transformation function on each value, and the return is the accumlated array for the transformed results.

This pattern is called reduce.

So what would this look like generically???

def reduce(initial, items, operation)
  accumulator = initial
  for item in items do
    accumulator = operation.call(accumulator, item)
  end
  accumulator
end

Let’s write our user_with_earliest_birthday using this new reduce then, and consume it in our oldest_language_creator

def oldest_language_creator(users)
  alive_users = filter(users, lambda{|user| alive?(user)})
  with_birthdays = filter(alive_users,
                          lambda{|user| has_known_birthday?(user)})

  reduce(with_birthdays.first, with_birthdays.drop(1),
         lambda do |oldest, user|
           oldest.date_of_birth > user.date_of_birth ? user : oldest
         end)
end

Our accumulator starts with the first user in the list, uses the rest of the list to iterate through, and then will return either the accumulator (oldest_so_far) or the current item (user), which would be assigned to the accumulator value for the next iteration.

So how would we write our map and filter to use this new reduce?

def map(items, do_map)
  reduce([], items,
         lambda do |accumulator, item|
           accumulator.dup << do_map.call(item)
         end)
end

def filter(items, predicate)
  reduce([], items, lambda do |accumulator, item|
    if (predicate.call(item))
      accumulator.dup << item
    else
      accumulator
    end
  end)
end

For our new map, our operation is to call the do_map lambda given to the function, and add the transformed value to a duplicate of the original accumulator. While in these cases, it is not necessary to duplicate the original accumulator, I did so here to mirror that in reduce, we are getting what could be considered a completely new value, such as we have with our oldest_language_creator version that uses reduce.

And for our new filter our operation either returns the original accumulator, or adds the item to a new copy of the accumulated list if the predicate passed to filter returns truth. Again, we could leave out the duplication, but for purity sake, and working out the logic we will keep it in there.

So let’s step through our new filter and see what happens one step at a time with it now using reduce.

filter((1..9), lambda{|item| item.odd?})

If we inline reduce substituting the variables given to filter, it looks like the following.

reduce([], (1..9), lambda do |accumulator, item|
  if (lambda{|item| item.odd?}.call(item))
    accumulator.dup << item
  else
    accumulator
  end
end)

And if we expand the body of reduce, and rename it to filter_odds, we get

def filter_odds()
  accumulator = []
  for item in (1..9) do
    accumulator = lambda do |accumulator, item|
      if (lambda{|item| item.odd?}.call(item))
        accumulator.dup << item
      else
        accumulator
      end
    end.call(accumulator, item)
  end
  accumulator
end

And we inline the calls to the lambda that came is from the predicate

def filter_odds()
  accumulator = []
  for item in (1..9) do
    accumulator = lambda do |accumulator, item|
      if (item.odd?)
        accumulator.dup << item
      else
        accumulator
      end
    end.call(accumulator, item)
  end
  accumulator
end

and inline the lambda for the operation to given to reduce

def filter_odds()
  accumulator = []
  for item in (1..9) do
    accumulator = if (item.odd?)
                    accumulator.dup << item
                  else
                    accumulator
                  end
  end
  accumulator
end

And we can see how through filter and reduce, we get back to something that looks like the orignal filtering out of odd numbers from a list.

And to test out reduce further, let’s add some numbers together.

We will call reduce with our initial accumulated “sum” of 0, the numbers from 1 to 10, and a lambda that adds the two numbers together to produce a new running sum.

reduce(0, (1..10), lambda{|accum, item| accum + item})
# => 55

And we do the same for a reduce that computes the product of a list of numbers.

This time our initial accumulator value is 1 which is the identity operation of multiplication.

reduce(1, (1..10), lambda{|accum, item| accum * item})
# => 3628800

But if we call it with an empty list, we return 1 still

reduce(1, [], lambda{|accum, item| accum * item})
# => 1

So we need to clean up our reduce some to make it more robust in the case of reducing against empty lists.

def reduce(initial, items, operation)
  return nil if items.empty?

  accumulator = initial
  for item in items do
    accumulator = operation.call(accumulator, item)
  end
  accumulator
end

And now our reduce handles empty lists nicely, or at the least, a little more sanely.

reduce(1, [], lambda{|accum, item| accum * item})
# => nil

With all of that, we have refactored our code into something close to Ruby’s Enumerable#select, expect that we return nil if the enumerable is empty, instead of the initial value for the accumulator.

–Proctor

Erlang Thursday – ETS Introduction Part 5: keypos, compressed, read_conncurrency, and write_concurrency

Today’s Erlang Thursday continues the introduction to ETS and picks up with the promise from last week, and looks at the keypos ETS table setting, and the Tweaks that can be set.

First, we will take a look at the keypos setting.

The keypos is the 1-based index in the tuple to be stored that will be used as the key for the entry. If you remember from the part 3 of the introduction to ETS about the different table types, they use this index for their key comparison to determine if this is a unique item or not.

If we create a new table without specifying the keypos option, it defaults to 1.

Table = ets:new(some_name, []).
% 20498
ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.50.0>},
%  {heir,none},
%  {name,some_name},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

To show the keypos in action, we will create a couple of items to insert into our ETS table so we can see the keypos in action.

Item1 = {1, a}.
% {1,a}
Item2 = {1.0, "a"}.
% {1.0,"a"}
Item3 = {1, "one"}.
% {1,"one"}
Item4 = {a, "a"}.
% {a,"a"}
Item5 = {"a", a}.
% {"a",a}

In the items above, we have some duplicate entries across both the first item and the second item in the two-tuples.

We will go ahead and insert each one of these items in turn, keeping in mind that this table is a set, so any new insert with the same key, will override the previous value for the same key.

ets:insert(Table, Item1).
% true
ets:tab2list(Table).
% [{1,a}]
ets:insert(Table, Item2).
% true
ets:tab2list(Table).
% [{1,a},{1.0,"a"}]
ets:insert(Table, Item3).
% true
ets:tab2list(Table).
% [{1,"one"},{1.0,"a"}]
ets:insert(Table, Item4).
% true
ets:tab2list(Table).
% [{1,"one"},{a,"a"},{1.0,"a"}]
ets:insert(Table, Item5).
% true
ets:tab2list(Table).
% [{"a",a},{1,"one"},{a,"a"},{1.0,"a"}]

When we added Item3 above, it replaced Item1 in the table, since they both have a 1 for the first element in their two-tuple.

We will now create a new table with a keypos of 2, and see how the exact same steps of inserting is changed with a different keypos value.

KeyPosTwo = ets:new(key_pos_2, [{keypos, 2}]).
% 24595
ets:insert(KeyPosTwo, Item1).
% true
ets:tab2list(KeyPosTwo).
% [{1,a}]
ets:insert(KeyPosTwo, Item2).
% true
ets:tab2list(KeyPosTwo).
% [{1.0,"a"},{1,a}]
ets:insert(KeyPosTwo, Item3).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{1.0,"a"},{1,a}]
ets:insert(KeyPosTwo, Item4).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{a,"a"},{1,a}]
ets:insert(KeyPosTwo, Item5).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{a,"a"},{"a",a}]

In this case, it wasn’t until we added Item4 that we had an override, as both Item2 and Item4 both have an "a" as their second item. Then we we add Item5 it overwrites the Item1, as they both have the atom a as their second element.

And if we set a keypos of some value, say three, and we try to insert a tuple that has fewer items, we will get an exception of type bad argument.

KeyPosThree = ets:new(key_pos_3, [{keypos, 3}]).
% 28692
ets:insert(KeyPosThree, Item1).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(28692,{1,a})

Now it is time to look at the compressed option when creating a table.

When creating a new table, the default setting is for it to be uncompressed, as we can see in the table info since it shows {compressed,false}.

UncompressedTable = ets:new(uc, []).
% 32786
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,uc},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

We create a new table, with the compressed option, and when we look at ets:info/1 for the table, we see that it show {compressed,true}.

CompressedTable = ets:new(uc, [compressed]).
% 45074
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,uc},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

compressed, according to the documentation at least, says that it stores the data in a “more compact format to consume less memory”. It also warns that this can this can make operations that need to check the entire tuple slower, and that the key is not stored compressed, at least in the current implementation.

So let’s see what kind of memory difference compressed makes.

To start with, we will insert 100_000 items into our ETS tables and see what the resulting memory size becomes. We will insert a new tuple of {X, X}, for all numbers from 1 to 100_000.

lists:foreach(fun(X) -> ets:insert(CompressedTable, {X, X}) end,
              lists:seq(1, 100000)).
% ok
lists:foreach(fun(X) -> ets:insert(UncompressedTable, {X, X}) end,
              lists:seq(1, 100000)).
% ok
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,714643},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,814643},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Interesting.

For the compressed table the memory is reported to be 814643, but the uncompressed shows the memory to be less than that with 714643.

Maybe it doesn’t like to compact integer values very much, so let’s do the same thing, but use a string for the second item in the tuple.

lists:foreach(fun(X) -> ets:insert(UncompressedTable, {X, integer_to_list(X)}) end,
              lists:seq(1, 100000)).
% ok
lists:foreach(fun(X) -> ets:insert(CompressedTable, {X, integer_to_list(X)}) end, 
              lists:seq(1, 100000)).
% ok
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,914644},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,1692433},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Now using strings in our tuples instead of just using integers, we can see that the compressed ETS table memory is 914644, where as the uncompressed ETS table’s memory is 1692433.

So in addition to thinking about the way you are going to be matching on the data when trying to determine if the table should be compressed, it looks like you also need to think about the type of data you are going to be putting into the ETS table.

The last two options to be discussed are read_concurrency and write_concurrency.

read_conccurency is by default set to false, and, according to the documentation is best for when “read operations are much more frequent than write operations, or when concurrent reads and writes comes in large read and write bursts”.

So if you have a table that has a bunch of reads with the writes infrequently interspersed between the reads, this would be when you would want to enable read_concurrency, as the documentation states that switching between reads and writes is more expensive.

The write_concurrency option is set to false by default, causing any additional concurrent writes to block while an write operation is proceeding. When set to true different tuples of the same table can be written to by concurrent processes, and does not affect any table of the type ordered_set.

This should be it as far as the introduction goes. Next week we will start looking at the different operations we can perform using ETS and ETS tables.

–Proctor

Ruby Tuesday – Refactoring Towards Creating map

Today’s Ruby Tuesday continues from where we left off with last week’s look at refactoring to filter.

For reference, we had a User class,

require 'date'

class User
  attr_reader :name, :date_of_birth, :date_of_death, :languages_created

  def initialize(name:, 
                 is_active:,
                 date_of_birth: nil,
                 date_of_death: nil,
                 languages_created: [])
    @name = name
    @is_active = is_active
    @date_of_birth = date_of_birth
    @date_of_death = date_of_death
    @languages_created = languages_created
  end

  def active?
    @is_active
  end

  def to_s
    inspect
  end
end

a list of User objects,

alan_kay = User.new(name: "Alan Kay",
                    is_active: true,
                    date_of_birth: Date.new(1940, 5, 17),
                    languages_created: ["Smalltalk", "Squeak"])
john_mccarthy = User.new(name: "John McCarthy",
                         is_active: true,
                         date_of_birth: Date.new(1927, 9, 4),
                         date_of_death: Date.new(2011, 10, 24),
                         languages_created: ["Lisp"])
robert_virding = User.new(name: "Robert Virding",
                          is_active: true,
                          languages_created: ["Erlang", "LFE"])
dennis_ritchie = User.new(name: "Dennis Ritchie",
                          is_active: true,
                          date_of_birth: Date.new(1941, 9, 9),
                          date_of_death: Date.new(2011, 10, 12),
                          languages_created: ["C"])
james_gosling = User.new(name: "James Gosling",
                         is_active: true,
                         date_of_birth: Date.new(1955, 5, 19),
                         languages_created: ["Java"])
matz = User.new(name: "Yukihiro Matsumoto",
                is_active: true,
                date_of_birth: Date.new(1965, 4, 14),
                languages_created: ["Ruby"])
nobody = User.new(name: "",
                  is_active: false)

users = [alan_kay, john_mccarthy, robert_virding, 
         dennis_ritchie, james_gosling, matz, nobody]

and a helper method to get the list of names for a list of Users.

def get_names_for(users)
  names = []
  for user in users do
    names << user.name
  end
  names
end

get_names_for(users)
=> ["Alan Kay", "John McCarthy", "Robert Virding",
    "Dennis Ritchie", "James Gosling", "Yukihiro Matsumoto", ""]

Elsewhere in our (imaginary, but based off real events with names changed to protect the innocent) code base, we have some logic to get a listing of languages created by the users.

def get_languages(users)
  languages = []
  for user in users do
    languages << user.languages_created
  end
  languages
end

get_languages(users)
# => [["Smalltalk", "Squeak"], ["Lisp"],
      ["Erlang", "LFE"], ["C"], ["Java"], ["Ruby"], []]

And yet somewhere else, there is logic to get a listing of the years different users were born.

def get_birth_years(users)
  birth_years = []
  for user in users do
    birth_years << (user.date_of_birth ? user.date_of_birth.year : nil)
  end
  birth_years
end

get_birth_years(users)
# => [1940, 1927, nil, 1941, 1955, 1965, nil]

As with the filter we looked at last week, we have quite a bit of duplication of logic in all of these methods.

If we turn our head and squint a little, we can see the methods all look something like this:

def transform_to(items)
  results = []
  for item in items do
    results << do_some_transformation(item)
  end
  results
end

This method:

takes a list of items to iterate over
creates a working result set
iterates over every item in the items given and for each item
- some transformation of the item into a new value is computed and
- the result is added to the working results set
the end results are returned

The only thing that is different between each of the functions above, once we have rationalized the variable names, is the transformation to be done on each item in the list.

And this transformation that is the different part is just calling a function on that item, also called map in Mathematics, which Wolfram Alpha defines as:

So we will “map” over all of the items to get a new list of items, which makes our generic function look like the following, after we update names to match our new terminology.

def map(items)
  results = []
  for item in items do
    results << do_map(item)
  end
  results
end

This is starting to come together, but we still don’t have anything specific for what do_map represents yet.

We will follow our previous example in filter and make the generic function we want to call a anonymous function, specifically a lambda in Ruby, and pass that in to our map method.

def map(items, do_map)
  results = []
  for item in items do
    results << do_map.call(item)
  end
  results
end

Time to test it out by using our previous calls and making the specifics a lambda.

map(users, lambda{|user| user.languages_created})
# => [["Smalltalk", "Squeak"], ["Lisp"],
      ["Erlang", "LFE"], ["C"], ["Java"], ["Ruby"], []]
map(users, lambda{|user| user.name})
# => ["Alan Kay", "John McCarthy", "Robert Virding",
      "Dennis Ritchie", "James Gosling", "Yukihiro Matsumoto", ""]
map(users, lambda{|user| user.date_of_birth ? user.date_of_birth.year : nil})
# => [1940, 1927, nil, 1941, 1955, 1965, nil]

And to test if we did get this to be generic enough to work against lists of other types, we’ll do some conversions from characters to Integers, Integers to characters, and cube some integers.

map( ("a".."z"), lambda{|char| char.ord})
# => [97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
      107, 108, 109, 110, 111, 112, 113, 114, 115,
      116, 117, 118, 119, 120, 121, 122]
map((65..90), lambda{|ascii_value| ascii_value.chr})
# => ["A", "B", "C", "D", "E", "F", "G", "H", "I",
      "J", "K", "L", "M", "N", "O", "P", "Q", "R",
      "S", "T", "U", "V", "W", "X", "Y", "Z"]
map((1..7), lambda{|i| i*i*i})
# => [1, 8, 27, 64, 125, 216, 343]

So like last week’s post, where we were able to genericize the logic about conditionally plucking out items from a list based of some condition, we were able to genericize the transformation of a list of one set of values into a list of another set of values.

Which if you are familiar to Ruby, you will likely recognize as Enumerable#map, a.k.a. Enumerable#select, but now you have seen how you could have went down the road to creating your own, if Ruby hadn’t already provided it for you.

-Proctor

Erlang Thursday – ETS Introduction Part 4: ETS Access Protections

Today’s Erlang Thursday continues the introduction to ETS and takes a look at the different access levels that ETS supports.

The different access levels that ETS supports are: public, protected, and private.

Each of these different types can be passed in when creating a new ETS table, but let’s see what type of ETS table we get when we don’t specify an access level.

Table = ets:new(some_name, []).
% 20501
ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,some_name},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

So the default access level is protected when not specified.

So what does it mean for a ETS table to be protected then? The documentation states that protected tables can be written to by only the owning process, but read by other processes.

So let’s see that at work then.

First let’s create a process that we can give ETS tables away to.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.58.0>

We create a new ETS table and specify it is protected, and we also specify that it is a named_table as a bonus.

ProtectedNamedETS = ets:new(protected_named_ets, [protected, named_table]).
% protected_named_ets

The result of that match is protected_named_ets and not a number like the call to ets:new/2 above, so we should be able to use the name of the table to access the table instead of just the identifier.

We will insert an entry into the ETS table, and we will use the name of the ETS table as the ETS table reference since we said the table is a named_table.

ets:insert(protected_named_ets, {foobar, baz}).
% true

ets:insert/2 returned true so we should now have some data in the table. Let’s pull it out using ets:match/2, and let’s match everything while we are at it by using a $1 for the pattern.

ets:match(protected_named_ets, '$1').
% [[{foobar,baz}]]

So as the owner process of the ETS table, since this was the process that created it, we can read an write to the table.

Now time to give our table away.

ets:give_away(protected_named_ets, SomeProcess, []).
% true

Since the documentation says is is available for reads, we will do the same match we just did before giving it away.

ets:match(protected_named_ets, '$1').
% [[{foobar,baz}]]

We get our results back.

What does a write look like then, since it says only the owning process has access to write, and the return value of calling ets:insert/2 is always true.

ets:insert(protected_named_ets, {barbaz, foo}).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(protected_named_ets,{barbaz,foo})

An exception, and it is of type bad argument, which does hold that it doesn’t allow writes from non-owning processes, but doesn’t exactly make it clear that is what is happening.

How about if we see what we get if we try to call ets:insert/2 on a table that doesn’t exist?

ets:insert(no_such_table, {foo, bar}).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(no_such_table,{foo,bar})

Same exception and same format of the error with just the name of the table and the tuple being different.

Thinking about this some, it does make sense that these two difference cases would be the same error. As far as the inserting process knows, there is no such table when trying to do an insert if no table exists, or if it is set to be protected. Either way, the caller passed in a bad ETS table reference for the call to ets:insert/2.

So we have now seen how protected behaves, which is the default access level, so let’s take a look at public next.

PublicNamedETS = ets:new(public_named_ets, [public, named_table]).
% public_named_ets

We will do an insert and a match from our current process, which is the owner.

ets:insert(public_named_ets, {foo, bar}).
% true
ets:match(public_named_ets, '$1').
% [[{foo,bar}]]

All looks good there.

The documentation states that public allows any process to read from and write to the table, so let’s give the public table away to SomeProcess and try to read and write.

ets:give_away(public_named_ets, SomeProcess, []).
% true

Now that we have given it away, time to try to add a new entry to the table, and see if we can read that write back out.

ets:insert(public_named_ets, {bar, baz}).
% true
ets:match(public_named_ets, '$1').
% [[{foo,bar}],[{bar,baz}]]

There we go. We have just inserted new data into that table, and when we do the ets:match/2 on everything, we see the new data in the result.

Now let’s create a private table. The documentation states that for private ETS tables, only the owner is allowed to read or write to the ETS table.

PrivateNamedETS = ets:new(private_named_ets, [private, named_table]).
private_named_ets

Again, while this process still owns the table, we will add an item and do a read from the table.

ets:insert(private_named_ets, {fizz, buzz}).
% true
ets:match(private_named_ets, '$1').
% [[{fizz,buzz}]]

Time to give this table away to SomeProcess again.

ets:give_away(private_named_ets, SomeProcess, []).
% true

Now that the ETS table is owned by a different process, time to try a read.

ets:match(private_named_ets, '$1').
% ** exception error: bad argument
%      in function  ets:match/2
%         called as ets:match(private_named_ets,'$1')

bad argument exception, just like the attempted ets:insert/2 we tried on the protected ETS table above when it was owned by a different process.

And time for a write.

ets:insert(private_named_ets, {buzz, fizz}).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(private_named_ets,{buzz,fizz})

A bad argument exception here as well, which should not be a surprise at this point, as both the protected write, and this private read both raised that same exception.

So in total, for this introduction so far, we have seen the Type, Access, Named Table, Heir, and Owner settings of an ETS table, and how they relate.

Next week, we will conclude the introduction of ETS by going over the Key Position option and the Tweaks that an ETS table can take when being setup.

–Proctor

Ruby Tuesday – Refactoring towards creating filter

Today’s Ruby Tuesday takes a look at the concept of filter, a.k.a. select in Ruby, and how we could create our own version of it through some refactoring.

Filter is a function/method that can really start to change the way you think about your programs, and start helping you to take advantage of smaller building blocks that compose, or assemble, together to create nice reusable pieces of code.

To get an understanding of when and where filter can be powerful, and how you could create filter on your own if not already given to you as Enumerable#select, we’ll look at some “typical” style code that would look like something you are likely to have encountered in your current code base, or past code bases.

For this guide, we have a User class, and we will require 'date' since we want the user to have a date of birth, and a date of death, since we will be using some historical figures in the world of Computer Science.

require 'date'

class User
  attr_reader :name, :date_of_birth, :date_of_death, :languages_created

  def initialize(name:, is_active:, date_of_birth: nil,
                 date_of_death: nil, languages_created: [])
    @name = name
    @is_active = is_active
    @date_of_birth = date_of_birth
    @date_of_death = date_of_death
    @languages_created = languages_created
  end

  def active?
    @is_active
  end

  def to_s
    inspect
  end
end

We create add some User objects of creators of various programming languages, and add them to an Array of Users.

alan_kay = User.new(name: "Alan Kay",
                    is_active: true,
                    date_of_birth: Date.new(1940, 5, 17),
                    languages_created: ["Smalltalk", "Squeak"])
john_mccarthy = User.new(name: "John McCarthy",
                         is_active: true,
                         date_of_birth: Date.new(1927, 9, 4),
                         date_of_death: Date.new(2011, 10, 24),
                         languages_created: ["Lisp"])
robert_virding = User.new(name: "Robert Virding",
                          is_active: true,
                          languages_created: ["Erlang", "LFE"])
dennis_ritchie = User.new(name: "Dennis Ritchie",
                          is_active: true,
                          date_of_birth: Date.new(1941, 9, 9),
                          date_of_death: Date.new(2011, 10, 12),
                          languages_created: ["C"])
james_gosling = User.new(name: "James Gosling",
                         is_active: true,
                         date_of_birth: Date.new(1955, 5, 19),
                         languages_created: ["Java"])
matz = User.new(name: "Yukihiro Matsumoto",
                is_active: true,
                date_of_birth: Date.new(1965, 4, 14),
                languages_created: ["Ruby"])
nobody = User.new(name: "",
                  is_active: false)

users = [alan_kay, john_mccarthy, robert_virding, 
         dennis_ritchie, james_gosling, matz, nobody]

For most of our cases, we will want an easy way to see what users we have as a result of some operation, so lets define a method that returns a list of just the names for a given list of users.

def get_names_for(users)
  names = []
  for user in users do
    names << user.name
  end
  names
end

So somewhere in our code base we have an area of code that wants to get only the active users from a given list of User objects.

We do our standard for loop, as we would do in so many languages, and we have an if clause that checks the active? method on a User. We do some other processing on that list which we will represent as putsing out the names of the result.

active_users = []
for user in users do
  if (user.active?)
    active_users << user
  end
end

puts "\n\nThe active users' names are:..."
puts get_names_for(active_users)
# Alan Kay
# John McCarthy
# Robert Virding
# Dennis Ritchie
# James Gosling
# Yukihiro Matsumoto
# => nil

Somewhere else in our code base, we have something that wants a list of the language creators that are still alive, because wouldn’t it be cool that we might happen to have the chance to get to have lunch with them during a conference.

alive_users = []
for user in users do
  if (not user.date_of_death)
    alive_users << user
  end
end

puts "\n\nThe alive users' names are:..."
puts get_names_for(alive_users)
# Alan Kay
# Robert Virding
# James Gosling
# Yukihiro Matsumoto
# 
# => nil

And again, the puts just represents some processing of that list.

Yet somewhere else, we have some code that looks for those people that we know to have created more than one programming language.

users_created_more_than_one_language = []
for user in users do
  if (user.languages_created.count > 1)
    users_created_more_than_one_language << user
  end
end

puts "\n\nThe names for users who have created more than one language:..."
puts get_names_for(users_created_more_than_one_language)
# Alan Kay
# Robert Virding
# => nil

If we take a look at our three segments of code above, after a while, if you haven’t already, you will start to notice that they all are very, very similar.

They all:
– create an empty array and assign it to a variable that represents the working list of items that meet some condition,
– iterate over all the items in the list of users
– for each item, it checks some condition,
– add the user to the working copy variable if the condition is true
– return the working copy of items that meet the condition.

If we renamed the working variable to be the same name, the only thing that would be different in the code segments, is the conditional that is checked as part of the if clause.

matching = []
for user in users do
  if (something_specific_goes_here)
    matching << user
  end
end
matching

For a number of languages, you would have to live with that duplication, but this is Ruby, so we can use an escape hatch to make this code more generic and abstract.

The only thing that is different, is the conditional, a.k.a. the predicate. The term predicate signifies a method, or function, that returns a boolean result.

So if we want to abstract out the filtering out of items that match some predicate condition, we can use a lambda, or Proc, call that predicate passing in the User to see if we get a true returned.

def filter(users, predicate)
  matching = []
  for user in users do
    if (predicate.call(user))
      matching << user
    end
  end
  matching
end

We now have something that looks like we can re-use it elsewhere for a list of Users.

So let’s test it out, by redoing the previous checks to use the new filter method we just defined.

First we will call our new filter, and pass it a lambda that looks at the count of the languages that user created.

puts "\n\nThe names for users who have created more than one language (using `filter` method):..."
multi_language_creators = filter(users, lambda{|u| u.languages_created.count > 1 })
puts get_names_for(multi_language_creators)
# Alan Kay
# Robert Virding
# => nil

Looks to be the same as the previous version.

Let’s try it with finding those that don’t have a known date of death.

puts "\n\nThe names for users who are not dead (using `filter` method):..."
not_dead_users = filter(users, lambda{|u| not u.date_of_death})
puts get_names_for(not_dead_users)
# Alan Kay
# Robert Virding
# James Gosling
# Yukihiro Matsumoto
# 
# => nil

So far, so good.

Finally, we try it for active users.

puts "\n\nThe names for users who are active (using `filter` method):..."
filtered_active_users = filter(users, lambda{|u| u.active?})
puts get_names_for(filtered_active_users)
# Alan Kay
# John McCarthy
# Robert Virding
# Dennis Ritchie
# James Gosling
# Yukihiro Matsumoto
# => nil

Yay! We have extracted a common pattern of our code out into something that represents a higher abstraction of filtering out users with a certain condition from a list of User objects.

Not only that, we have separated the concern of iterating over items and checking each item, from the concern of the actual condition we care about.

This seems pretty useful, and something that would apply beyond just a list of User objects.

Let’s see if we can do this for some Array of numbers as well.

Integers in Ruby have an even? method that we can use to know if a number is even.

puts "is 1 even???"
puts 1.even?

To get the even numbers from an Array of numbers, we have some code that looks very familiar.

even_numbers = []
for i in [1, 2, 3, 4, 5, 6, 7] do
  if (i.even?)
    even_numbers << i
  end
end

Let’s try out our new filter method, and see if we can use it on a list of numbers, and only get back those that are even.

puts "\n\nEven numbers"
evens = filter([1, 2, 3, 4, 5, 6, 7], lambda{|i| i.even?})
puts evens
# 2
# 4
# 6
# => nil

That works!!! Let the celebration commence!!!

Well, let it commence after we clean up our filter method to let it represent that it is for more than just a list of Users.

def filter(items, predicate)
  matching = []
  for item in items do
    if (predicate.call(item))
      matching << item
    end
  end
  matching
end

Instead of users, we change it to be items, and an individual item instead of user in that list we loop through.

We can also use our filter method against Ranges and Hashes.

puts "\n\nOdd numbers"
odds = filter((1..7), lambda{|i| i.odd?})
puts odds
# 1
# 3
# 5
# 7
# => nil
puts filter({1 => :a, 2 => :b, 3 => :c, 4 => :d}, lambda{|(key, value)| key.even?}).inspect
# [[2, :b], [4, :d]]
# => nil

So by taking advantage of lambdas, Procs, or even blocks in Ruby, we have been able to extract out a recurring pattern in our code and give it a name.

Not only that, but saw how we could write the start of one do it ourselves, and with some work, we could get it to return a proper hash instead of a list of key-value lists.

-Proctor

Erlang Thursday – ETS Introduction Part 3: ETS Table Types

Today’s Erlang Thursday continues the introduction to ETS and takes a look at the different types of storage strategies that ETS supports.

The different types that ETS supports are: set, ordered_set, bag, and duplicate bag.

Each of these different types can be passed in when creating a new ETS table, but let’s see what type of ETS table we get when we don’t specify any of the types.

ETS_Empty = ets:new(ets_empty, []).
% 36886
ets:info(ETS_Empty).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.50.0>},
%  {heir,none},
%  {name,ets_empty},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

If we look above, we can see the type tagged tuple has the type of set.

To see how the different types can work we will create three tuples to add to the ETS tables of the different type to see what they store.

Item1 = {1, a}.
% {1,a}
Item2 = {1.0, "a"}.
% {1.0,"a"}
Item3 = {1, "one"}.
% {1,"one"}

We will have a two tuples with the first element of 1, and one tuple whose first element is 1.0 to see how the different types of ETS tables behave when given the “same” key.

Why have key of both 1 and 1.0? Because depending on the comparison of equality used, they may or may not be seen as the same, and therefore the same key.

1 == 1.0.
% true
1 =:= 1.0.
% false

First we will take a look at an ETS table of type set.

ETS_Set = ets:new(ets_set, [set]).
40978

We insert Item1 followed by an insert of Item2, and use ets:tab2list/1 to see what is stored in the ETS table.

ets:insert(ETS_Set, Item1).
% true
ets:insert(ETS_Set, Item2).
% true
ets:tab2list(ETS_Set).
% [{1,a},{1.0,"a"}]

An ETS table of type set sees 1 and 1.0 as different keys. So now let’s add Item3 and see what happens when we do an insert with an already existing key.

ets:insert(ETS_Set, Item3).
% true
ets:tab2list(ETS_Set).
% [{1,"one"},{1.0,"a"}]

The previous tuple with the key of 1 was replaced by the tuple for Item3 which is the last thing we inserted.

Let’s look at what an ordered_set does.

ETS_OrdSet = ets:new(ets_ordset, [ordered_set]).
% 45075

Again we’ll insert Item1 followed by Item2 and use ets:tab2list/1 to check it’s state.

ets:insert(ETS_OrdSet, Item1).
% true
ets:insert(ETS_OrdSet, Item2).
% true
ets:tab2list(ETS_OrdSet).
% [{1.0,"a"}]

In this case, the key of 1.0 was seen the same as the previous 1 that was in there, so it overwrites the first item inserted.

We insert Item3 to the ordered_set, and we can see it gets replaced yet again.

ets:insert(ETS_OrdSet, Item3).
% true
ets:tab2list(ETS_OrdSet).
% [{1,"one"}]

Now lets check an ETS table that is a bag.

ETS_Bag = ets:new(ets_bag, [bag]).
% 49172

And we yet again add Item1 and Item2 to the table.

ets:insert(ETS_Bag, Item1).
% true
ets:insert(ETS_Bag, Item2).
% true
ets:tab2list(ETS_Bag).
% [{1,a},{1.0,"a"}]

Looking at ets:tab2list/1, we can see that for a bag they are treated as two different items.

And again we will see what happens when we insert Item3 into this ETS table.

ets:insert(ETS_Bag, Item3).
% true
ets:tab2list(ETS_Bag).
% [{1,a},{1,"one"},{1.0,"a"}]

In the case of a bag type of ETS table, we have Item2 along with entries Item1 and Item3 even though Item1 and Item3 both have the same key.

The last type of ETS table we have is a duplicate_bag.

ETS_DupBag = ets:new(ets_dupbag, [duplicate_bag]).
% 53269

We insert Item1 followed by Item2 as we did with all of the other types of ETS tables.

ets:insert(ETS_DupBag, Item1).
% true
ets:insert(ETS_DupBag, Item2).
% true
ets:tab2list(ETS_DupBag).
% [{1,a},{1.0,"a"}]

And like all of the other ETS table types, we insert Item3 into the duplicate_bag ETS table type.

ets:insert(ETS_DupBag, Item3).
% true
ets:tab2list(ETS_DupBag).
% [{1,a},{1,"one"},{1.0,"a"}]

And we see we have all three items in the ETS table for a duplicate_bag type.o

If we look at the behavior of bag and duplicate_bag though, we see that the behavior of both seems to be the same.

So what is the difference between the two???

If you dig into the documentation, and look at the description of the types under ets:new/2, it says that a bag will allow duplicate keys, but allow the item to only be added once, a duplicate_bag will allow multiple entries even if they have the same values as well.

To see this in action, we will add Item1 to both the ETS_Bag table and the ETS_DupBag table and see what happens.

First with just the ETs bag type.

ets:insert(ETS_Bag, Item1).
% true
ets:tab2list(ETS_Bag).
% [{1,a},{1,"one"},{1.0,"a"}]

The return value is the same as it was before, so adding an item that is already in a ETS table of type bag will not add it again.

So what does the duplicate_bag type of ETS table do?

ets:insert(ETS_DupBag, Item1).
% true
ets:tab2list(ETS_DupBag).
% [{1,a},{1,"one"},{1,a},{1.0,"a"}]

And we can see the tuple {1, a} shows up twice, because we called ets:insert/2 with that value twice.

–Proctor