Ruby Tuesday – Refactoring Towards Creating reduce

Continuing the theme of the last couple of posts, today we will look at refactoring towards reduce, and see how it builds on what we have created in the previous posts.

Again for reference, here is the User class from our setup,

require 'date'

class User
  attr_reader :name, :date_of_birth, :date_of_death, :languages_created

  def initialize(name:,
                 is_active:,
                 date_of_birth: nil,
                 date_of_death: nil,
                 languages_created: [])
    @name = name
    @is_active = is_active
    @date_of_birth = date_of_birth
    @date_of_death = date_of_death
    @languages_created = languages_created
  end

  def active?
    @is_active
  end

  def to_s
    inspect
  end
end

and our list of users.

alan_kay = User.new(name: "Alan Kay",
                    is_active: true,
                    date_of_birth: Date.new(1940, 5, 17),
                    languages_created: ["Smalltalk", "Squeak"])
john_mccarthy = User.new(name: "John McCarthy",
                         is_active: true,
                         date_of_birth: Date.new(1927, 9, 4),
                         date_of_death: Date.new(2011, 10, 24),
                         languages_created: ["Lisp"])
robert_virding = User.new(name: "Robert Virding",
                          is_active: true,
                          languages_created: ["Erlang", "LFE"])
dennis_ritchie = User.new(name: "Dennis Ritchie",
                          is_active: true,
                          date_of_birth: Date.new(1941, 9, 9),
                          date_of_death: Date.new(2011, 10, 12),
                          languages_created: ["C"])
james_gosling = User.new(name: "James Gosling",
                         is_active: true,
                         date_of_birth: Date.new(1955, 5, 19),
                         languages_created: ["Java"])
matz = User.new(name: "Yukihiro Matsumoto",
                is_active: true,
                date_of_birth: Date.new(1965, 4, 14),
                languages_created: ["Ruby"])
nobody = User.new(name: "",
                  is_active: false)

users = [alan_kay, john_mccarthy, robert_virding,
         dennis_ritchie, james_gosling, matz, nobody]

In our theoretical code base we have some code that will find the oldest language creator.

def oldest_language_creator(users)
  oldest = nil
  for user in users do
    next unless user.date_of_death.nil?
    next if user.date_of_birth.nil?
    if (oldest.nil? || oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

oldest_language_creator(users).name
# => "Alan Kay"

That is pretty nasty, so let’s see if we can clean it up some and see what happens.

First, inside the for loop we have both an if and an unless, so let’s refactor the unless to be an if.

def oldest_language_creator(users)
  oldest = nil
  for user in users do
    next if (not user.date_of_death.nil?)
    next if user.date_of_birth.nil?
    if (oldest.nil? || oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

oldest_language_creator(users).name
# => "Alan Kay"

And while we are at it, we will refactor out the conditions in the ifs to give them clarifying names.

def alive?(user)
  user.date_of_death.nil?
end

def has_known_birthday?(user)
  not user.date_of_birth.nil?
end

def oldest_language_creator(users)
  oldest = nil
  for user in users do
    next if not alive?(user)
    next if not has_known_birthday?(user)
    if (oldest.nil? || oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

oldest_language_creator(users).name
# => "Alan Kay"

Still works. And now that we have multiple ifs in our for loop, we can think back to a couple of posts ago and realize we have a couple of filters happening on our list, followed by some logic around who has the earliest birth date.

So let’s refactor out the filters and see what our method starts to look like.

def oldest_language_creator(users)
  alive_users = filter(users, lambda{|user| alive?(user)})
  with_birthdays = filter(alive_users,
                          lambda{|user| has_known_birthday?(user)})

  oldest = nil
  for user in with_birthdays do
    if (oldest.nil? || oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

oldest_language_creator(users).name
# => "Alan Kay"

This next refactoring might be a bit of a jump, but I am not too fond of starting out with a nil and having to check for it on every pass, since it is only relevant the first time around, so let’s clean that up.

def oldest_language_creator(users)
  alive_users = filter(users, lambda{|user| alive?(user)})
  with_birthdays = filter(alive_users, 
                          lambda{|user| has_known_birthday?(user)})

  oldest, *rest = with_birthdays
  for user in rest do
    if (oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

Let’s refactor out our for loop into another method, so we can look at it on its own.

def user_with_earliest_birthday(users)
  oldest, *rest = users
  for user in rest do
    if (oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

def oldest_language_creator(users)
  alive_users = filter(users, lambda{|user| alive?(user)})
  with_birthdays = filter(alive_users,
                          lambda{|user| has_known_birthday?(user)})
  user_with_earliest_birthday(with_birthdays)
end

Now we have a pattern here, and it has been present in our filter and map as well. See if you can spot it, and then let’s identify it.

def user_with_earliest_birthday(users)
  oldest, *rest = users
  for user in rest do
    if (oldest.date_of_birth > user.date_of_birth)
      oldest = user
    end
  end
  oldest
end

def filter(items, predicate)
  matching = []
  for item in items do
    if (predicate.call(item))
      matching << item
    end
  end
  matching
end

def map(items, do_map)
  results = []
  for item in items do
    results << do_map.call(item)
  end
  results
end

If you haven’t detangled it, the pattern is:
1. We have some initial value,
2. and for every item in a list we do some operation against that item value and the current accumulated value, which results in a new value
3. we return the accumulated value.

With our user_with_earliest_birthday method, the initial accumulated value is the first user, the operation is a comparison of each item against the oldest user found so far, and we return the oldest user.

With filter, the initial accumulated value is an empty Array, the operation is an append if some criteria is met, and the return is the accumulated array of those items that meet the criteria.

With map, the initial accumulated value is an empty Array, the operation is an append of the result of a transformation function on each value, and the return is the accumulated array of the transformed results.

This pattern is called reduce.

So what would this look like generically???

def reduce(initial, items, operation)
  accumulator = initial
  for item in items do
    accumulator = operation.call(accumulator, item)
  end
  accumulator
end

Let’s write our user_with_earliest_birthday using this new reduce then, and consume it in our oldest_language_creator.

def oldest_language_creator(users)
  alive_users = filter(users, lambda{|user| alive?(user)})
  with_birthdays = filter(alive_users,
                          lambda{|user| has_known_birthday?(user)})

  reduce(with_birthdays.first, with_birthdays.drop(1),
         lambda do |oldest, user|
           oldest.date_of_birth > user.date_of_birth ? user : oldest
         end)
end

Our accumulator starts with the first user in the list, we iterate through the rest of the list, and the operation returns either the accumulator (oldest) or the current item (user), which becomes the accumulator value for the next iteration.
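
As a quick sanity check (assuming the same users list from the setup above), the reduce-based version should still find the same person:

oldest_language_creator(users).name
# => "Alan Kay"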

So how would we write our map and filter to use this new reduce?

def map(items, do_map)
  reduce([], items,
         lambda do |accumulator, item|
           accumulator.dup << do_map.call(item)
         end)
end

def filter(items, predicate)
  reduce([], items, lambda do |accumulator, item|
    if (predicate.call(item))
      accumulator.dup << item
    else
      accumulator
    end
  end)
end

For our new map, the operation calls the do_map lambda given to the method and appends the transformed value to a duplicate of the accumulator. While duplicating the accumulator is not strictly necessary in these cases, I did so here to mirror the idea that each step of reduce produces what can be considered a completely new value, just as in our oldest_language_creator version that uses reduce.
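
As a small check that this reduce-based map still behaves like the earlier version (a throwaway example, not from the original post), doubling a short range should work as before:

map((1..3), lambda{|i| i * 2})
# => [2, 4, 6]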

And for our new filter, the operation either returns the original accumulator, or appends the item to a new copy of the accumulated list if the predicate passed to filter returns true. Again, we could leave out the duplication, but for purity’s sake, and to help work out the logic, we will keep it in there.

So let’s step through our new filter and see what happens one step at a time with it now using reduce.

filter((1..9), lambda{|item| item.odd?})

If we inline reduce substituting the variables given to filter, it looks like the following.

reduce([], (1..9), lambda do |accumulator, item|
  if (lambda{|item| item.odd?}.call(item))
    accumulator.dup << item
  else
    accumulator
  end
end)

And if we expand the body of reduce, and rename it to filter_odds, we get

def filter_odds()
  accumulator = []
  for item in (1..9) do
    accumulator = lambda do |accumulator, item|
      if (lambda{|item| item.odd?}.call(item))
        accumulator.dup << item
      else
        accumulator
      end
    end.call(accumulator, item)
  end
  accumulator
end

And we inline the call to the lambda that came in as the predicate

def filter_odds()
  accumulator = []
  for item in (1..9) do
    accumulator = lambda do |accumulator, item|
      if (item.odd?)
        accumulator.dup << item
      else
        accumulator
      end
    end.call(accumulator, item)
  end
  accumulator
end

and inline the lambda for the operation given to reduce

def filter_odds()
  accumulator = []
  for item in (1..9) do
    accumulator = if (item.odd?)
                    accumulator.dup << item
                  else
                    accumulator
                  end
  end
  accumulator
end

And we can see how, through filter and reduce, we get back to something that looks like the original filtering of the odd numbers out of a list.
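
For completeness, calling the fully inlined version should give us just the odd numbers, matching what filter with an odd? predicate returns:

filter_odds()
# => [1, 3, 5, 7, 9]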

And to test out reduce further, let’s add some numbers together.

We will call reduce with our initial accumulated “sum” of 0, the numbers from 1 to 10, and a lambda that adds the two numbers together to produce a new running sum.

reduce(0, (1..10), lambda{|accum, item| accum + item})
# => 55

And we do the same for a reduce that computes the product of a list of numbers.

This time our initial accumulator value is 1, which is the identity value for multiplication.

reduce(1, (1..10), lambda{|accum, item| accum * item})
# => 3628800

But if we call it with an empty list, we still get 1 back.

reduce(1, [], lambda{|accum, item| accum * item})
# => 1

So we need to clean up our reduce some to make it more robust in the case of reducing against empty lists.

def reduce(initial, items, operation)
  return nil if items.empty?

  accumulator = initial
  for item in items do
    accumulator = operation.call(accumulator, item)
  end
  accumulator
end

And now our reduce handles empty lists nicely, or at the least, a little more sanely.

reduce(1, [], lambda{|accum, item| accum * item})
# => nil

With all of that, we have refactored our code into something close to Ruby’s Enumerable#reduce, a.k.a. Enumerable#inject, except that we return nil if the enumerable is empty, instead of the initial value for the accumulator.
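
For comparison, here is roughly how the same sums and products look with the built-in Enumerable#reduce; note that, unlike our version, it returns the given initial value for an empty collection rather than nil:

(1..10).reduce(0) { |accum, item| accum + item }
# => 55
(1..10).reduce(1) { |accum, item| accum * item }
# => 3628800
[].reduce(1) { |accum, item| accum * item }
# => 1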

–Proctor

Erlang Thursday – ETS Introduction Part 5: keypos, compressed, read_concurrency, and write_concurrency

Today’s Erlang Thursday continues the introduction to ETS and picks up on the promise from last week, looking at the keypos ETS table setting and the Tweaks that can be set.

First, we will take a look at the keypos setting.

The keypos is the 1-based index of the element in the stored tuple that will be used as the key for the entry. If you remember from part 3 of the introduction to ETS about the different table types, they use this index for their key comparison to determine whether an item is unique or not.

If we create a new table without specifying the keypos option, it defaults to 1.

Table = ets:new(some_name, []).
% 20498
ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.50.0>},
%  {heir,none},
%  {name,some_name},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

To show the keypos setting in action, we will create a few items to insert into our ETS tables.

Item1 = {1, a}.
% {1,a}
Item2 = {1.0, "a"}.
% {1.0,"a"}
Item3 = {1, "one"}.
% {1,"one"}
Item4 = {a, "a"}.
% {a,"a"}
Item5 = {"a", a}.
% {"a",a}

In the items above, we have some duplicate values across both the first and the second elements of the two-tuples.

We will go ahead and insert each one of these items in turn, keeping in mind that this table is a set, so any new insert with the same key will overwrite the previous entry for that key.

ets:insert(Table, Item1).
% true
ets:tab2list(Table).
% [{1,a}]
ets:insert(Table, Item2).
% true
ets:tab2list(Table).
% [{1,a},{1.0,"a"}]
ets:insert(Table, Item3).
% true
ets:tab2list(Table).
% [{1,"one"},{1.0,"a"}]
ets:insert(Table, Item4).
% true
ets:tab2list(Table).
% [{1,"one"},{a,"a"},{1.0,"a"}]
ets:insert(Table, Item5).
% true
ets:tab2list(Table).
% [{"a",a},{1,"one"},{a,"a"},{1.0,"a"}]

When we added Item3 above, it replaced Item1 in the table, since they both have a 1 for the first element in their two-tuple.

We will now create a new table with a keypos of 2, and see how the exact same sequence of inserts behaves with a different keypos value.

KeyPosTwo = ets:new(key_pos_2, [{keypos, 2}]).
% 24595
ets:insert(KeyPosTwo, Item1).
% true
ets:tab2list(KeyPosTwo).
% [{1,a}]
ets:insert(KeyPosTwo, Item2).
% true
ets:tab2list(KeyPosTwo).
% [{1.0,"a"},{1,a}]
ets:insert(KeyPosTwo, Item3).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{1.0,"a"},{1,a}]
ets:insert(KeyPosTwo, Item4).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{a,"a"},{1,a}]
ets:insert(KeyPosTwo, Item5).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{a,"a"},{"a",a}]

In this case, it wasn’t until we added Item4 that we had an overwrite, as Item2 and Item4 both have "a" as their second element. Then when we add Item5, it overwrites Item1, as they both have the atom a as their second element.

And if we set a keypos of some value, say three, and we try to insert a tuple that has fewer items, we will get an exception of type bad argument.

KeyPosThree = ets:new(key_pos_3, [{keypos, 3}]).
% 28692
ets:insert(KeyPosThree, Item1).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(28692,{1,a})

Now it is time to look at the compressed option when creating a table.

When creating a new table, the default setting is for it to be uncompressed, as we can see in the table info since it shows {compressed,false}.

UncompressedTable = ets:new(uc, []).
% 32786
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,uc},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

We create a new table with the compressed option, and when we look at ets:info/1 for the table, we see that it shows {compressed,true}.

CompressedTable = ets:new(uc, [compressed]).
% 45074
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,uc},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

compressed, according to the documentation at least, stores the data in a “more compact format to consume less memory”. It also warns that this can make operations that need to check the entire tuple slower, and that the key is not stored compressed, at least in the current implementation.

So let’s see what kind of memory difference compressed makes.

To start with, we will insert 100_000 items into our ETS tables and see what the resulting memory size becomes. We will insert a new tuple of {X, X}, for all numbers from 1 to 100_000.

lists:foreach(fun(X) -> ets:insert(CompressedTable, {X, X}) end,
              lists:seq(1, 100000)).
% ok
lists:foreach(fun(X) -> ets:insert(UncompressedTable, {X, X}) end,
              lists:seq(1, 100000)).
% ok
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,714643},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,814643},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Interesting.

For the compressed table the memory is reported to be 814643, but the uncompressed shows the memory to be less than that with 714643.

Maybe it doesn’t like to compact integer values very much, so let’s do the same thing, but use a string for the second item in the tuple.

lists:foreach(fun(X) -> ets:insert(UncompressedTable, {X, integer_to_list(X)}) end,
              lists:seq(1, 100000)).
% ok
lists:foreach(fun(X) -> ets:insert(CompressedTable, {X, integer_to_list(X)}) end, 
              lists:seq(1, 100000)).
% ok
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,914644},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,1692433},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Now, using strings in our tuples instead of just integers, we can see that the compressed ETS table’s memory is 914644, whereas the uncompressed ETS table’s memory is 1692433.

So in addition to thinking about the way you are going to be matching on the data when trying to determine if the table should be compressed, it looks like you also need to think about the type of data you are going to be putting into the ETS table.

The last two options to be discussed are read_concurrency and write_concurrency.

read_concurrency is set to false by default and, according to the documentation, is best for when “read operations are much more frequent than write operations, or when concurrent reads and writes comes in large read and write bursts”.

So if you have a table that has a bunch of reads with the writes infrequently interspersed between the reads, this would be when you would want to enable read_concurrency, as the documentation states that switching between reads and writes is more expensive.

The write_concurrency option is also set to false by default, causing any additional concurrent writes to block while a write operation is in progress. When set to true, different tuples of the same table can be written to by concurrent processes; the option does not affect tables of type ordered_set.
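
As a quick sketch of turning these tweaks on (hypothetical table names, and using ets:info/2 to read a single setting back), both options are just passed at table creation:

ReadHeavy = ets:new(read_heavy, [{read_concurrency, true}]).
WriteHeavy = ets:new(write_heavy, [{write_concurrency, true}]).
ets:info(ReadHeavy, read_concurrency).
% true
ets:info(WriteHeavy, write_concurrency).
% true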

This should be it as far as the introduction goes. Next week we will start looking at the different operations we can perform using ETS and ETS tables.

–Proctor

Ruby Tuesday – Refactoring Towards Creating map

Today’s Ruby Tuesday continues from where we left off with last week’s look at refactoring to filter.

For reference, we had a User class,

require 'date'

class User
  attr_reader :name, :date_of_birth, :date_of_death, :languages_created

  def initialize(name:, 
                 is_active:,
                 date_of_birth: nil,
                 date_of_death: nil,
                 languages_created: [])
    @name = name
    @is_active = is_active
    @date_of_birth = date_of_birth
    @date_of_death = date_of_death
    @languages_created = languages_created
  end

  def active?
    @is_active
  end

  def to_s
    inspect
  end
end

a list of User objects,

alan_kay = User.new(name: "Alan Kay",
                    is_active: true,
                    date_of_birth: Date.new(1940, 5, 17),
                    languages_created: ["Smalltalk", "Squeak"])
john_mccarthy = User.new(name: "John McCarthy",
                         is_active: true,
                         date_of_birth: Date.new(1927, 9, 4),
                         date_of_death: Date.new(2011, 10, 24),
                         languages_created: ["Lisp"])
robert_virding = User.new(name: "Robert Virding",
                          is_active: true,
                          languages_created: ["Erlang", "LFE"])
dennis_ritchie = User.new(name: "Dennis Ritchie",
                          is_active: true,
                          date_of_birth: Date.new(1941, 9, 9),
                          date_of_death: Date.new(2011, 10, 12),
                          languages_created: ["C"])
james_gosling = User.new(name: "James Gosling",
                         is_active: true,
                         date_of_birth: Date.new(1955, 5, 19),
                         languages_created: ["Java"])
matz = User.new(name: "Yukihiro Matsumoto",
                is_active: true,
                date_of_birth: Date.new(1965, 4, 14),
                languages_created: ["Ruby"])
nobody = User.new(name: "",
                  is_active: false)

users = [alan_kay, john_mccarthy, robert_virding, 
         dennis_ritchie, james_gosling, matz, nobody]

and a helper method to get the list of names for a list of Users.

def get_names_for(users)
  names = []
  for user in users do
    names << user.name
  end
  names
end

get_names_for(users)
=> ["Alan Kay", "John McCarthy", "Robert Virding",
    "Dennis Ritchie", "James Gosling", "Yukihiro Matsumoto", ""]

Elsewhere in our (imaginary, but based on real events with names changed to protect the innocent) code base, we have some logic to get a listing of languages created by the users.

def get_languages(users)
  languages = []
  for user in users do
    languages << user.languages_created
  end
  languages
end

get_languages(users)
# => [["Smalltalk", "Squeak"], ["Lisp"],
      ["Erlang", "LFE"], ["C"], ["Java"], ["Ruby"], []]

And yet somewhere else, there is logic to get a listing of the years different users were born.

def get_birth_years(users)
  birth_years = []
  for user in users do
    birth_years << (user.date_of_birth ? user.date_of_birth.year : nil)
  end
  birth_years
end

get_birth_years(users)
# => [1940, 1927, nil, 1941, 1955, 1965, nil]

As with the filter we looked at last week, we have quite a bit of duplication of logic in all of these methods.

If we turn our head and squint a little, we can see the methods all look something like this:

def transform_to(items)
  results = []
  for item in items do
    results << do_some_transformation(item)
  end
  results
end

This method:

  1. takes a list of items to iterate over
  2. creates a working result set
  3. iterates over every item in the items given and for each item
    • some transformation of the item into a new value is computed and
    • the result is added to the working results set
  4. the end results are returned

The only thing that is different between each of the functions above, once we have rationalized the variable names, is the transformation to be done on each item in the list.

And this transformation, the part that differs, is just the result of calling a function on each item, which is also called a map in mathematics; Wolfram Alpha defines a map as:

A map is a way of associating unique objects to every element in a given set. So a map f: A ↦ B from A to B is a function f such that for every a ∈ A, there is a unique object f(a) ∈ B. The terms function and mapping are synonymous for map.

So we will “map” over all of the items to get a new list of items, which makes our generic function look like the following, after we update names to match our new terminology.

def map(items)
  results = []
  for item in items do
    results << do_map(item)
  end
  results
end

This is starting to come together, but we still don’t have anything specific for what do_map represents yet.

We will follow our previous example in filter and make the generic operation we want to call an anonymous function, specifically a lambda in Ruby, and pass that in to our map method.

def map(items, do_map)
  results = []
  for item in items do
    results << do_map.call(item)
  end
  results
end

Time to test it out by using our previous calls and making the specifics a lambda.

map(users, lambda{|user| user.languages_created})
# => [["Smalltalk", "Squeak"], ["Lisp"],
      ["Erlang", "LFE"], ["C"], ["Java"], ["Ruby"], []]
map(users, lambda{|user| user.name})
# => ["Alan Kay", "John McCarthy", "Robert Virding",
      "Dennis Ritchie", "James Gosling", "Yukihiro Matsumoto", ""]
map(users, lambda{|user| user.date_of_birth ? user.date_of_birth.year : nil})
# => [1940, 1927, nil, 1941, 1955, 1965, nil]

And to test if we did get this to be generic enough to work against lists of other types, we’ll do some conversions from characters to Integers, Integers to characters, and cube some integers.

map( ("a".."z"), lambda{|char| char.ord})
# => [97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
#     107, 108, 109, 110, 111, 112, 113, 114, 115,
#     116, 117, 118, 119, 120, 121, 122]
map((65..90), lambda{|ascii_value| ascii_value.chr})
# => ["A", "B", "C", "D", "E", "F", "G", "H", "I",
      "J", "K", "L", "M", "N", "O", "P", "Q", "R",
      "S", "T", "U", "V", "W", "X", "Y", "Z"]
map((1..7), lambda{|i| i*i*i})
# => [1, 8, 27, 64, 125, 216, 343]

So like last week’s post, where we were able to genericize the logic of conditionally plucking items out of a list based on some condition, we were able to genericize the transformation of a list of one set of values into a list of another set of values.

Which, if you are familiar with Ruby, you will likely recognize as Enumerable#map, a.k.a. Enumerable#collect, but now you have seen how you could have gone down the road of creating your own, if Ruby hadn’t already provided it for you.
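
As a quick comparison (using the same users list from the setup), the built-in version of our earlier name-gathering example reads like this:

users.map { |user| user.name }
# => ["Alan Kay", "John McCarthy", "Robert Virding",
#     "Dennis Ritchie", "James Gosling", "Yukihiro Matsumoto", ""]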

-Proctor

Erlang Thursday – ETS Introduction Part 4: ETS Access Protections

Today’s Erlang Thursday continues the introduction to ETS and takes a look at the different access levels that ETS supports.

The different access levels that ETS supports are: public, protected, and private.

Each of these different types can be passed in when creating a new ETS table, but let’s see what type of ETS table we get when we don’t specify an access level.

Table = ets:new(some_name, []).
% 20501
ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,some_name},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

So the default access level is protected when not specified.

So what does it mean for an ETS table to be protected then? The documentation states that protected tables can be written to only by the owning process, but read by other processes.

So let’s see that at work then.

First let’s create a process that we can give ETS tables away to.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.58.0>

We create a new ETS table and specify it is protected, and we also specify that it is a named_table as a bonus.

ProtectedNamedETS = ets:new(protected_named_ets, [protected, named_table]).
% protected_named_ets

The result of that match is protected_named_ets and not a number like the call to ets:new/2 above, so we should be able to use the name of the table to access the table instead of just the identifier.

We will insert an entry into the ETS table, and we will use the name of the ETS table as the ETS table reference since we said the table is a named_table.

ets:insert(protected_named_ets, {foobar, baz}).
% true

ets:insert/2 returned true so we should now have some data in the table. Let’s pull it out using ets:match/2, and let’s match everything while we are at it by using a $1 for the pattern.

ets:match(protected_named_ets, '$1').
% [[{foobar,baz}]]

So as the owner process of the ETS table, since this was the process that created it, we can read from and write to the table.

Now time to give our table away.

ets:give_away(protected_named_ets, SomeProcess, []).
% true

Since the documentation says it is available for reads, we will do the same match we just did before giving it away.

ets:match(protected_named_ets, '$1').
% [[{foobar,baz}]]

We get our results back.

So what does a write look like then, since only the owning process has write access, and the return value of a successful ets:insert/2 is always true?

ets:insert(protected_named_ets, {barbaz, foo}).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(protected_named_ets,{barbaz,foo})

An exception, and it is of type bad argument, which does confirm that writes from non-owning processes are not allowed, but doesn’t exactly make it clear that that is what is happening.

How about if we see what we get if we try to call ets:insert/2 on a table that doesn’t exist?

ets:insert(no_such_table, {foo, bar}).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(no_such_table,{foo,bar})

Same exception and same format of the error with just the name of the table and the tuple being different.

Thinking about this some, it does make sense that these two different cases would produce the same error. As far as the inserting process knows, there is no such table to insert into, whether the table doesn’t exist at all or it is protected and owned by another process. Either way, the caller passed in a bad ETS table reference for the call to ets:insert/2.

So we have now seen how protected behaves, which is the default access level, so let’s take a look at public next.

PublicNamedETS = ets:new(public_named_ets, [public, named_table]).
% public_named_ets

We will do an insert and a match from our current process, which is the owner.

ets:insert(public_named_ets, {foo, bar}).
% true
ets:match(public_named_ets, '$1').
% [[{foo,bar}]]

All looks good there.

The documentation states that public allows any process to read from and write to the table, so let’s give the public table away to SomeProcess and try to read and write.

ets:give_away(public_named_ets, SomeProcess, []).
% true

Now that we have given it away, time to try to add a new entry to the table, and see if we can read that write back out.

ets:insert(public_named_ets, {bar, baz}).
% true
ets:match(public_named_ets, '$1').
% [[{foo,bar}],[{bar,baz}]]

There we go. We have just inserted new data into that table, and when we do the ets:match/2 on everything, we see the new data in the result.

Now let’s create a private table. The documentation states that for private ETS tables, only the owner is allowed to read or write to the ETS table.

PrivateNamedETS = ets:new(private_named_ets, [private, named_table]).
% private_named_ets

Again, while this process still owns the table, we will add an item and do a read from the table.

ets:insert(private_named_ets, {fizz, buzz}).
% true
ets:match(private_named_ets, '$1').
% [[{fizz,buzz}]]

Time to give this table away to SomeProcess again.

ets:give_away(private_named_ets, SomeProcess, []).
% true

Now that the ETS table is owned by a different process, time to try a read.

ets:match(private_named_ets, '$1').
% ** exception error: bad argument
%      in function  ets:match/2
%         called as ets:match(private_named_ets,'$1')

bad argument exception, just like the attempted ets:insert/2 we tried on the protected ETS table above when it was owned by a different process.

And time for a write.

ets:insert(private_named_ets, {buzz, fizz}).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(private_named_ets,{buzz,fizz})

A bad argument exception here as well, which should not be a surprise at this point, as the protected write and the private read both raised that same exception.

So in total, for this introduction so far, we have seen the Type, Access, Named Table, Heir, and Owner settings of an ETS table, and how they relate.

Next week, we will conclude the introduction to ETS by going over the Key Position option and the Tweaks that an ETS table can take when being set up.

–Proctor

Ruby Tuesday – Refactoring towards creating filter

Today’s Ruby Tuesday takes a look at the concept of filter, a.k.a. select in Ruby, and how we could create our own version of it through some refactoring.

Filter is a function/method that can really start to change the way you think about your programs, and start helping you to take advantage of smaller building blocks that compose, or assemble, together to create nice reusable pieces of code.

To get an understanding of when and where filter can be powerful, and how you could create filter on your own if not already given to you as Enumerable#select, we’ll look at some “typical” style code that would look like something you are likely to have encountered in your current code base, or past code bases.

For this guide, we have a User class, and we will require 'date' since we want the user to have a date of birth and a date of death, as we will be using some historical figures from the world of Computer Science.

require 'date'

class User
  attr_reader :name, :date_of_birth, :date_of_death, :languages_created

  def initialize(name:, is_active:, date_of_birth: nil,
                 date_of_death: nil, languages_created: [])
    @name = name
    @is_active = is_active
    @date_of_birth = date_of_birth
    @date_of_death = date_of_death
    @languages_created = languages_created
  end

  def active?
    @is_active
  end

  def to_s
    inspect
  end
end

We create some User objects for the creators of various programming languages, and add them to an Array of Users.

alan_kay = User.new(name: "Alan Kay",
                    is_active: true,
                    date_of_birth: Date.new(1940, 5, 17),
                    languages_created: ["Smalltalk", "Squeak"])
john_mccarthy = User.new(name: "John McCarthy",
                         is_active: true,
                         date_of_birth: Date.new(1927, 9, 4),
                         date_of_death: Date.new(2011, 10, 24),
                         languages_created: ["Lisp"])
robert_virding = User.new(name: "Robert Virding",
                          is_active: true,
                          languages_created: ["Erlang", "LFE"])
dennis_ritchie = User.new(name: "Dennis Ritchie",
                          is_active: true,
                          date_of_birth: Date.new(1941, 9, 9),
                          date_of_death: Date.new(2011, 10, 12),
                          languages_created: ["C"])
james_gosling = User.new(name: "James Gosling",
                         is_active: true,
                         date_of_birth: Date.new(1955, 5, 19),
                         languages_created: ["Java"])
matz = User.new(name: "Yukihiro Matsumoto",
                is_active: true,
                date_of_birth: Date.new(1965, 4, 14),
                languages_created: ["Ruby"])
nobody = User.new(name: "",
                  is_active: false)

users = [alan_kay, john_mccarthy, robert_virding, 
         dennis_ritchie, james_gosling, matz, nobody]

For most of our cases, we will want an easy way to see what users we have as a result of some operation, so let’s define a method that returns a list of just the names for a given list of users.

def get_names_for(users)
  names = []
  for user in users do
    names << user.name
  end
  names
end

So somewhere in our code base we have an area of code that wants to get only the active users from a given list of User objects.

We do our standard for loop, as we would do in so many languages, and we have an if clause that checks the active? method on a User. We do some other processing on that list which we will represent as putsing out the names of the result.

active_users = []
for user in users do
  if (user.active?)
    active_users << user
  end
end

puts "\n\nThe active users' names are:..."
puts get_names_for(active_users)
# Alan Kay
# John McCarthy
# Robert Virding
# Dennis Ritchie
# James Gosling
# Yukihiro Matsumoto
# => nil

Somewhere else in our code base, we have something that wants a list of the language creators who are still alive, because wouldn’t it be cool if we happened to get the chance to have lunch with them during a conference?

alive_users = []
for user in users do
  if (not user.date_of_death)
    alive_users << user
  end
end

puts "\n\nThe alive users' names are:..."
puts get_names_for(alive_users)
# Alan Kay
# Robert Virding
# James Gosling
# Yukihiro Matsumoto
# 
# => nil

And again, the puts just represents some processing of that list.

Yet somewhere else, we have some code that looks for those people that we know to have created more than one programming language.

users_created_more_than_one_language = []
for user in users do
  if (user.languages_created.count > 1)
    users_created_more_than_one_language << user
  end
end

puts "\n\nThe names for users who have created more than one language:..."
puts get_names_for(users_created_more_than_one_language)
# Alan Kay
# Robert Virding
# => nil

If we take a look at our three segments of code above, after a while, if you haven’t already, you will start to notice that they all are very, very similar.

They all:
– create an empty array and assign it to a variable that represents the working list of items that meet some condition,
– iterate over all the items in the list of users
– for each item, it checks some condition,
– add the user to the working copy variable if the condition is true
– return the working copy of items that meet the condition.

If we renamed the working variable to be the same in each, the only thing that would be different between the code segments is the conditional that is checked as part of the if clause.

matching = []
for user in users do
  if (something_specific_goes_here)
    matching << user
  end
end
matching

For a number of languages, you would have to live with that duplication, but this is Ruby, so we can use an escape hatch to make this code more generic and abstract.

The only thing that is different is the conditional, a.k.a. the predicate. The term predicate signifies a method, or function, that returns a boolean result.
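
For example, a predicate for active users could be written as a standalone lambda (a throwaway example, separate from the code below), using the setup data from above:

active_predicate = lambda{ |user| user.active? }
active_predicate.call(alan_kay)
# => true
active_predicate.call(nobody)
# => false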

So if we want to abstract out the filtering of items that match some predicate condition, we can pass in a lambda, or Proc, and call that predicate with each User to see if we get true back.

def filter(users, predicate)
  matching = []
  for user in users do
    if (predicate.call(user))
      matching << user
    end
  end
  matching
end

We now have something that looks like it could be re-used elsewhere for a list of Users.

So let’s test it out, by redoing the previous checks to use the new filter method we just defined.

First we will call our new filter, and pass it a lambda that looks at the count of the languages that user created.

puts "\n\nThe names for users who have created more than one language (using `filter` method):..."
multi_language_creators = filter(users, lambda{|u| u.languages_created.count > 1 })
puts get_names_for(multi_language_creators)
# Alan Kay
# Robert Virding
# => nil

Looks to be the same as the previous version.

Let’s try it with finding those that don’t have a known date of death.

puts "\n\nThe names for users who are not dead (using `filter` method):..."
not_dead_users = filter(users, lambda{|u| not u.date_of_death})
puts get_names_for(not_dead_users)
# Alan Kay
# Robert Virding
# James Gosling
# Yukihiro Matsumoto
# 
# => nil

So far, so good.

Finally, we try it for active users.

puts "\n\nThe names for users who are active (using `filter` method):..."
filtered_active_users = filter(users, lambda{|u| u.active?})
puts get_names_for(filtered_active_users)
# Alan Kay
# John McCarthy
# Robert Virding
# Dennis Ritchie
# James Gosling
# Yukihiro Matsumoto
# => nil

Yay! We have extracted a common pattern of our code out into something that represents a higher abstraction of filtering out users with a certain condition from a list of User objects.

Not only that, we have separated the concern of iterating over items and checking each item, from the concern of the actual condition we care about.

This seems pretty useful, and something that would apply beyond just a list of User objects.

Let’s see if we can do this for some Array of numbers as well.

Integers in Ruby have an even? method that we can use to know if a number is even.

puts "is 1 even???"
puts 1.even?

To get the even numbers from an Array of numbers, we have some code that looks very familiar.

even_numbers = []
for i in [1, 2, 3, 4, 5, 6, 7] do
  if (i.even?)
    even_numbers << i
  end
end

Let’s try out our new filter method, and see if we can use it on a list of numbers, and only get back those that are even.

puts "\n\nEven numbers"
evens = filter([1, 2, 3, 4, 5, 6, 7], lambda{|i| i.even?})
puts evens
# 2
# 4
# 6
# => nil

That works!!! Let the celebration commence!!!

Well, let it commence after we clean up our filter method to reflect that it is for more than just a list of Users.

def filter(items, predicate)
  matching = []
  for item in items do
    if (predicate.call(item))
      matching << item
    end
  end
  matching
end

Instead of users, we change the parameter to items, and use item instead of user for the individual element of the list we loop through.

We can also use our filter method against Ranges and Hashes.

puts "\n\nOdd numbers"
odds = filter((1..7), lambda{|i| i.odd?})
puts odds
# 1
# 3
# 5
# 7
# => nil
puts filter({1 => :a, 2 => :b, 3 => :c, 4 => :d}, lambda{|(key, value)| key.even?}).inspect
# [[2, :b], [4, :d]]
# => nil

So by taking advantage of lambdas, Procs, or even blocks in Ruby, we have been able to extract out a recurring pattern in our code and give it a name.

Not only that, but we saw how we could write the start of one ourselves, and with some work, we could get it to return a proper Hash instead of a list of key-value pairs.
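
As a sketch of that extra work (a hypothetical filter_hash helper, not something built in this post), we could rebuild a Hash from the key-value pairs that pass the predicate:

def filter_hash(items, predicate)
  matching = {}
  for item in items do
    if (predicate.call(item))
      # each item of a Hash comes through as a [key, value] pair
      key, value = item
      matching[key] = value
    end
  end
  matching
end

filter_hash({1 => :a, 2 => :b, 3 => :c, 4 => :d}, lambda{|(key, value)| key.even?})
# => {2=>:b, 4=>:d}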

-Proctor

Erlang Thursday – ETS Introduction Part 3: ETS Table Types

Today’s Erlang Thursday continues the introduction to ETS and takes a look at the different types of storage strategies that ETS supports.

The different types that ETS supports are: set, ordered_set, bag, and duplicate_bag.

Each of these different types can be passed in when creating a new ETS table, but let’s see what type of ETS table we get when we don’t specify any of the types.

ETS_Empty = ets:new(ets_empty, []).
% 36886
ets:info(ETS_Empty).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.50.0>},
%  {heir,none},
%  {name,ets_empty},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

If we look above, we can see the type tagged tuple has the type of set.

To see how the different types behave, we will create three tuples to add to ETS tables of each type and see what gets stored.

Item1 = {1, a}.
% {1,a}
Item2 = {1.0, "a"}.
% {1.0,"a"}
Item3 = {1, "one"}.
% {1,"one"}

We will have two tuples with a first element of 1, and one tuple whose first element is 1.0, to see how the different types of ETS tables behave when given the “same” key.

Why have keys of both 1 and 1.0? Because depending on the equality comparison used, they may or may not be seen as the same value, and therefore the same key.

1 == 1.0.
% true
1 =:= 1.0.
% false

First we will take a look at an ETS table of type set.

ETS_Set = ets:new(ets_set, [set]).
% 40978

We insert Item1 followed by an insert of Item2, and use ets:tab2list/1 to see what is stored in the ETS table.

ets:insert(ETS_Set, Item1).
% true
ets:insert(ETS_Set, Item2).
% true
ets:tab2list(ETS_Set).
% [{1,a},{1.0,"a"}]

An ETS table of type set sees 1 and 1.0 as different keys. So now let’s add Item3 and see what happens when we do an insert with an already existing key.

ets:insert(ETS_Set, Item3).
% true
ets:tab2list(ETS_Set).
% [{1,"one"},{1.0,"a"}]

The previous tuple with the key of 1 was replaced by the tuple for Item3 which is the last thing we inserted.

Let’s look at what an ordered_set does.

ETS_OrdSet = ets:new(ets_ordset, [ordered_set]).
% 45075

Again we’ll insert Item1 followed by Item2 and use ets:tab2list/1 to check its state.

ets:insert(ETS_OrdSet, Item1).
% true
ets:insert(ETS_OrdSet, Item2).
% true
ets:tab2list(ETS_OrdSet).
% [{1.0,"a"}]

In this case, the key of 1.0 was seen as the same as the 1 that was already in there, so it overwrites the first item inserted.

We insert Item3 into the ordered_set, and we can see the entry gets replaced yet again.

ets:insert(ETS_OrdSet, Item3).
% true
ets:tab2list(ETS_OrdSet).
% [{1,"one"}]

Now let’s check an ETS table that is a bag.

ETS_Bag = ets:new(ets_bag, [bag]).
% 49172

And we yet again add Item1 and Item2 to the table.

ets:insert(ETS_Bag, Item1).
% true
ets:insert(ETS_Bag, Item2).
% true
ets:tab2list(ETS_Bag).
% [{1,a},{1.0,"a"}]

Looking at ets:tab2list/1, we can see that for a bag they are treated as two different items.

And again we will see what happens when we insert Item3 into this ETS table.

ets:insert(ETS_Bag, Item3).
% true
ets:tab2list(ETS_Bag).
% [{1,a},{1,"one"},{1.0,"a"}]

In the case of a bag type of ETS table, we have Item2 along with entries for both Item1 and Item3, even though Item1 and Item3 have the same key.

The last type of ETS table we have is a duplicate_bag.

ETS_DupBag = ets:new(ets_dupbag, [duplicate_bag]).
% 53269

We insert Item1 followed by Item2 as we did with all of the other types of ETS tables.

ets:insert(ETS_DupBag, Item1).
% true
ets:insert(ETS_DupBag, Item2).
% true
ets:tab2list(ETS_DupBag).
% [{1,a},{1.0,"a"}]

And like all of the other ETS table types, we insert Item3 into the duplicate_bag ETS table type.

ets:insert(ETS_DupBag, Item3).
% true
ets:tab2list(ETS_DupBag).
% [{1,a},{1,"one"},{1.0,"a"}]

And we see we have all three items in the ETS table for the duplicate_bag type.

If we look at the behavior of bag and duplicate_bag though, they seem to behave the same.

So what is the difference between the two???

If you dig into the documentation and look at the description of the types under ets:new/2, it says that a bag will allow duplicate keys but only allow a given item to be added once, while a duplicate_bag will allow multiple entries even when they have exactly the same value.

To see this in action, we will add Item1 to both the ETS_Bag table and the ETS_DupBag table and see what happens.

First with just the ETS bag type.

ets:insert(ETS_Bag, Item1).
% true
ets:tab2list(ETS_Bag).
% [{1,a},{1,"one"},{1.0,"a"}]

The result of ets:tab2list/1 is the same as it was before, so adding an item that is already in an ETS table of type bag will not add it again.

So what does the duplicate_bag type of ETS table do?

ets:insert(ETS_DupBag, Item1).
% true
ets:tab2list(ETS_DupBag).
% [{1,a},{1,"one"},{1,a},{1.0,"a"}]

And we can see the tuple {1, a} shows up twice, because we called ets:insert/2 with that value twice.

–Proctor

Check Your Git Commits for the Year

It’s that time of the year, when you hit the last few weeks of the year and wonder, what did I manage to do this year?

This may be of your own accord, reflecting on the past year; it may be because you have your annual performance review coming up; or maybe you even want some ammo to use when negotiating a raise or promotion.

Either way, if you use Git, you can use that journal log of your work to help trigger memories of what you did for the past year.

We will take a look at how to build this up to be generic, so that it can be run any time of any year, across any of your Git branches.

First, we want to find commits that we were the author of. As our author name can differ across Git repos, we want to look at who we are according to that repo, based on our Git config settings.

git config --get user.name
# Proctor

We will put that in a Bash function for nice naming and ease of use later on.

function my_git_user_name() {
  git config --get user.name
}

We also want to know what year it is, so we can look at commits for this year.

We will use the date command to get the current year.

date +'%Y'
# 2015

Again, we will create a Bash function for that as well.

function this_year() {
  date +'%Y'
}

We also want to know what last year was, so we can know the beginning of this year. We bust out bc for this to do some calculation at the command line. We take the current year – 1 and pass that to bc to get last year.

echo "$(this_year)-1" | bc
# 2014

Wrap that in a function.

function last_year() {
  echo "$(this_year)-1" | bc
}

And now we can get the end of last year, being December 31st.

echo "$(last_year)-12-31"
# 2014-12-31

And of course, we put that into another function.

function end_of_last_year() {
  echo "$(last_year)-12-31"
}

And now we can use both end_of_last_year and my_git_user_name to find the Git commits I was the author of since the beginning of the year.

git log --author="$(my_git_user_name)" --after="$(end_of_last_year)" origin/master

Note that this checks against `origin/master`, so if you call (one of) your canonical remote(s) something other than `origin` you will need to update this, but this will show all those items that have made it into master that you have worked on.

And for convenience, we will put this in a function, so we can call it nice and easy.

function my_commits_for_this_past_year()
{
  git log --author="$(my_git_user_name)" --after="$(end_of_last_year)" origin/master
}

And to call it we just need to type `my_commits_for_this_past_year` at the command line.

Having these functions also allows us to add them to our Git aliases or .bash_profile so we have easy access to call them from anywhere.

### What did I do in git this past year?

function this_year() {
  date +'%Y'
}

function last_year() {
  echo "$(this_year)-1" | bc
}

function end_of_last_year() {
  echo "$(last_year)-12-31"
}

function my_git_user_name() {
  git config --get user.name
}

function my_commits_for_this_past_year()
{
  git log --author="$(my_git_user_name)" --after="$(end_of_last_year)" origin/master
}

This makes it nice and easy to filter your Git commits and trace through your history on a project, and refresh your memory of what you have actually touched, instead of misremembering what year it was done, or forgetting about that small little fix that wound up having a big impact.

–Proctor

Ruby Tuesday – Array Decomposition

Today’s Ruby Tuesday takes a look at Array Decomposition.

Ruby gives you some ability to destructure, or decompose, an Array into its component pieces.

Say we have an Array which represents a point in two-dimensional Cartesian space; we can decompose that Array into its x and y coordinates by doing an assignment of x and y to the point represented by the given Array, if we wrap them in parentheses.

(x, y) = [1, -1]
# => [1, -1]
x
# => 1
y
# => -1

If we want to decompose only part of the Array, and save off anything at the end, we can use a * before the last variable in the assignment.

(_, second, *rest) = [:a, :b, :c, :d]
# => [:a, :b, :c, :d]
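
For clarity (continuing from the assignment above), inspecting second and rest should give:

second
# => :b
rest
# => [:c, :d]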

The capturing even works if there are fewer items in the Array than there are variables to decompose into.

(_, second, *rest) = [:a]
# => [:a]
second
# => nil
rest
# => []

We can also use Array Decomposition in method arguments to decompose a passed-in Array into individual variables for specific elements.

def cartesian_point((x, y))
  puts "x: #{x}, y: #{y}"
end
# => :cartesian_point

cartesian_point([1, -1])
# x: 1, y: -1
# => nil

Above I mentioned that we can capture the “rest” of the Array we aren’t wanting to decompose at this point by using a * (called the “splat” operator).

(head, *rest) = [1, 2, 3, 4, 5]
# => [1, 2, 3, 4, 5]
head
# => 1
rest
# => [2, 3, 4, 5]

So let’s see what we get when we use the splat to capture the rest of an Array that doesn’t have any items left over at the end.

(first, *tail) = [1]
# => [1]
first
# => 1
tail
# => []

We get an empty list.

Let’s see what we get when we decompose an empty list into a leading item and a capturing “rest” variable.

(h, *t) = []
# => []
h
# => nil
t
# => []

We get a nil for the head binding, and we still get an empty Array for the “rest” of the items.

This means we can use destructuring to handle recursing over a list, and have a guard clause that checks whether we received an empty Array by checking if the head of the Array is nil, and otherwise just recurse by passing in the tail as the argument.

def recurse_list((head, *tail))
  if (head != nil)
    puts head
    recurse_list(tail)
  else
    puts "all done"
  end
end
# => :recurse_list

recurse_list([1, 2, 3, 4, 5, 6, 7])
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# all done
# => nil

–Proctor

Erlang Thursday – ETS Introduction, Part 2

Today’s Erlang Thursday continues the introduction to the ets module, and ETS in general.

We saw last time that ETS tables are destroyed when the parent process crashes, so the question arises: how might we keep our ETS tables alive if we just "Let It Crash!"?

To solve this problem, we will take a look at the function ets:give_away/3 and the option of specifying an heir at table construction.

First, we will create a function that will represent a process we can give the table ownership to. This function just does a receive and never times out.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>

And now with that function, we can spawn a process to run that function.

Process = spawn(Fun).
% <0.53.0>

We create a new ETS Table,

Table = ets:new(table, []).
% 20498

and give it away to the process we just spawned.

ets:give_away(Table, Process, []).
% true

We can look at the table info and see the owner is now the process we spawned, as the Pid of that process matches the Pid in the owner tuple of the table settings.

ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.53.0>},
%  {heir,none},
%  {name,table},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Now that we have supposedly transferred ownership, time to crash our current process, which is the one that was the original owner before the transfer.

1 = 2.
% ** exception error: no match of right hand side value 2
self().
% <0.58.0>

We check if the process we spawned is still alive, mostly to show that there is nothing up our sleeves.

is_process_alive(Process).
% true

And let’s take a look at the “info” for the table again, and see if it is still available.

ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.53.0>},
%  {heir,none},
%  {name,table},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

It is still alive!!! We did transfer ownership, so if our process crashes the ETS table still stays alive.

Time to kill that process

exit(Process, "Because").
% true
is_process_alive(Process).
% false

and watch the ETS table disappear…

ets:info(Table).
% undefined

This time, let’s use the heir option when creating an ETS table, and take advantage of the magic of ownership transfer of an ETS table to an heir.

In this case, the shell will be the heir when the owning process dies.

TableWithHeir = ets:new(table, [{heir, self(), "something went wrong"}]).
% 24594

We create a new process, and assign ownership of the ETS table to the new process.

Process2 = spawn(Fun).
% <0.71.0>
ets:give_away(TableWithHeir, Process2, []).
% true

We then look at the info for the table, and we can see both that the owner is the new process and that the heir is our current process.

self().
% <0.58.0>
ets:info(TableWithHeir).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,349},
%  {owner,<0.71.0>},
%  {heir,<0.58.0>},
%  {name,table},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Time to kill the owning process again…

exit(Process2, "Because").
% true
is_process_alive(Process2).
% false

And if we inspect the table info again, we can see the current process is now both the owner and the heir.

ets:info(TableWithHeir).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,349},
%  {owner,<0.58.0>},
%  {heir,<0.58.0>},
%  {name,table},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

We spawn a new process, and we give the table to that new process.

Process3 = spawn(Fun).
% <0.78.0>
ets:give_away(TableWithHeir, Process3, []).
% true

The owner now becomes that new process, and our current process is still the heir.

ets:info(TableWithHeir).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,349},
%  {owner,<0.78.0>},
%  {heir,<0.58.0>},
%  {name,table},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

So by taking advantage of the ability to specify an heir, and by using ets:give_away/3, we can help keep the ETS table alive.

One way this might be taken advantage of is to have a supervisor create an “heir” process, and then create the child process that owns the ETS table. If the child dies, ownership transfers back to the heir process until a new “owning” process is restarted, at which point the heir process can transfer ownership of the ETS table to the newly restarted process.
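
A rough sketch of that idea might look something like the following. This is not a full OTP supervisor, and the module name, function names, and heir data are made up for illustration; it just shows a long-lived “heir” process handing the table off to a worker and reclaiming it via the {'ETS-TRANSFER', Table, FromPid, HeirData} message that ETS sends to the heir when the owner dies.

-module(table_heir).
-export([start/0]).

%% The heir process creates the table, naming itself as heir so that
%% ownership comes back here if the worker process dies.
start() ->
    spawn(fun() ->
              Table = ets:new(table, [{heir, self(), worker_died}]),
              hand_off(Table)
          end).

%% Spawn a fresh worker and give it ownership of the table.
hand_off(Table) ->
    Worker = spawn(fun worker/0),
    ets:give_away(Table, Worker, []),
    wait_for_crash(Table).

%% ETS sends {'ETS-TRANSFER', Table, FromPid, HeirData} to the heir
%% when the owning process dies, at which point we hand the table
%% off to a new worker.
wait_for_crash(Table) ->
    receive
        {'ETS-TRANSFER', Table, _FromPid, worker_died} ->
            hand_off(Table)
    end.

%% A stand-in for the real owning process; a real worker would do the
%% actual reads and writes against the table here.
worker() ->
    receive
        {'ETS-TRANSFER', _Table, _FromPid, _GiftData} ->
            receive after infinity -> ok end
    end.

In a real system the heir would also need some way to tell the restarted worker about the table, for example by registering itself under a name the worker knows to ask for.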

–Proctor

Ruby Tuesday – SSL Version In Ruby

Today’s Ruby Tuesday takes a look at OpenSSL::SSL::SSLContext#ssl_version.

At work today, I was pulled into a bit of a “fire”, where I was told that one of the sets of services our app depends on is going to remove support for everything but TLS 1.2.

Just doing a basic Net::HTTP.get did not get us a result back, so we first had to figure out how to get Ruby, as the client, to say it supports TLS 1.2.
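
For reference, the kind of call that was failing looked roughly like the following; the endpoint here is a made-up stand-in for the real service.

require 'net/http'

# Hypothetical TLS 1.2-only endpoint standing in for the real service.
uri = URI("https://tls12-only.example.com/status")

# When the Ruby/OpenSSL combination in use can't negotiate TLS 1.2,
# this raises an OpenSSL::SSL::SSLError instead of returning a body.
Net::HTTP.get(uri)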

Before we can start testing any of this, we need to require openssl and net/http.

require 'openssl'
# => true
require 'net/http'
# => true

It turns out that when you don’t specify a version, the default is SSLv23, which can be found by looking at DEFAULT_PARAMS on the SSLContext.

OpenSSL::SSL::SSLContext::DEFAULT_PARAMS[:ssl_version]
=> "SSLv23"

Since TLSv1.2 is a “higher” version than SSLv23, we were getting errors back because a connection using TLSv1.2 would never be negotiated.

To support TLS generically, instead of hardcoding a specific version, e.g. TLSv1 or TLSv1_2, the ssl_version needs to be set to nil, which tells Ruby’s OpenSSL components to use TLS instead of SSL.

This opened up the question of whether, if we set OpenSSL to use TLS instead of SSL, the TLS protocol would negotiate down to SSL if we happen to have an endpoint we need to talk to that doesn’t currently support TLS, but only SSL.

Playing around with Net::HTTP.start, I was able to experiment with sending HTTPS requests using different settings for the ssl_version.

As I was also testing against a local instance of nginx that only supports SSLv3 and uses a self-signed certificate, I set the verify_mode to VERIFY_NONE for testing. Note that I do NOT recommend this for real use cases.

The first helper method I created doesn’t specify an ssl_version option, so it just uses the default ssl_version setting.

def test_url_no_version(url)
  Net::HTTP.start(url.hostname, nil,
                  use_ssl: url.scheme == "https",
                  verify_mode: OpenSSL::SSL::VERIFY_NONE ) do |http|
    response = http.request(Net::HTTP::Get.new(url))
    puts response.inspect
    response
  end
end
# => :test_url_no_version

The next helper method I created sets the ssl_version option to nil to allow it to use TLS instead of SSL.

def test_url_ssl_version_is_nil(url)
  Net::HTTP.start(url.hostname, nil,
                  use_ssl: url.scheme == "https",
                  verify_mode: OpenSSL::SSL::VERIFY_NONE,
                  ssl_version: nil ) do |http|
    response = http.request(Net::HTTP::Get.new(url))
    puts response.inspect
    response
  end
end
# => :test_url_ssl_version_is_nil

The last helper method I created sets the ssl_version option to :SSLv3, which is the only version the test webserver is set up to handle.

def test_url_ssl3(url)
  Net::HTTP.start(url.hostname, nil,
                  use_ssl: url.scheme == "https",
                  verify_mode: OpenSSL::SSL::VERIFY_NONE,
                  ssl_version: :SSLv3 ) do |http|
    response = http.request(Net::HTTP::Get.new(url))
    puts response.inspect
    response
  end
end
# => :test_url_ssl3

Now that we have these, we can make requests against the test webserver and see what happens. We will also use Google as a baseline to compare against.

First we will test the version where the ssl_version is set to nil. This will tell us whether it falls back to trying an SSL variant.

test_url_ssl_version_is_nil(URI("https://www.google.com"))
# #<Net::HTTPOK 200 OK readbody=true>
# => #<Net::HTTPOK 200 OK readbody=true>
test_url_ssl_version_is_nil(URI("https://localhost/index.html"))
# OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: unsupported protocol
# from /Users/proctor/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/net/http.rb:923:in `connect'

Google returns successfully, but the local test doesn’t, so it doesn’t look like the client falls back from TLS to SSL.

Next we try with the default setting for ssl_version.

test_url_no_version(URI("https://www.google.com"))
# #<Net::HTTPOK 200 OK readbody=true>
# => #<Net::HTTPOK 200 OK readbody=true>
test_url_no_version(URI("https://localhost/index.html"))
# OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: unsupported protocol
# from /Users/proctor/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/net/http.rb:923:in `connect'

Google still returns successfully, but the local test case still doesn’t work.

Finally, we will test with the ssl_version explicitly set to :SSLv3.

test_url_ssl3(URI("https://www.google.com"))
# #<Net::HTTPOK 200 OK readbody=true>
# => #<Net::HTTPOK 200 OK readbody=true>
test_url_ssl3(URI("https://localhost/index.html"))
# #<Net::HTTPOK 200 OK readbody=true>
# => #<Net::HTTPOK 200 OK readbody=true>

And for this, Google and the local test both work. So we have shown that with the right ssl_version specified, we do get a response back from our local test server, but the fallback from TLS to SSL doesn’t happen.

–Proctor