Category Archives: Erlang

Ibid.

During my interview with Gene Kim on for Functional Geekery, Episode 128, Gene talked about how he had a problem he was asking different people for how they would solve it nicely with a functional approach, to see how to improve his Clojure solution to be more idiomatic.

His problem was on “rewriting” Ibid. entries in citation references, to get the authors names instead of the Ibid. value, as Ibid. is a shorthand that stands for “the authors listed in the entry before this”.

As he was describing this problem, I was picturing the general pseudo-code with a pattern match in my head. To be fair, this has come from a number of years of getting used to thinking in a functional style as well as thinking in a pattern matching style.

The following Erlang code is a close representation to the pseudo-code that was in my head.

-module(ibid).

-export([ibid/1]).

ibid(Authors) ->
    ibid(Authors, []).

ibid([], UpdatedAuthors) ->
    {ok, lists:reverse(UpdatedAuthors)};
ibid(["Ibid." | _], []) ->
    {error, "No Previous Author for 'Ibid.' citation"};
ibid(["Ibid." | T], UpdatedAuthors=[H | _]) ->
    ibid(T, [H | UpdatedAuthors]);
ibid([H | T], UpdatedAuthors) ->
    ibid(T, [H | UpdatedAuthors]).

Running this in the Erlang shell using erl results in the following

> ibid:ibid(["Mike Nygard", "Gene Kim", "Ibid.", "Ibid.", "Nicole Forsgren", "Ibid.", "Jez Humble", "Gene Kim", "Ibid."]).
{ok,["Mike Nygard","Gene Kim","Gene Kim","Gene Kim",
     "Nicole Forsgren","Nicole Forsgren","Jez Humble","Gene Kim",
     "Gene Kim"]}
> ibid:ibid(["Ibid."]).
{error,"No Previous Author for 'Ibid.' citation"}

Throughout the editing of the podcast, I continued to think about his problem, and how I would approach it in Clojure without built-in pattern matching, and came up with the following using a cond instead of a pure pattern matching solution:

(defn
  update_ibids
  ([authors] (update_ibids authors []))
  ([[citation_author & rest_authors :as original_authors] [last_author & _ :as new_authors]]
    (let [ibid? (fn [author] (= "Ibid." author))]
      (cond
        (empty? original_authors) (reverse new_authors)
        (and (ibid? citation_author) (not last_author))
          (throw (Exception. "Found `Ibid.` with no previous author"))
        :else (recur
          rest_authors
          (cons
            (if (ibid? citation_author)
                last_author
                citation_author)
            new_authors))))))

And if we run this in the Clojure REPL we get the following:

user=> (def references ["Gene Kim", "Jez Humble", "Ibid.", "Gene Kim", "Ibid.", "Ibid.", "Nicole Forsgren", "Micheal Nygard", "Ibid."])

user=> (update_ibids [])
()
user=> (update_ibids ["Ibid."])
Execution error at user/update-ibids (REPL:8).
Found `Ibid.` with no previous author
user=> (update_ibids references)
("Gene Kim" "Jez Humble" "Jez Humble" "Gene Kim" "Gene Kim" "Gene Kim" "Nicole Forsgren" "Micheal Nygard" "Micheal Nygard")

That solution didn’t sit well with me (and if there is a more idiomatic way to write it I would love some of your solutions as well), and because of that, I wanted to see what could be done using the core.match library, which moves towards the psuedo-code I was picturing.

(ns ibid
  (:require [clojure.core.match :refer [match]]))


(defn
  update_ibids
  ([authors] (update_ibids authors []))
  ([orig updated]
    (match [orig updated]
      [[] new_authors] (reverse new_authors)
      [["Ibid." & _] []] (throw (Exception. "Found `Ibid.` with no previous author"))
      [["Ibid." & r] ([last_author & _] :seq) :as new_authors] (recur r (cons last_author new_authors))
      [[author & r] new_authors] (recur r (cons author new_authors)) )))

And if you are trying this yourself, don’t forget to add to your deps.edn file:

{:deps
  {org.clojure/core.match {:mvn/version "0.3.0"}}

After the first couple of itches were scratched, Gene shared on Twitter Stephen Mcgill’s solution and his solution inspired by Stephen’s.

https://twitter.com/RealGeneKim/status/1201922587346866176

(Edit 2022-05-02 : I took out the Twitter embed and changed the embed to be an HTML link to Twitter if you are interested in seeing the post as it was pointed out that tracking cookies were being dropped by Twitter, in an effort to reduce cookies being dropped by this site.)

And then, just for fun (or “just for defun” if you prefer the pun intended version), I did a version in LFE (Lisp Flavored Erlang) due to it being a Lisp with built in pattern matching from being on the Erlang runtime.

(defmodule ibid
  (export (ibid 1)))


(defun ibid [authors]
  (ibid authors '[]))


(defun ibid
  ([[] updated]
    (tuple 'ok (: lists reverse updated)))
  (((cons "Ibid." _) '[])
    (tuple 'error "No Previous Author for 'Ibid.' citation"))
  ([(cons "Ibid." authors) (= (cons h _) updated)]
    (ibid authors (cons h updated)))
  ([(cons h rest) updated]
    (ibid rest (cons h updated))))

Which if we call it in LFE’s REPL gives us the following:

lfe> (: ibid ibid '["Mike Nygard" "Gene Kim" "Ibid." "Ibid." "Nicole Forsgren" "Ibid." "Jez Humble" "Gene Kim" "Ibid."])
#(ok
  ("Mike Nygard"
   "Gene Kim"
   "Gene Kim"
   "Gene Kim"
   "Nicole Forsgren"
   "Nicole Forsgren"
   "Jez Humble"
   "Gene Kim"
   "Gene Kim"))
lfe> (: ibid ibid '["Ibid."])
#(error "No Previous Author for 'Ibid.' citation")

If you have different solutions shoot them my way as I would love to see them, and if there looks to be interest, and some responses, I can create a catalog of different solutions similar to what Eric Normand does on his weekly challenges with his PurelyFunctional.tv Newsletter.

Erlang Thursday – ETS selects, continuations, and concurrent inserts

At the end of last week’s Erlang Thursday, I said we would continue looking at the behavior of the select functions in the ets module.

So before we do any experimentation, we setup our test ETS tables, and this time we will also create a table of type ordered_set.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.52.0>
TestOrderedSetTable = ets:new(ordered_set_table, [public, ordered_set]).
% 16402
TestTable = ets:new(ets_table, [public]).
% 20499
ets:give_away(TestTable, SomeProcess, []).
% true
ets:give_away(TestOrderedSetTable, SomeProcess, []).
% true

Next we will load our test ETS table with some dummy data, leaving some gaps in the sequence, allowing us to fill those gaps in later.

[[ets:insert(TestTable, {X, X}) || X <- lists:seq(1, 30, 2)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true]]
[[ets:insert(TestOrderedSetTable, {X, X}) || X <- lists:seq(1, 30, 2)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true]]

We then do a select to get all of the records from the table so we can see how the results are ordered for the different table types.

ets:select(TestTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}]).
% [{15,15},
%  {25,25},
%  {13,13},
%  {21,21},
%  {11,11},
%  {1,1},
%  {23,23},
%  {7,7},
%  {3,3},
%  {9,9},
%  {19,19},
%  {29,29},
%  {27,27},
%  {17,17},
%  {5,5}]
ets:select(TestOrderedSetTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}]).
% [{1,1},
%  {3,3},
%  {5,5},
%  {7,7},
%  {9,9},
%  {11,11},
%  {13,13},
%  {15,15},
%  {17,17},
%  {19,19},
%  {21,21},
%  {23,23},
%  {25,25},
%  {27,27},
%  {29,29}]

The ets module also has a function ets:select_reverse, so let’s take a quick stop and see what that does for our ETS tables.

ets:select_reverse(TestTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}]).
% [{15,15},
%  {25,25},
%  {13,13},
%  {21,21},
%  {11,11},
%  {1,1},
%  {23,23},
%  {7,7},
%  {3,3},
%  {9,9},
%  {19,19},
%  {29,29},
%  {27,27},
%  {17,17},
%  {5,5}]
ets:select_reverse(TestOrderedSetTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}]).
% [{29,29},
%  {27,27},
%  {25,25},
%  {23,23},
%  {21,21},
%  {19,19},
%  {17,17},
%  {15,15},
%  {13,13},
%  {11,11},
%  {9,9},
%  {7,7},
%  {5,5},
%  {3,3},
%  {1,1}]

If we look at the results of ets:select/2 and ets:select_reverse/2, we see that for TestTable we get the same result, and for TestOrderedSetTable we get the results in a reverse order, which is what the documentation for ets:select_reverse/2 states. Which makes sense if you think about it,

With that brief diversion out of the way, lets run our same match_spec()s from above, but limit the results to 5 records so we get a continuation back.

{Result, Continuation} = ets:select(TestTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}], 5).
% {[{19,19},{29,29},{27,27},{17,17},{5,5}],
% {20499,214,5,<<>>,[],0}}
{OrdSetResult, OrdSetContinuation} = ets:select(TestOrderedSetTable, [{{'$1', '$2'}, [], [{{'$1', '$2'}}]}], 5).
% {[{1,1},{3,3},{5,5},{7,7},{9,9}],{16402,9,[],5,<<>>,[],0,0}}

And with those continuations, we will see what the next results we would fetch would be.

ets:select(Continuation).
% {[{1,1},{23,23},{7,7},{3,3},{9,9}],{20499,111,5,<<>>,[],0}}
ets:select(OrdSetContinuation).
% {[{11,11},{13,13},{15,15},{17,17},{19,19}],
%  {16402,19,[],5,<<>>,[],0,0}}

Remember those “gaps” we left in our sequence of numbers we used to create tuples?

Time to “fill in” those gaps of the sequence to see what happens if we fetch with our existing continuation as data gets populated concurrently.

[[ets:insert(TestOrderedSetTable, {X, X}) || X <- lists:seq(2, 30, 2)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true]]
[[ets:insert(TestTable, {X, X}) || X <- lists:seq(2, 30, 2)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true]]

Now we re-run our ets:select/1 functions with the same continuations as before.

ets:select(Continuation).
% {[{12,12},{7,7},{3,3},{10,10},{9,9}],
%  {20499,224,5,<<>>,[],0}}
ets:select(OrdSetContinuation).
% {[{10,10},{11,11},{12,12},{13,13},{14,14}],
%  {16402,14,[],5,<<>>,[],0,0}}

If we compare that to before we can see the we now have even number items in the list. For our TestTable if we look above at the Continuation value itself, we ahve the continuation point as 214, since that is the only thing that has changed between that continuation and the resulting continuations from calling ets:select(Continuation).. So with just a number it is hard to infer just how we might expect the continuation to change.

The OrdSetContinuation on the other hand, has a 9 as its second element in the tuple, after the ETS table id of 16402. This also happens to be the key of the last tuple in the result set, which matches up with the 19 and 14 in the other continuations. So in the case of the ordered set, we can infer that as part of the continuation for an ETS table of type ordered_set, the continuation tells us the specific key of the last record that was returned, and we continue from that record regardless of any concurrent inserts that may have taken place.

Next time we will take a look at ets:is_compiled_ms/1 and how match specs might play in with continuations based off reading the documentation about ets:is_compiled_ms/1.

–Proctor

Erlang Thursday – Using ETS select with a limit

In last week’s Erlang Thursday we continued to explore ets:select/2 and seeing its use when combined with using ets:fun2ms to generate the match_spec()s.

This week we will take a look at the other versions of select that the ets module provides.

Yet again we will do our new playground ETS table setup, so if we crash our shell session we don’t loose the table.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.52.0>
TestTable = ets:new(ets_table, [public]).
% 16402
ets:give_away(TestTable, SomeProcess, []).
% true

Next we will load our test ETS table with a bunch of test “products”. For ease of example, we will just use a number for the product id, and a random price ending in .99.

[[ets:insert(TestTable, {ProductId, random:uniform(100) + 0.99})
  || ProductId <- lists:seq(1, 10000) ]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true|...]]

We will create a match_spec() to find items in their twenties (and we will go ahead and round 19.99 up to 20 just because).

ProductsInTheTwenties = ets:fun2ms(fun({Product, Price})
                                     when Price >= 19.99 andalso Price < 30
                                     -> {Product, Price}
                                   end).
% [{{'$1','$2'},
%   [{'andalso',{'>=','$2',19.99},{'<','$2',30}}],
%   [{{'$1','$2'}}]}]

And if we use ets:select/2 against our table with this match spec, we get all of the results back in one query as we saw previously.

ets:select(TestTable, ProductsInTheTwenties).
% [{4351,29.99},
%  {635,19.99},
%  {6005,20.99},
%  {3742,27.99},
%  {5956,29.99},
%  {3753,28.99},
%  {6653,25.99},
%  {5151,28.99},
%  {2693,27.99},
%  {4253,21.99},
%  {7636,23.99},
%  {1935,19.99},
%  {9044,22.99},
%  {7797,22.99},
%  {2147,23.99},
%  {2574,26.99},
%  {7575,29.99},
%  {2130,28.99},
%  {4908,27.99},
%  {2218,22.99},
%  {9848,21.99},
%  {7632,26.99},
%  {3562,21.99},
%  {3130,27.99},
%  {575,26.99},
%  {4622,28.99},
%  {5678,25.99},
%  {4022,...},
%  {...}|...]

But the ets module also gives us a way to limit the results if we would like, using ets:select/3 and giving a limit of the number of results to return at a time.

So let’s use ets:select/3 and give a limit of 10 and see what happens.

ets:select(TestTable, ProductsInTheTwenties, 10).
% {[{9027,27.99},
%   {7347,29.99},
%   {7282,20.99},
%   {9386,24.99},
%   {5415,25.99},
%   {4032,29.99},
%   {8105,25.99},
%   {4634,24.99},
%   {1275,20.99},
%   {234,20.99}],
%  {16402,576,10,<<>>,[],0}}

We get a tuple back instead of a list of results. The first item in the tuple is a list of our first ten results we specified, the second is some bizarre looking tuple, which if we look at the documentation for ets:select/3 represents something referred to as a continuation.

So we run our query again, and bind the results this time.

{Results, Continuation} = ets:select(TestTable, ProductsInTheTwenties, 10).
% {[{9027,27.99},
%   {7347,29.99},
%   {7282,20.99},
%   {9386,24.99},
%   {5415,25.99},
%   {4032,29.99},
%   {8105,25.99},
%   {4634,24.99},
%   {1275,20.99},
%   {234,20.99}],
%  {16402,576,10,<<>>,[],0}}

So we have this continuation, but what is it and what does it mean for us to have it.

In short, it can be thought of as an immutable bookmark. It represents not only what page we are in for our query results, but also the book we are reading (our query).

This allows us to quickly pick up where we previously left off in our results set by passing the continuation to ets:select/1.

ets:select(Continuation).
% {[{2533,24.99},
%   {1357,22.99},
%   {564,21.99},
%   {9086,22.99},
%   {5265,25.99},
%   {4030,22.99},
%   {2802,25.99},
%   {8254,27.99},
%   {7088,26.99},
%   {3062,27.99}],
%  {16402,960,10,<<>>,[{6792,29.99},{9295,29.99}],2}}

And because it is our special immutable bookmark, every time we use that bookmark it takes us to the same starting point in the same book, and we only read the same number of maximum pages as originally set as our limit.

So no matter how many times we call ets:select/1 with our same continuation, we will get the same results each time.

ets:select(Continuation).
% {[{2533,24.99},
%   {1357,22.99},
%   {564,21.99},
%   {9086,22.99},
%   {5265,25.99},
%   {4030,22.99},
%   {2802,25.99},
%   {8254,27.99},
%   {7088,26.99},
%   {3062,27.99}],
%  {16402,960,10,<<>>,[{6792,29.99},{9295,29.99}],2}}
ets:select(Continuation).
% {[{2533,24.99},
%   {1357,22.99},
%   {564,21.99},
%   {9086,22.99},
%   {5265,25.99},
%   {4030,22.99},
%   {2802,25.99},
%   {8254,27.99},
%   {7088,26.99},
%   {3062,27.99}],
%  {16402,960,10,<<>>,[{6792,29.99},{9295,29.99}],2}}
ets:select(Continuation).
% {[{2533,24.99},
%   {1357,22.99},
%   {564,21.99},
%   {9086,22.99},
%   {5265,25.99},
%   {4030,22.99},
%   {2802,25.99},
%   {8254,27.99},
%   {7088,26.99},
%   {3062,27.99}],
%  {16402,960,10,<<>>,[{6792,29.99},{9295,29.99}],2}}

And if we look at the resulting tuple, we see that we get a different tuple for our next continuation.

{SecondResults, SecondContinuation} = ets:select(Continuation).
% {[{2533,24.99},
%   {1357,22.99},
%   {564,21.99},
%   {9086,22.99},
%   {5265,25.99},
%   {4030,22.99},
%   {2802,25.99},
%   {8254,27.99},
%   {7088,26.99},
%   {3062,27.99}],
%  {16402,960,10,<<>>,[{6792,29.99},{9295,29.99}],2}}

And we can pick up that new continuation, and use that in our next call to ets:select/1 to get the next set of results, with another continuation.

ets:select(SecondContinuation).
% {[{8569,19.99},
%   {1805,28.99},
%   {6819,23.99},
%   {9313,28.99},
%   {9527,27.99},
%   {1737,29.99},
%   {700,26.99},
%   {142,25.99},
%   {6792,29.99},
%   {9295,29.99}],
%  {16402,513,10,<<>>,[],0}}

And if we have a query for which we have exhausted our results set, we get an '$end_of_table' atom.

ets:select(TestTable, [{{'$1', '$2'}, [{'<', '$2', 0}], ['$$']}], 10).
% '$end_of_table'

The ability to specify a limit and have a continuation is also available via on match with ets:match/3 and ets:match/1, and match_object via ets:match_object/3 and ets:match_object/1.

Next week, we will continue looking at the various select functions in ets as we look into their behavior with and ordered set, will look at select vs select_reverse, and play with and see how continuations work if we get some new entries inserted in the results when using a continuation.
–Proctor

Erlang Thursday – ETS, match_specs, and functions

In last week’s Erlang Thursday I concluded with showing how we can take advantage of using ets:select but take advantage of making our queries more expressive.

First we will need a new ETS table, so we start with a new public “Products” table, and do our standard of creating a new process and giving ownership of the table away.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.52.0>
Products = ets:new(products, [public]).
% 16402
ets:give_away(Products, SomeProcess, []).
% true

Next we will load our “Products” into the table.

In our case, we are just creating a “product” with the “name” as a binary and a “price in CWC” (Common World Currency) as an integer.

[[ ets:insert(Products, {integer_to_binary(X), X}) || X <- lists:seq(1, 100) ]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true|...]]

As we saw last week, we can manually build up a list of tuples into the match_spec() to run our query, say for items less than 10 CWCs.

ets:select(Products, [{{'$1', '$2'}, [{'<', '$2', 10}], ['$1']}]).
% [<<"8">>,<<"6">>,<<"5">>,<<"3">>,<<"7">>,<<"1">>,<<"4">>,
%  <<"9">>,<<"2">>]

We can also find those item names that are more than 10 CWCs and less than 25 CWCS.

ets:select(Products, [{{'$1', '$2'}, [{'>', '$2', 10}, {'<', '$2', 25}], ['$1']}]).
% [<<"11">>,<<"15">>,<<"23">>,<<"20">>,<<"21">>,<<"14">>,
%  <<"12">>,<<"13">>,<<"16">>,<<"19">>,<<"17">>,<<"18">>,
%  <<"22">>,<<"24">>]

But this isn’t necessarily clear, as we are using numerical values for the items in the tuple, and lists of tuples with lists of tuples inside them.

Enter ets:fun2ms/1 to the rescue.

ets:fun2ms/1 will take a function and will turn that function into a match_spec().

This allows us to write a function that takes a tuple of Product and Cost, and will return the Product if the Cost is less than 10.

ets:fun2ms(fun({Product, Cost}) when Cost < 10 -> Product end).
% [{{'$1','$2'},[{'<','$2',10}],['$1']}]

We can also have compound checks in our guard clauses on the functions we pass to ets:fun2ms/1, such as and clauses,

Between_25_And_35_CWC = ets:fun2ms(fun({Product, Cost}) when Cost > 25, Cost < 35 -> Product end).
% [{{'$1','$2'},[{'>','$2',25},{'<','$2',35}],['$1']}]
ets:select(Products, Between_25_And_35_CWC).
% [<<"30">>,<<"33">>,<<"32">>,<<"29">>,<<"28">>,<<"26">>,
%  <<"34">>,<<"27">>,<<"31">>]

as well as or style clauses.

While this is useful it does have its limitations, as this parse transform on the function, so you can’t use everything that you would be able to with a normal function.

ets:fun2ms(fun({Product, Cost}) when Cost > 90 -> lists:reverse(binary:bin_to_list(Product)) end).
# Error: Unknown error code {122,lists,reverse}
# {error,transform_error}

But then again, the results part of the match_spec(), doesn’t support advanced functionality anyways.

ets:select(Products, [{{'$1', '$2'}, [{'<', 90, '$2'}], [binary:bin_to_list('$1')]}]).
# ** exception error: bad argument
#      in function  binary:bin_to_list/1
#         called as binary:bin_to_list('$1')

But even with its limitations, ets:fun2ms/1 still does a good job to help make our ETS queries more expressive. Not only can we reference a function with expressive variable names over just $X, as well as give guard clauses instead of just guard tuples, but we can also use those variable names in our results as well, and do the basic formatting as part of the function.

And make sure to check back in next week as we continue with looking at the different versions of ets:select.

–Proctor

Erlang Thursday – More ETS data matching (and querying)

In today’s Erlang Thursday we continue from last week in looking at getting data from ETS.

To refresh, we have a module markov_words, and for this week we have added a new function markov_words:create_word_triples/1.

-module(markov_words).

-export([create_word_pairs/1,
         create_word_triples/1]).

-spec create_word_pairs(string()) -> list({string(), string()}).
create_word_pairs(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_pairs([], Words).

-spec create_word_triples(string()) -> list({string(), string(), string()}).
create_word_triples(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_triples(Words, []).


create_word_pairs(WordPairs, [_Word|[]]) ->
    WordPairs;
create_word_pairs(WordPairs, [Word|Words]) ->
    [Following|_] = Words,
    UpdatedWordPairs = [{Word, Following} | WordPairs],
    create_word_pairs(UpdatedWordPairs, Words).


create_word_triples([_Word, _SecondWord | []], WordTriples) ->
    WordTriples;
create_word_triples([FirstWord | Words], WordTriples) ->
    [SecondWord, Following | _] = Words,
    UpdatedWordTriples = [{FirstWord, SecondWord, Following} | WordTriples],
    create_word_triples(Words, UpdatedWordTriples).

The excuse for having this new function is that it would allow us to get more refined Markov chains by picking the probability of the next word by having the state be the compound key of the last two words seen.

With our module updated and defined, we get back to our Erlang shell to test things out, by compiling our module and loading up our intro text into a variable.

c(markov_words).
% {ok,markov_words}

ToTC = "It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of Light, it was the season of Darkness,
it was the spring of hope, it was the winter of despair,
we had everything before us, we had nothing before us,
we were all going direct to Heaven,
we were all going direct the other way--in short,
the period was so far like the present period,
that some of its noisiest authorities insisted on its
being received, for good or for evil, in the superlative
degree of comparison only.

There were a king with a large jaw and a queen with a
plain face, on the throne of England; there were a king
with a large jaw and a queen with a fair face,
on the throne of France. In both countries it was
clearer than crystal to the lords of the State preserves
of loaves and fishes, that things in general were
settled for ever.".

We create our fresh ETS table for this week, create a new process to own it, and give it away (in case we type something wrong and cause the current session of the shell to crash).

MarkovWords = ets:new(markov_word_tuples, [public, duplicate_bag]).
% 16402
Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.58.0>
ets:give_away(MarkovWords, SomeProcess, []).
% true

This week, in addition to adding our word pair tuples to ETS, we will also add in our new word triple tuples to ETS in the same table.

[[ ets:insert(MarkovWords, WordPair) || WordPair <- markov_words:create_word_pairs(ToTC)]].

[[ ets:insert(MarkovWords, WordTriple) || WordTriple <- markov_words:create_word_triples(ToTC)]].

Since we have both word pairs and word triples in the same ETS table, we can see that with ets:match_object/2, we can specify a match_pattern() for only the two tuples

ets:match_object(MarkovWords, {"of", '$1'}).
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"}]

or a match_pattern() that will only match the three tuples.

ets:match_object(MarkovWords, {"of", '$1', '$2'}).
% [{"of","loaves","and"},
%  {"of","the","State"},
%  {"of","France.","In"},
%  {"of","England;","there"},
%  {"of","comparison","only."},
%  {"of","its","noisiest"},
%  {"of","despair,","we"},
%  {"of","hope,","it"},
%  {"of","Darkness,","it"},
%  {"of","Light,","it"},
%  {"of","incredulity,","it"},
%  {"of","belief,","it"},
%  {"of","foolishness,","it"},
%  {"of","wisdom,","it"},
%  {"of","times,","it"},
%  {"of","times,","it"}]

Where as if we use the ets:lookup/2 with the key, we get all items with the key, regardless of the tuple size.

ets:lookup(MarkovWords, "of").
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"},
%  {"of","loaves","and"},
%  {"of","the","State"},
%  {"of","France.","In"},
%  {"of","England;","there"},
%  {"of","comparison","only."},
%  {"of","its","noisiest"},
%  {"of","despair,","we"},
%  {"of","hope,","it"},
%  {"of","Darkness,","it"},
%  {"of","Light,",[...]},
%  {"of",[...],...},
%  {[...],...},
%  {...}|...]

And unlike ets:lookup/2, with ets:match_object/2 we can match on any tuple element, and not just the key.

ets:match_object(MarkovWords, {'$1', "the", '$2'}).
% [{"on","the","throne"},
%  {"on","the","throne"},
%  {"direct","the","other"},
%  {"short,","the","period"},
%  {"like","the","present"},
%  {"of","the","State"},
%  {"to","the","lords"},
%  {"in","the","superlative"},
%  {"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the","season"},
%  {"was","the","season"},
%  {"was","the","epoch"},
%  {"was","the","epoch"},
%  {"was","the","age"},
%  {"was","the","age"},
%  {"was","the","worst"},
%  {"was","the","best"}]

And like ets:match_object/2, ets:match/2 can match based off the tuple itself as well.

ets:match(MarkovWords, {"was", "the", '$1'}).
% [["winter"],
%  ["spring"],
%  ["season"],
%  ["season"],
%  ["epoch"],
%  ["epoch"],
%  ["age"],
%  ["age"],
%  ["worst"],
%  ["best"]]

But sometimes we might want finer grain control over how the results are given back to us, such as a single list of items instead of a nested list of strings. Or maybe we even have some criteria that we want to hold true as part of our selections on the data.

Enter ets:select/2.

ets:select/2 takes the table as its first argument, and a match_spec() as its second argument.

The match_spec() is a list of three-tuples, where the first element is the match pattern, second element is a list of guard clause tuples, and the last element is the result is a term representation of the result for each match.

If we want to call ets:select/2 and have it align with ets:match/2 our call looks like the following.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], [['$1']]}]).
% [["winter"],
%  ["spring"],
%  ["season"],
%  ["season"],
%  ["epoch"],
%  ["epoch"],
%  ["age"],
%  ["age"],
%  ["worst"],
%  ["best"]]

The second argument is a list of match_spec()s, of which there is only one which consists of:
1). a match_pattern() of {"was", "the", '$1'}, which is the same thing we gave to ets:match/2
2). [], and empty list of guard condition tuples, and
3). [[‘$1’]] for the result term, which is the list of terms we want the result formatted as, in this case we want each result to be in its own list.

If we just wanted to get the word themselves as a list, we can update the result term part of the match_spec() to be ['$1'] instead.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], ['$1']}]).
% ["winter","spring","season","season","epoch","epoch","age",
%  "age","worst","best"]

If we wanted something that looked more like a ets:match_object/2 result set we can use the result term of '$_', which signifies the whole object.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], ['$_']}]).
% [{"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the","season"},
%  {"was","the","season"},
%  {"was","the","epoch"},
%  {"was","the","epoch"},
%  {"was","the","age"},
%  {"was","the","age"},
%  {"was","the","worst"},
%  {"was","the","best"}]

And if we wanted to only match on one of the items, and capture the other items in the tuple, we can use the result of '$$' which returns all of the match variable in a list, ordered by variable number as opposed to position in the match_pattern().

ets:select(MarkovWords, [{{"was", '$1', '$2'}, [], ['$$']}]).
% [["clearer","than"],
%  ["so","far"],
%  ["the","winter"],
%  ["the","spring"],
%  ["the","season"],
%  ["the","season"],
%  ["the","epoch"],
%  ["the","epoch"],
%  ["the","age"],
%  ["the","age"],
%  ["the","worst"],
%  ["the","best"]]

ets:select(MarkovWords, [{{"was", '$2', '$1'}, [], ['$$']}]).
% [["than","clearer"],
%  ["far","so"],
%  ["winter","the"],
%  ["spring","the"],
%  ["season","the"],
%  ["season","the"],
%  ["epoch","the"],
%  ["epoch","the"],
%  ["age","the"],
%  ["age","the"],
%  ["worst","the"],
%  ["best","the"]]

With ets:select/2 we also get the ability to specify multiple match_spec()s. This allows us to find all word triple word triples that have either "of" or "the" as the middle word.

ets:select(MarkovWords, [{{'$1', "the", '$2'}, [], ['$_']}, {{'$1', "of", '$2'}, [], ['$_']}]).
% [{"some","of","its"},
%  {"on","the","throne"},
%  {"on","the","throne"},
%  {"direct","the","other"},
%  {"preserves","of","loaves"},
%  {"throne","of","France."},
%  {"throne","of","England;"},
%  {"worst","of","times,"},
%  {"short,","the","period"},
%  {"winter","of","despair,"},
%  {"degree","of","comparison"},
%  {"epoch","of","incredulity,"},
%  {"epoch","of","belief,"},
%  {"spring","of","hope,"},
%  {"like","the","present"},
%  {"of","the","State"},
%  {"age","of","foolishness,"},
%  {"age","of","wisdom,"},
%  {"best","of","times,"},
%  {"season","of","Darkness,"},
%  {"season","of","Light,"},
%  {"to","the","lords"},
%  {"in","the","superlative"},
%  {"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the",[...]},
%  {"was",[...],...},
%  {[...],...},
%  {...}|...]

And with guard clauses, we can find third item in the three-tuples that start with "was", that comes later in the dictionary than the word in the second position of the tuple.

ets:select(MarkovWords, [{{"was", '$1', '$2'}, [{'<', '$1', '$2'}], ['$2']}]).
% ["than","winter","worst"]

So with this week’s post we have seen other ways of using ets:match/2 and ets:match_object/2, and what they can get over using just a ets:lookup/2 for a key, as well as being able to take advantage of even more powerful querying by using ets:select/2.

Next week, we will look at more ways to use ets:select/2, and how we can use some other ets module functions to help create queries that can be easier to deconstruct at a quicker glance.

–Proctor

Erlang Thursday – ETS data matching

Today’s Erlang Thursday moves on from the introduction to ETS, and starts using it to store some data and do some retrieval of data in ETS.

First we need some data to have in ETS, so we will fall back to one of my goto problems, Markov Chains.

For those unfamiliar with what a Markov Chain is, it is a state machine that transitions to the next state based off a probability instead of a specific input. The common example that people are familiar with in “everyday use” is predictive typing on smart phones, where the next word or letter is offered up as a prediction, and the predicted words are chosen by the historical likelihood that the words predicted follows the current word.

The first thing we will do is to create a module that given a string of text, will return a list of tuples representing a word and the word that follows.

-module(markov_words).

-export([create_word_pairs/1]).

-spec create_word_pairs(string()) -> list({string(), string()}).
create_word_pairs(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_pairs([], Words).

create_word_pairs(WordPairs, [_Word|[]]) ->
    WordPairs;
create_word_pairs(WordPairs, [Word|Words]) ->
    [Following|_] = Words,
    UpdatedWordPairs = [{Word, Following} | WordPairs],
    create_word_pairs(UpdatedWordPairs, Words).

The above code takes a string of text and splits that text into “words” based off using the space, tab, and newline characters as a word boundary. With that list of “words”, we then create a list of word to following word tuples, which is what we will be inserting into our ETS table.

Time to fire up the Erlang shell and start experimenting.

First we need to compile our module, and then we will create a variable to hold our text we want to use to prime our Markov Chain.

c(markov_words).
% {ok,markov_words}
ToTC = "It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of Light, it was the season of Darkness,
it was the spring of hope, it was the winter of despair,
we had everything before us, we had nothing before us,
we were all going direct to Heaven,
we were all going direct the other way--in short,
the period was so far like the present period,
that some of its noisiest authorities insisted on its
being received, for good or for evil, in the superlative
degree of comparison only.

There were a king with a large jaw and a queen with a
plain face, on the throne of England; there were a king
with a large jaw and a queen with a fair face,
on the throne of France. In both countries it was
clearer than crystal to the lords of the State preserves
of loaves and fishes, that things in general were
settled for ever.".

We will create a new process to give our ETS table away to, just in case we bomb out the shell.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.60.0>

And we now create an ETS table that will store data for us to use as part of our Markov Chain generation.

WordPairs = ets:new(word_pairs, [public, duplicate_bag]).
% 20498
ets:give_away(WordPairs, SomeProcess, []).
% true

We make the table public in this case, since we want our shell process, which is no longer the owner, to be able to add items to the table, and we make the table type a duplicate bag.

The reason for the duplicate_bag, is that for demonstration reasons, we want to be able to have multiple entries with the same key, as we are likely to see any word multiple times, and that some sets of word pairs are more common, so we want to be able to capture (and retain) those “duplicates”.

And for ease of population from inside the shell, we will use a list comprehension to add each word pair tuple we create from the text into our ETS table by calling ets:insert/2.

[[ ets:insert(WordPairs, WordPair) || WordPair <- markov_words:create_word_pairs(ToTC)]].
% [[true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true,true,true,true,true,true,true,true,
%   true,true,true,true,true|...]]

Now we should have some data in our ETS table, it is time to see how we can access our data. For this we turn to ets:match/2. ets:match/2 takes a table to query, and a Pattern.

The pattern is made up an Erlang term to be matched against; _, which matches anything and doesn’t bind; or pattern variables, which take the form of $N where N is any positive integer. The return result of ets:match/2 is a list containing the list of values in the pattern variables in order of variable name order.

So with this knowledge, we can try to find the word pairs to find the words that follow "of". If we were doing a pattern match it would look like {"of", Following}, but using ETS, we need to use a pattern variable which would make our spec {"of", '$1'}.

So lets run that against our ETS table.

ets:match(WordPairs, {"of", '$1'}).
% [["loaves"],
%  ["the"],
%  ["France."],
%  ["England;"],
%  ["comparison"],
%  ["its"],
%  ["despair,"],
%  ["hope,"],
%  ["Darkness,"],
%  ["Light,"],
%  ["incredulity,"],
%  ["belief,"],
%  ["foolishness,"],
%  ["wisdom,"],
%  ["times,"],
%  ["times,"]]

And there we go, we can see the results is a list of the list of variable matches, in this case, just what '$1' matched to.

For fun and exploration, let’s confirm what we would get if we look for the words that follow "it" in our Tale of Two Cities intro text.

ets:match(WordPairs, {"it", '$1'}).
% [["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"],
%  ["was"]]

Just a bunch of "was" which is right for the first two paragraphs of the book.

And we will double check and look for any words that follow "Scrooge".

ets:match(WordPairs, {"Scrooge", '$1'}).
% []

And if we wanted to get the whole tuple back, we could use ets:match_object/2, which will return the whole object that satisfies the match

ets:match_object(WordPairs, {"of", '$1'}).
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"}]

or, in this case we could use ets:lookup/2 which returns all of the items for which the key matches.

ets:lookup(WordPairs, "of").
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"}]

So to take a brief detour away from the Markov Chain example, why might we want to use either ets:lookup/2 or ets:match_object/2 versus the other? To answer that with an example, let’s add another entry into our WordPairs table, that is a three-tuple.

To start with, we will insert 100_000 items into our ETS tables and see what the resulting memory size becomes. We will insert a new tuple of {X, X}, for all numbers from 1 to 100_000.

ets:insert(WordPairs, {"of", "times,", "it"}).
% true

If we do a ets:lookup/2 we get all items with the specified key.

ets:lookup(WordPairs, "of").
[{"of","loaves"},
 {"of","the"},
 {"of","France."},
 {"of","England;"},
 {"of","comparison"},
 {"of","its"},
 {"of","despair,"},
 {"of","hope,"},
 {"of","Darkness,"},
 {"of","Light,"},
 {"of","incredulity,"},
 {"of","belief,"},
 {"of","foolishness,"},
 {"of","wisdom,"},
 {"of","times,"},
 {"of","times,"},
 {"of","times,","it"}]

But if we use ets:match_object/2, and use a two-tuple because we only want the word pairs, we don’t get the item that is a three-tuple in the results.

ets:match_object(WordPairs, {"of", '_'}).
[{"of","loaves"},
 {"of","the"},
 {"of","France."},
 {"of","England;"},
 {"of","comparison"},
 {"of","its"},
 {"of","despair,"},
 {"of","hope,"},
 {"of","Darkness,"},
 {"of","Light,"},
 {"of","incredulity,"},
 {"of","belief,"},
 {"of","foolishness,"},
 {"of","wisdom,"},
 {"of","times,"},
 {"of","times,"}]

Back to the Markov Chain scenario, we can start to see how we can get some text that follows the Markov Chain rules.

We get the match of potential words to choose from for a given word, and we pick a uniformly random result from the list of following words.

PotentialChoices = ets:match(WordPairs, {"of", '$1'}).
[NextWord] = lists:nth(random:uniform(length(PotentialChoices)), PotentialChoices).

We can write a function that will repeat these steps until we get to our termination case. Some examples of a termination state could be a word that doesn’t have a word that follows it; we get to a certain number of words picked to make up our text; or we get to a certain total length, say to fit in a SMS or Tweet.

With that, we have started to scratch the surface of putting some “real” data into ETS, and doing matching against the data for some given pattern. Next week we will continue looking at this example with other ways to get data out of our ETS tables into something that might be nicer to consume.

–Proctor

Erlang Thursday – ETS Introduction Part 5: keypos, compressed, read_conncurrency, and write_concurrency

Today’s Erlang Thursday continues the introduction to ETS and picks up with the promise from last week, and looks at the keypos ETS table setting, and the Tweaks that can be set.

First, we will take a look at the keypos setting.

The keypos is the 1-based index in the tuple to be stored that will be used as the key for the entry. If you remember from the part 3 of the introduction to ETS about the different table types, they use this index for their key comparison to determine if this is a unique item or not.

If we create a new table without specifying the keypos option, it defaults to 1.

Table = ets:new(some_name, []).
% 20498
ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.50.0>},
%  {heir,none},
%  {name,some_name},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

To show the keypos in action, we will create a couple of items to insert into our ETS table so we can see the keypos in action.

Item1 = {1, a}.
% {1,a}
Item2 = {1.0, "a"}.
% {1.0,"a"}
Item3 = {1, "one"}.
% {1,"one"}
Item4 = {a, "a"}.
% {a,"a"}
Item5 = {"a", a}.
% {"a",a}

In the items above, we have some duplicate entries across both the first item and the second item in the two-tuples.

We will go ahead and insert each one of these items in turn, keeping in mind that this table is a set, so any new insert with the same key, will override the previous value for the same key.

ets:insert(Table, Item1).
% true
ets:tab2list(Table).
% [{1,a}]
ets:insert(Table, Item2).
% true
ets:tab2list(Table).
% [{1,a},{1.0,"a"}]
ets:insert(Table, Item3).
% true
ets:tab2list(Table).
% [{1,"one"},{1.0,"a"}]
ets:insert(Table, Item4).
% true
ets:tab2list(Table).
% [{1,"one"},{a,"a"},{1.0,"a"}]
ets:insert(Table, Item5).
% true
ets:tab2list(Table).
% [{"a",a},{1,"one"},{a,"a"},{1.0,"a"}]

When we added Item3 above, it replaced Item1 in the table, since they both have a 1 for the first element in their two-tuple.

We will now create a new table with a keypos of 2, and see how the exact same steps of inserting is changed with a different keypos value.

KeyPosTwo = ets:new(key_pos_2, [{keypos, 2}]).
% 24595
ets:insert(KeyPosTwo, Item1).
% true
ets:tab2list(KeyPosTwo).
% [{1,a}]
ets:insert(KeyPosTwo, Item2).
% true
ets:tab2list(KeyPosTwo).
% [{1.0,"a"},{1,a}]
ets:insert(KeyPosTwo, Item3).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{1.0,"a"},{1,a}]
ets:insert(KeyPosTwo, Item4).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{a,"a"},{1,a}]
ets:insert(KeyPosTwo, Item5).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{a,"a"},{"a",a}]

In this case, it wasn’t until we added Item4 that we had an override, as both Item2 and Item4 both have an "a" as their second item. Then we we add Item5 it overwrites the Item1, as they both have the atom a as their second element.

And if we set a keypos of some value, say three, and we try to insert a tuple that has fewer items, we will get an exception of type bad argument.

KeyPosThree = ets:new(key_pos_3, [{keypos, 3}]).
% 28692
ets:insert(KeyPosThree, Item1).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(28692,{1,a})

Now it is time to look at the compressed option when creating a table.

When creating a new table, the default setting is for it to be uncompressed, as we can see in the table info since it shows {compressed,false}.

UncompressedTable = ets:new(uc, []).
% 32786
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,uc},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

We create a new table, with the compressed option, and when we look at ets:info/1 for the table, we see that it show {compressed,true}.

CompressedTable = ets:new(uc, [compressed]).
% 45074
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,uc},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

compressed, according to the documentation at least, says that it stores the data in a “more compact format to consume less memory”. It also warns that this can this can make operations that need to check the entire tuple slower, and that the key is not stored compressed, at least in the current implementation.

So let’s see what kind of memory difference compressed makes.

To start with, we will insert 100_000 items into our ETS tables and see what the resulting memory size becomes. We will insert a new tuple of {X, X}, for all numbers from 1 to 100_000.

lists:foreach(fun(X) -> ets:insert(CompressedTable, {X, X}) end,
              lists:seq(1, 100000)).
% ok
lists:foreach(fun(X) -> ets:insert(UncompressedTable, {X, X}) end,
              lists:seq(1, 100000)).
% ok
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,714643},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,814643},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Interesting.

For the compressed table the memory is reported to be 814643, but the uncompressed shows the memory to be less than that with 714643.

Maybe it doesn’t like to compact integer values very much, so let’s do the same thing, but use a string for the second item in the tuple.

lists:foreach(fun(X) -> ets:insert(UncompressedTable, {X, integer_to_list(X)}) end,
              lists:seq(1, 100000)).
% ok
lists:foreach(fun(X) -> ets:insert(CompressedTable, {X, integer_to_list(X)}) end, 
              lists:seq(1, 100000)).
% ok
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,914644},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,1692433},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Now using strings in our tuples instead of just using integers, we can see that the compressed ETS table memory is 914644, where as the uncompressed ETS table’s memory is 1692433.

So in addition to thinking about the way you are going to be matching on the data when trying to determine if the table should be compressed, it looks like you also need to think about the type of data you are going to be putting into the ETS table.

The last two options to be discussed are read_concurrency and write_concurrency.

read_conccurency is by default set to false, and, according to the documentation is best for when “read operations are much more frequent than write operations, or when concurrent reads and writes comes in large read and write bursts”.

So if you have a table that has a bunch of reads with the writes infrequently interspersed between the reads, this would be when you would want to enable read_concurrency, as the documentation states that switching between reads and writes is more expensive.

The write_concurrency option is set to false by default, causing any additional concurrent writes to block while an write operation is proceeding. When set to true different tuples of the same table can be written to by concurrent processes, and does not affect any table of the type ordered_set.

This should be it as far as the introduction goes. Next week we will start looking at the different operations we can perform using ETS and ETS tables.

–Proctor

Erlang Thursday – ETS Introduction Part 4: ETS Access Protections

Today’s Erlang Thursday continues the introduction to ETS and takes a look at the different access levels that ETS supports.

The different access levels that ETS supports are: public, protected, and private.

Each of these different types can be passed in when creating a new ETS table, but let’s see what type of ETS table we get when we don’t specify an access level.

Table = ets:new(some_name, []).
% 20501
ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,some_name},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

So the default access level is protected when not specified.

So what does it mean for a ETS table to be protected then? The documentation states that protected tables can be written to by only the owning process, but read by other processes.

So let’s see that at work then.

First let’s create a process that we can give ETS tables away to.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.58.0>

We create a new ETS table and specify it is protected, and we also specify that it is a named_table as a bonus.

ProtectedNamedETS = ets:new(protected_named_ets, [protected, named_table]).
% protected_named_ets

The result of that match is protected_named_ets and not a number like the call to ets:new/2 above, so we should be able to use the name of the table to access the table instead of just the identifier.

We will insert an entry into the ETS table, and we will use the name of the ETS table as the ETS table reference since we said the table is a named_table.

ets:insert(protected_named_ets, {foobar, baz}).
% true

ets:insert/2 returned true so we should now have some data in the table. Let’s pull it out using ets:match/2, and let’s match everything while we are at it by using a $1 for the pattern.

ets:match(protected_named_ets, '$1').
% [[{foobar,baz}]]

So as the owner process of the ETS table, since this was the process that created it, we can read an write to the table.

Now time to give our table away.

ets:give_away(protected_named_ets, SomeProcess, []).
% true

Since the documentation says is is available for reads, we will do the same match we just did before giving it away.

ets:match(protected_named_ets, '$1').
% [[{foobar,baz}]]

We get our results back.

What does a write look like then, since it says only the owning process has access to write, and the return value of calling ets:insert/2 is always true.

ets:insert(protected_named_ets, {barbaz, foo}).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(protected_named_ets,{barbaz,foo})

An exception, and it is of type bad argument, which does hold that it doesn’t allow writes from non-owning processes, but doesn’t exactly make it clear that is what is happening.

How about if we see what we get if we try to call ets:insert/2 on a table that doesn’t exist?

ets:insert(no_such_table, {foo, bar}).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(no_such_table,{foo,bar})

Same exception and same format of the error with just the name of the table and the tuple being different.

Thinking about this some, it does make sense that these two difference cases would be the same error. As far as the inserting process knows, there is no such table when trying to do an insert if no table exists, or if it is set to be protected. Either way, the caller passed in a bad ETS table reference for the call to ets:insert/2.

So we have now seen how protected behaves, which is the default access level, so let’s take a look at public next.

PublicNamedETS = ets:new(public_named_ets, [public, named_table]).
% public_named_ets

We will do an insert and a match from our current process, which is the owner.

ets:insert(public_named_ets, {foo, bar}).
% true
ets:match(public_named_ets, '$1').
% [[{foo,bar}]]

All looks good there.

The documentation states that public allows any process to read from and write to the table, so let’s give the public table away to SomeProcess and try to read and write.

ets:give_away(public_named_ets, SomeProcess, []).
% true

Now that we have given it away, time to try to add a new entry to the table, and see if we can read that write back out.

ets:insert(public_named_ets, {bar, baz}).
% true
ets:match(public_named_ets, '$1').
% [[{foo,bar}],[{bar,baz}]]

There we go. We have just inserted new data into that table, and when we do the ets:match/2 on everything, we see the new data in the result.

Now let’s create a private table. The documentation states that for private ETS tables, only the owner is allowed to read or write to the ETS table.

PrivateNamedETS = ets:new(private_named_ets, [private, named_table]).
private_named_ets

Again, while this process still owns the table, we will add an item and do a read from the table.

ets:insert(private_named_ets, {fizz, buzz}).
% true
ets:match(private_named_ets, '$1').
% [[{fizz,buzz}]]

Time to give this table away to SomeProcess again.

ets:give_away(private_named_ets, SomeProcess, []).
% true

Now that the ETS table is owned by a different process, time to try a read.

ets:match(private_named_ets, '$1').
% ** exception error: bad argument
%      in function  ets:match/2
%         called as ets:match(private_named_ets,'$1')

bad argument exception, just like the attempted ets:insert/2 we tried on the protected ETS table above when it was owned by a different process.

And time for a write.

ets:insert(private_named_ets, {buzz, fizz}).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(private_named_ets,{buzz,fizz})

A bad argument exception here as well, which should not be a surprise at this point, as both the protected write, and this private read both raised that same exception.

So in total, for this introduction so far, we have seen the Type, Access, Named Table, Heir, and Owner settings of an ETS table, and how they relate.

Next week, we will conclude the introduction of ETS by going over the Key Position option and the Tweaks that an ETS table can take when being setup.

–Proctor

Erlang Thursday – ETS Introduction Part 3: ETS Table Types

Today’s Erlang Thursday continues the introduction to ETS and takes a look at the different types of storage strategies that ETS supports.

The different types that ETS supports are: set, ordered_set, bag, and duplicate bag.

Each of these different types can be passed in when creating a new ETS table, but let’s see what type of ETS table we get when we don’t specify any of the types.

ETS_Empty = ets:new(ets_empty, []).
% 36886
ets:info(ETS_Empty).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.50.0>},
%  {heir,none},
%  {name,ets_empty},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

If we look above, we can see the type tagged tuple has the type of set.

To see how the different types can work we will create three tuples to add to the ETS tables of the different type to see what they store.

Item1 = {1, a}.
% {1,a}
Item2 = {1.0, "a"}.
% {1.0,"a"}
Item3 = {1, "one"}.
% {1,"one"}

We will have a two tuples with the first element of 1, and one tuple whose first element is 1.0 to see how the different types of ETS tables behave when given the “same” key.

Why have key of both 1 and 1.0? Because depending on the comparison of equality used, they may or may not be seen as the same, and therefore the same key.

1 == 1.0.
% true
1 =:= 1.0.
% false

First we will take a look at an ETS table of type set.

ETS_Set = ets:new(ets_set, [set]).
40978

We insert Item1 followed by an insert of Item2, and use ets:tab2list/1 to see what is stored in the ETS table.

ets:insert(ETS_Set, Item1).
% true
ets:insert(ETS_Set, Item2).
% true
ets:tab2list(ETS_Set).
% [{1,a},{1.0,"a"}]

An ETS table of type set sees 1 and 1.0 as different keys. So now let’s add Item3 and see what happens when we do an insert with an already existing key.

ets:insert(ETS_Set, Item3).
% true
ets:tab2list(ETS_Set).
% [{1,"one"},{1.0,"a"}]

The previous tuple with the key of 1 was replaced by the tuple for Item3 which is the last thing we inserted.

Let’s look at what an ordered_set does.

ETS_OrdSet = ets:new(ets_ordset, [ordered_set]).
% 45075

Again we’ll insert Item1 followed by Item2 and use ets:tab2list/1 to check it’s state.

ets:insert(ETS_OrdSet, Item1).
% true
ets:insert(ETS_OrdSet, Item2).
% true
ets:tab2list(ETS_OrdSet).
% [{1.0,"a"}]

In this case, the key of 1.0 was seen the same as the previous 1 that was in there, so it overwrites the first item inserted.

We insert Item3 to the ordered_set, and we can see it gets replaced yet again.

ets:insert(ETS_OrdSet, Item3).
% true
ets:tab2list(ETS_OrdSet).
% [{1,"one"}]

Now lets check an ETS table that is a bag.

ETS_Bag = ets:new(ets_bag, [bag]).
% 49172

And we yet again add Item1 and Item2 to the table.

ets:insert(ETS_Bag, Item1).
% true
ets:insert(ETS_Bag, Item2).
% true
ets:tab2list(ETS_Bag).
% [{1,a},{1.0,"a"}]

Looking at ets:tab2list/1, we can see that for a bag they are treated as two different items.

And again we will see what happens when we insert Item3 into this ETS table.

ets:insert(ETS_Bag, Item3).
% true
ets:tab2list(ETS_Bag).
% [{1,a},{1,"one"},{1.0,"a"}]

In the case of a bag type of ETS table, we have Item2 along with entries Item1 and Item3 even though Item1 and Item3 both have the same key.

The last type of ETS table we have is a duplicate_bag.

ETS_DupBag = ets:new(ets_dupbag, [duplicate_bag]).
% 53269

We insert Item1 followed by Item2 as we did with all of the other types of ETS tables.

ets:insert(ETS_DupBag, Item1).
% true
ets:insert(ETS_DupBag, Item2).
% true
ets:tab2list(ETS_DupBag).
% [{1,a},{1.0,"a"}]

And like all of the other ETS table types, we insert Item3 into the duplicate_bag ETS table type.

ets:insert(ETS_DupBag, Item3).
% true
ets:tab2list(ETS_DupBag).
% [{1,a},{1,"one"},{1.0,"a"}]

And we see we have all three items in the ETS table for a duplicate_bag type.o

If we look at the behavior of bag and duplicate_bag though, we see that the behavior of both seems to be the same.

So what is the difference between the two???

If you dig into the documentation, and look at the description of the types under ets:new/2, it says that a bag will allow duplicate keys, but allow the item to only be added once, a duplicate_bag will allow multiple entries even if they have the same values as well.

To see this in action, we will add Item1 to both the ETS_Bag table and the ETS_DupBag table and see what happens.

First with just the ETs bag type.

ets:insert(ETS_Bag, Item1).
% true
ets:tab2list(ETS_Bag).
% [{1,a},{1,"one"},{1.0,"a"}]

The return value is the same as it was before, so adding an item that is already in a ETS table of type bag will not add it again.

So what does the duplicate_bag type of ETS table do?

ets:insert(ETS_DupBag, Item1).
% true
ets:tab2list(ETS_DupBag).
% [{1,a},{1,"one"},{1,a},{1.0,"a"}]

And we can see the tuple {1, a} shows up twice, because we called ets:insert/2 with that value twice.

–Proctor

Erlang Thursday – ETS Introduction, Part 2

Today’s Erlang Thursday continues the introduction to the ets module, and ETS in general.

We saw last time that ETS tables are destroyed when the parent process crashes, so the question comes, how might we be able to keep our ETS tables alive if we just “Let It Crash!”?

To solve this problem, we will take a look at the function ets:give_away/3 and the option of specifying the heir at table construction.

First, we will create a function that will represent a process we can give the table ownership to. This function just does a receive and never times out.

Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>

And now with that function, we can spawn a process to run that function.

Process = spawn(Fun).
% <0.53.0>

We create a new ETS Table,

Table = ets:new(table, []).
% 20498

and give it away to the process we just spawned.

ets:give_away(Table, Process, []).
% true

We can look at the table info, and see the owner is now the process we spawned as the Pid for the process aligns with the Pid in the owner tuple in the table settings.

ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.53.0>},
%  {heir,none},
%  {name,table},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Now that we have supposedly transferred ownership, time to crash our current process, which is the one that was the original owner before the transfer.

1 = 2.
% ** exception error: no match of right hand side value 2
self().
% <0.58.0>

We check if the process we spawned is still alive, mostly out of showing that there is nothing up our sleeves.

is_process_alive(Process).
% true

And let’s take a look at the “info” for the table again, and see if it is still available.

ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.53.0>},
%  {heir,none},
%  {name,table},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

It is still alive!!! We did transfer ownership, so if our process crashes the ETS table still stays alive.

Time to kill that process

exit(Process, "Because").
% true
is_process_alive(Process).
% false

and watch the ETS table disappear…

ets:info(Table).
% undefined

This time, let’s use the heir option when creating an ETS table, and take advantage of the magic of ownership transfer for an ETS table to a heir.

In this case, the shell will be the heir when the owning process dies.

TableWithHeir = ets:new(table, [{heir, self(), "something went wrong"}]).
% 24594

We create a new process, and assign ownership of the ETS table to the new process.

Process2 = spawn(Fun).
% <0.71.0>
ets:give_away(TableWithHeir, Process2, []).
% true

We then look at the info for the table, and we can see both the owner is the new process, and the heir is our current process.

self().
% <0.58.0>
ets:info(TableWithHeir).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,349},
%  {owner,<0.71.0>},
%  {heir,<0.58.0>},
%  {name,table},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Time to kill the owning process again…

exit(Process2, "Because").
% true
is_process_alive(Process2).
% false

And if we inspect the table info again, we can see the current process is now both the owner and the heir.

ets:info(TableWithHeir).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,349},
%  {owner,<0.58.0>},
%  {heir,<0.58.0>},
%  {name,table},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

We spawn up a new process, and we give the table to that new process.

Process3 = spawn(Fun).
% <0.78.0>
ets:give_away(TableWithHeir, Process3, []).
% true

The owner now becomes that new process, and our current process is still the heir.

ets:info(TableWithHeir).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,349},
%  {owner,<0.78.0>},
%  {heir,<0.58.0>},
%  {name,table},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

So by taking advantage of the ability to specify a heir, and using ets:give_away/3, we can help keep the ETS table alive.

One way this might be taken advantage of is that we have a supervisor create a “heir” process, and then create the child process that would own the ETS table, and if the child dies, it can then transfer ownership back to the heir process until the new “owning” process can be restarted, and then the heir process can then transfer ownership of the ETS table to the “newly restarted” process.

–Proctor