Erlang Thursday – More ETS data matching (and querying)

In today’s Erlang Thursday we continue last week’s look at getting data out of ETS.

To refresh, we have a module markov_words, and for this week we have added a new function markov_words:create_word_triples/1.

-module(markov_words).

-export([create_word_pairs/1,
         create_word_triples/1]).

-spec create_word_pairs(string()) -> list({string(), string()}).
create_word_pairs(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_pairs([], Words).

-spec create_word_triples(string()) -> list({string(), string(), string()}).
create_word_triples(Text) ->
  Words = string:tokens(Text, " \t\n"),
  create_word_triples(Words, []).


%% Pairs are accumulated in reverse text order. An input with fewer
%% than two words yields no pairs instead of crashing.
create_word_pairs(WordPairs, [Word | [Following | _] = Words]) ->
    UpdatedWordPairs = [{Word, Following} | WordPairs],
    create_word_pairs(UpdatedWordPairs, Words);
create_word_pairs(WordPairs, _FewerThanTwoWords) ->
    WordPairs.


%% Triples are accumulated in reverse text order. An input with fewer
%% than three words yields no triples instead of crashing.
create_word_triples([FirstWord | [SecondWord, Following | _] = Words], WordTriples) ->
    UpdatedWordTriples = [{FirstWord, SecondWord, Following} | WordTriples],
    create_word_triples(Words, UpdatedWordTriples);
create_word_triples(_FewerThanThreeWords, WordTriples) ->
    WordTriples.

The reason for adding this new function is that it lets us build more refined Markov chains: the state becomes the compound key of the last two words seen, and the next word is picked according to the probabilities for that state.
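As a rough sketch of that idea, the last two words seen act as the state, and the third element of every matching triple is a candidate for the next word. The table name example_states and the sample triples are made up for illustration; they are not part of the module above.

```erlang
%% Hypothetical table and sample data, for illustration only.
StateTab = ets:new(example_states, [duplicate_bag]).
ets:insert(StateTab, [{"it", "was", "the"},
                      {"it", "was", "clearer"},
                      {"it", "was", "the"},
                      {"was", "the", "best"}]).

%% The state is the pair of words seen last: "it" followed by "was".
%% lists:append/1 flattens the [[Word]] shape that ets:match/2 returns.
Candidates = lists:append(ets:match(StateTab, {"it", "was", '$1'})).

%% Duplicates are kept in the list, so a uniform pick favours the
%% followers that occur more often after this state.
NextWord = lists:nth(rand:uniform(length(Candidates)), Candidates).
```

Because the table is a duplicate_bag, "the" appears twice among the candidates and is therefore twice as likely to be picked as "clearer".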

With our module updated and defined, we get back to our Erlang shell to test things out, by compiling our module and loading up our intro text into a variable.

c(markov_words).
% {ok,markov_words}

ToTC = "It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of Light, it was the season of Darkness,
it was the spring of hope, it was the winter of despair,
we had everything before us, we had nothing before us,
we were all going direct to Heaven,
we were all going direct the other way--in short,
the period was so far like the present period,
that some of its noisiest authorities insisted on its
being received, for good or for evil, in the superlative
degree of comparison only.

There were a king with a large jaw and a queen with a
plain face, on the throne of England; there were a king
with a large jaw and a queen with a fair face,
on the throne of France. In both countries it was
clearer than crystal to the lords of the State preserves
of loaves and fishes, that things in general were
settled for ever.".

We create our fresh ETS table for this week, create a new process to own it, and give it away (in case we type something wrong and cause the current session of the shell to crash).

MarkovWords = ets:new(markov_word_tuples, [public, duplicate_bag]).
% 16402
Fun = fun() -> receive after infinity -> ok end end.
% #Fun<erl_eval.20.54118792>
SomeProcess = spawn(Fun).
% <0.58.0>
ets:give_away(MarkovWords, SomeProcess, []).
% true
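To see what the hand-off does, a small sketch like the following checks the table owner before and after ets:give_away/3. The names OwnedTab, Keeper, and example_owned are invented for this example.

```erlang
%% Hypothetical names, for illustration only.
OwnedTab = ets:new(example_owned, [public, duplicate_bag]).
Keeper = spawn(fun() -> receive after infinity -> ok end end).

%% Before the hand-off, the calling process (here, the shell) owns the table.
true = (ets:info(OwnedTab, owner) =:= self()).

true = ets:give_away(OwnedTab, Keeper, []).

%% Afterwards the spawned process owns it, so the table no longer
%% depends on the shell process staying alive.
true = (ets:info(OwnedTab, owner) =:= Keeper).
```

Since the table was created as public, the shell can still read from and write to it after giving it away.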

This week, in addition to adding our word pair tuples to ETS, we will also add in our new word triple tuples to ETS in the same table.

[ets:insert(MarkovWords, WordPair) || WordPair <- markov_words:create_word_pairs(ToTC)].

[ets:insert(MarkovWords, WordTriple) || WordTriple <- markov_words:create_word_triples(ToTC)].
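As an aside, ets:insert/2 also accepts a list of tuples, so the same load could be done with one call per list. A minimal sketch on a throwaway table (the names BulkTab and example_bulk are hypothetical):

```erlang
%% Hypothetical table, for illustration only.
BulkTab = ets:new(example_bulk, [duplicate_bag]).

%% One call inserts every tuple in the list.
true = ets:insert(BulkTab, [{"it", "was"}, {"was", "the"}, {"it", "was"}]).

%% A duplicate_bag keeps identical objects, so all three are stored.
BulkSize = ets:info(BulkTab, size).
```

Keeping the duplicates matters here, since repeated pairs are what carry the word frequencies.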

Since we have both word pairs and word triples in the same ETS table, ets:match_object/2 lets us specify a match_pattern() that matches only the two-tuples,

ets:match_object(MarkovWords, {"of", '$1'}).
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"}]

or a match_pattern() that matches only the three-tuples.

ets:match_object(MarkovWords, {"of", '$1', '$2'}).
% [{"of","loaves","and"},
%  {"of","the","State"},
%  {"of","France.","In"},
%  {"of","England;","there"},
%  {"of","comparison","only."},
%  {"of","its","noisiest"},
%  {"of","despair,","we"},
%  {"of","hope,","it"},
%  {"of","Darkness,","it"},
%  {"of","Light,","it"},
%  {"of","incredulity,","it"},
%  {"of","belief,","it"},
%  {"of","foolishness,","it"},
%  {"of","wisdom,","it"},
%  {"of","times,","it"},
%  {"of","times,","it"}]

Whereas if we use ets:lookup/2 with the key, we get all items with that key, regardless of the tuple size.

ets:lookup(MarkovWords, "of").
% [{"of","loaves"},
%  {"of","the"},
%  {"of","France."},
%  {"of","England;"},
%  {"of","comparison"},
%  {"of","its"},
%  {"of","despair,"},
%  {"of","hope,"},
%  {"of","Darkness,"},
%  {"of","Light,"},
%  {"of","incredulity,"},
%  {"of","belief,"},
%  {"of","foolishness,"},
%  {"of","wisdom,"},
%  {"of","times,"},
%  {"of","times,"},
%  {"of","loaves","and"},
%  {"of","the","State"},
%  {"of","France.","In"},
%  {"of","England;","there"},
%  {"of","comparison","only."},
%  {"of","its","noisiest"},
%  {"of","despair,","we"},
%  {"of","hope,","it"},
%  {"of","Darkness,","it"},
%  {"of","Light,",[...]},
%  {"of",[...],...},
%  {[...],...},
%  {...}|...]
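The difference is easier to see on a tiny table. The sketch below uses made-up names (MixedTab, example_mixed) and three sample tuples, and sorts the lookup result only to make the comparison deterministic.

```erlang
%% Hypothetical table and sample data, for illustration only.
MixedTab = ets:new(example_mixed, [duplicate_bag]).
ets:insert(MixedTab, [{"of", "times,"},
                      {"of", "times,", "it"},
                      {"age", "of", "wisdom,"}]).

%% A two-element pattern can only match two-tuples.
PairsOnly = ets:match_object(MixedTab, {"of", '$1'}).

%% A three-element pattern can only match three-tuples.
TriplesOnly = ets:match_object(MixedTab, {"of", '$1', '$2'}).

%% lookup/2 compares only the key (the first element by default),
%% so it returns both tuples that start with "of".
ByKey = lists:sort(ets:lookup(MixedTab, "of")).
```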

And unlike ets:lookup/2, with ets:match_object/2 we can match on any tuple element, and not just the key.

ets:match_object(MarkovWords, {'$1', "the", '$2'}).
% [{"on","the","throne"},
%  {"on","the","throne"},
%  {"direct","the","other"},
%  {"short,","the","period"},
%  {"like","the","present"},
%  {"of","the","State"},
%  {"to","the","lords"},
%  {"in","the","superlative"},
%  {"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the","season"},
%  {"was","the","season"},
%  {"was","the","epoch"},
%  {"was","the","epoch"},
%  {"was","the","age"},
%  {"was","the","age"},
%  {"was","the","worst"},
%  {"was","the","best"}]

And like ets:match_object/2, ets:match/2 can match on any element of the tuple; it returns only the values bound to the match variables instead of the whole object.

ets:match(MarkovWords, {"was", "the", '$1'}).
% [["winter"],
%  ["spring"],
%  ["season"],
%  ["season"],
%  ["epoch"],
%  ["epoch"],
%  ["age"],
%  ["age"],
%  ["worst"],
%  ["best"]]
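Side by side, the two functions take the same pattern and differ only in the shape of what comes back. A small sketch with invented names (ShapeTab, example_shapes, Pattern):

```erlang
%% Hypothetical table and sample data, for illustration only.
ShapeTab = ets:new(example_shapes, [duplicate_bag]).
ets:insert(ShapeTab, [{"was", "the", "best"}, {"it", "was", "the"}]).

Pattern = {"was", "the", '$1'}.

%% match_object/2 returns each matching tuple in full.
WholeObjects = ets:match_object(ShapeTab, Pattern).

%% match/2 returns one list per match, holding only the bound variables.
BindingsOnly = ets:match(ShapeTab, Pattern).
```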

But sometimes we might want finer-grained control over how the results are given back to us, such as a single list of items instead of a nested list of strings. Or maybe we even have some criteria that we want to hold true as part of our selections on the data.

Enter ets:select/2.

ets:select/2 takes the table as its first argument, and a match_spec() as its second argument.

The match_spec() is a list of three-tuples, where the first element is the match pattern, the second element is a list of guard clause tuples, and the last element is a list describing the result to return for each match.

If we want to call ets:select/2 and have it behave like ets:match/2, our call looks like the following.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], [['$1']]}]).
% [["winter"],
%  ["spring"],
%  ["season"],
%  ["season"],
%  ["epoch"],
%  ["epoch"],
%  ["age"],
%  ["age"],
%  ["worst"],
%  ["best"]]

The second argument is a match_spec() containing a single three-tuple, which consists of:
1). a match_pattern() of {"was", "the", '$1'}, which is the same thing we gave to ets:match/2,
2). [], an empty list of guard condition tuples, and
3). [['$1']] for the result term, which describes how we want each result formatted; in this case we want each result to be in its own list.
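The three parts listed above can be named and tried against a single tuple without a table by using ets:test_ms/2, which reports what a select would return for that one object. This is only a sketch; the variable names are chosen for the example.

```erlang
%% The three parts of the single entry in the match_spec().
MatchHead = {"was", "the", '$1'}.
Guards = [].
ResultBody = [['$1']].
AnatomySpec = [{MatchHead, Guards, ResultBody}].

%% A tuple that matches yields the formatted result...
{ok, ["best"]} = ets:test_ms({"was", "the", "best"}, AnatomySpec).

%% ...and a tuple that does not match yields false.
{ok, false} = ets:test_ms({"it", "was", "the"}, AnatomySpec).
```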

If we just wanted to get the words themselves as a flat list, we can update the result term part of the match_spec() to be ['$1'] instead.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], ['$1']}]).
% ["winter","spring","season","season","epoch","epoch","age",
%  "age","worst","best"]

If we wanted something that looked more like an ets:match_object/2 result set, we can use the result term of '$_', which signifies the whole object.

ets:select(MarkovWords, [{{"was", "the", '$1'}, [], ['$_']}]).
% [{"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the","season"},
%  {"was","the","season"},
%  {"was","the","epoch"},
%  {"was","the","epoch"},
%  {"was","the","age"},
%  {"was","the","age"},
%  {"was","the","worst"},
%  {"was","the","best"}]

And if we wanted to match on only one of the items and capture the other items in the tuple, we can use the result term '$$', which returns all of the match variables in a list, ordered by variable number rather than by position in the match_pattern().

ets:select(MarkovWords, [{{"was", '$1', '$2'}, [], ['$$']}]).
% [["clearer","than"],
%  ["so","far"],
%  ["the","winter"],
%  ["the","spring"],
%  ["the","season"],
%  ["the","season"],
%  ["the","epoch"],
%  ["the","epoch"],
%  ["the","age"],
%  ["the","age"],
%  ["the","worst"],
%  ["the","best"]]

ets:select(MarkovWords, [{{"was", '$2', '$1'}, [], ['$$']}]).
% [["than","clearer"],
%  ["far","so"],
%  ["winter","the"],
%  ["spring","the"],
%  ["season","the"],
%  ["season","the"],
%  ["epoch","the"],
%  ["epoch","the"],
%  ["age","the"],
%  ["age","the"],
%  ["worst","the"],
%  ["best","the"]]
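The ordering rule can be checked on a single tuple with ets:test_ms/2, as in this small sketch (SampleTriple is just an example value):

```erlang
SampleTriple = {"was", "the", "best"}.

%% '$1' is bound to "the" and '$2' to "best": listed as '$1', then '$2'.
{ok, ["the", "best"]} =
    ets:test_ms(SampleTriple, [{{"was", '$1', '$2'}, [], ['$$']}]).

%% Swapping the variable numbers swaps the output, even though the
%% positions in the tuple have not changed.
{ok, ["best", "the"]} =
    ets:test_ms(SampleTriple, [{{"was", '$2', '$1'}, [], ['$$']}]).
```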

With ets:select/2 we also get the ability to put multiple three-tuples in the match_spec(). This allows us to find all word triples that have either "the" or "of" as the middle word.

ets:select(MarkovWords, [{{'$1', "the", '$2'}, [], ['$_']}, {{'$1', "of", '$2'}, [], ['$_']}]).
% [{"some","of","its"},
%  {"on","the","throne"},
%  {"on","the","throne"},
%  {"direct","the","other"},
%  {"preserves","of","loaves"},
%  {"throne","of","France."},
%  {"throne","of","England;"},
%  {"worst","of","times,"},
%  {"short,","the","period"},
%  {"winter","of","despair,"},
%  {"degree","of","comparison"},
%  {"epoch","of","incredulity,"},
%  {"epoch","of","belief,"},
%  {"spring","of","hope,"},
%  {"like","the","present"},
%  {"of","the","State"},
%  {"age","of","foolishness,"},
%  {"age","of","wisdom,"},
%  {"best","of","times,"},
%  {"season","of","Darkness,"},
%  {"season","of","Light,"},
%  {"to","the","lords"},
%  {"in","the","superlative"},
%  {"was","the","winter"},
%  {"was","the","spring"},
%  {"was","the",[...]},
%  {"was",[...],...},
%  {[...],...},
%  {...}|...]
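On a small table the effect of listing two entries is easy to verify. The following sketch uses hypothetical names (MiddleTab, example_middle, MiddleSpec) and sorts the result, since the order in which a select returns objects with different keys should not be relied on.

```erlang
%% Hypothetical table and sample data, for illustration only.
MiddleTab = ets:new(example_middle, [duplicate_bag]).
ets:insert(MiddleTab, [{"age", "of", "wisdom,"},
                       {"was", "the", "age"},
                       {"it", "was", "the"}]).

%% An object is returned if either entry matches it.
MiddleSpec = [{{'$1', "the", '$2'}, [], ['$_']},
              {{'$1', "of", '$2'}, [], ['$_']}].

%% {"it","was","the"} is left out: its middle word is "was".
MiddleMatches = lists:sort(ets:select(MiddleTab, MiddleSpec)).
```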

And with guard clauses, we can find the third item of each three-tuple that starts with "was", keeping it only when it comes later in Erlang's term ordering (roughly dictionary order, but compared by character code) than the word in the second position of the tuple.

ets:select(MarkovWords, [{{"was", '$1', '$2'}, [{'<', '$1', '$2'}], ['$2']}]).
% ["than","winter","worst"]
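To see the guard decide one tuple at a time, the same match_spec() can be run through ets:test_ms/2. This is a sketch with an example variable name (GuardSpec); the last line illustrates that the comparison is by character code, so uppercase letters sort before lowercase ones.

```erlang
GuardSpec = [{{"was", '$1', '$2'}, [{'<', '$1', '$2'}], ['$2']}].

%% "the" < "winter", so the guard passes and '$2' is returned.
{ok, "winter"} = ets:test_ms({"was", "the", "winter"}, GuardSpec).

%% "the" < "spring" is false, so the tuple is rejected.
{ok, false} = ets:test_ms({"was", "the", "spring"}, GuardSpec).

%% Comparison follows term order, not a case-insensitive dictionary.
true = "Zebra" < "apple".
```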

So with this week’s post we have seen other ways of using ets:match/2 and ets:match_object/2, and what they offer over using just an ets:lookup/2 for a key, as well as how to take advantage of even more powerful querying with ets:select/2.

Next week, we will look at more ways to use ets:select/2, and how we can use some other ets module functions to help create queries that are easier to read at a glance.

–Proctor
