Erlang Thursday – ETS Introduction Part 5: keypos, compressed, read_conncurrency, and write_concurrency

Today’s Erlang Thursday continues the introduction to ETS and picks up with the promise from last week, and looks at the keypos ETS table setting, and the Tweaks that can be set.

First, we will take a look at the keypos setting.

The keypos is the 1-based index in the tuple to be stored that will be used as the key for the entry. If you remember from the part 3 of the introduction to ETS about the different table types, they use this index for their key comparison to determine if this is a unique item or not.

If we create a new table without specifying the keypos option, it defaults to 1.

Table = ets:new(some_name, []).
% 20498
ets:info(Table).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.50.0>},
%  {heir,none},
%  {name,some_name},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

To show the keypos in action, we will create a couple of items to insert into our ETS table so we can see the keypos in action.

Item1 = {1, a}.
% {1,a}
Item2 = {1.0, "a"}.
% {1.0,"a"}
Item3 = {1, "one"}.
% {1,"one"}
Item4 = {a, "a"}.
% {a,"a"}
Item5 = {"a", a}.
% {"a",a}

In the items above, we have some duplicate entries across both the first item and the second item in the two-tuples.

We will go ahead and insert each one of these items in turn, keeping in mind that this table is a set, so any new insert with the same key, will override the previous value for the same key.

ets:insert(Table, Item1).
% true
ets:tab2list(Table).
% [{1,a}]
ets:insert(Table, Item2).
% true
ets:tab2list(Table).
% [{1,a},{1.0,"a"}]
ets:insert(Table, Item3).
% true
ets:tab2list(Table).
% [{1,"one"},{1.0,"a"}]
ets:insert(Table, Item4).
% true
ets:tab2list(Table).
% [{1,"one"},{a,"a"},{1.0,"a"}]
ets:insert(Table, Item5).
% true
ets:tab2list(Table).
% [{"a",a},{1,"one"},{a,"a"},{1.0,"a"}]

When we added Item3 above, it replaced Item1 in the table, since they both have a 1 for the first element in their two-tuple.

We will now create a new table with a keypos of 2, and see how the exact same steps of inserting is changed with a different keypos value.

KeyPosTwo = ets:new(key_pos_2, [{keypos, 2}]).
% 24595
ets:insert(KeyPosTwo, Item1).
% true
ets:tab2list(KeyPosTwo).
% [{1,a}]
ets:insert(KeyPosTwo, Item2).
% true
ets:tab2list(KeyPosTwo).
% [{1.0,"a"},{1,a}]
ets:insert(KeyPosTwo, Item3).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{1.0,"a"},{1,a}]
ets:insert(KeyPosTwo, Item4).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{a,"a"},{1,a}]
ets:insert(KeyPosTwo, Item5).
% true
ets:tab2list(KeyPosTwo).
% [{1,"one"},{a,"a"},{"a",a}]

In this case, it wasn’t until we added Item4 that we had an override, as both Item2 and Item4 both have an "a" as their second item. Then we we add Item5 it overwrites the Item1, as they both have the atom a as their second element.

And if we set a keypos of some value, say three, and we try to insert a tuple that has fewer items, we will get an exception of type bad argument.

KeyPosThree = ets:new(key_pos_3, [{keypos, 3}]).
% 28692
ets:insert(KeyPosThree, Item1).
% ** exception error: bad argument
%      in function  ets:insert/2
%         called as ets:insert(28692,{1,a})

Now it is time to look at the compressed option when creating a table.

When creating a new table, the default setting is for it to be uncompressed, as we can see in the table info since it shows {compressed,false}.

UncompressedTable = ets:new(uc, []).
% 32786
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,uc},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

We create a new table, with the compressed option, and when we look at ets:info/1 for the table, we see that it show {compressed,true}.

CompressedTable = ets:new(uc, [compressed]).
% 45074
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,305},
%  {owner,<0.81.0>},
%  {heir,none},
%  {name,uc},
%  {size,0},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

compressed, according to the documentation at least, says that it stores the data in a “more compact format to consume less memory”. It also warns that this can this can make operations that need to check the entire tuple slower, and that the key is not stored compressed, at least in the current implementation.

So let’s see what kind of memory difference compressed makes.

To start with, we will insert 100_000 items into our ETS tables and see what the resulting memory size becomes. We will insert a new tuple of {X, X}, for all numbers from 1 to 100_000.

lists:foreach(fun(X) -> ets:insert(CompressedTable, {X, X}) end,
              lists:seq(1, 100000)).
% ok
lists:foreach(fun(X) -> ets:insert(UncompressedTable, {X, X}) end,
              lists:seq(1, 100000)).
% ok
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,714643},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,814643},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Interesting.

For the compressed table the memory is reported to be 814643, but the uncompressed shows the memory to be less than that with 714643.

Maybe it doesn’t like to compact integer values very much, so let’s do the same thing, but use a string for the second item in the tuple.

lists:foreach(fun(X) -> ets:insert(UncompressedTable, {X, integer_to_list(X)}) end,
              lists:seq(1, 100000)).
% ok
lists:foreach(fun(X) -> ets:insert(CompressedTable, {X, integer_to_list(X)}) end, 
              lists:seq(1, 100000)).
% ok
ets:info(CompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,true},
%  {memory,914644},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]
ets:info(UncompressedTable).
% [{read_concurrency,false},
%  {write_concurrency,false},
%  {compressed,false},
%  {memory,1692433},
%  {owner,<0.109.0>},
%  {heir,none},
%  {name,uc},
%  {size,100000},
%  {node,nonode@nohost},
%  {named_table,false},
%  {type,set},
%  {keypos,1},
%  {protection,protected}]

Now using strings in our tuples instead of just using integers, we can see that the compressed ETS table memory is 914644, where as the uncompressed ETS table’s memory is 1692433.

So in addition to thinking about the way you are going to be matching on the data when trying to determine if the table should be compressed, it looks like you also need to think about the type of data you are going to be putting into the ETS table.

The last two options to be discussed are read_concurrency and write_concurrency.

read_conccurency is by default set to false, and, according to the documentation is best for when “read operations are much more frequent than write operations, or when concurrent reads and writes comes in large read and write bursts”.

So if you have a table that has a bunch of reads with the writes infrequently interspersed between the reads, this would be when you would want to enable read_concurrency, as the documentation states that switching between reads and writes is more expensive.

The write_concurrency option is set to false by default, causing any additional concurrent writes to block while an write operation is proceeding. When set to true different tuples of the same table can be written to by concurrent processes, and does not affect any table of the type ordered_set.

This should be it as far as the introduction goes. Next week we will start looking at the different operations we can perform using ETS and ETS tables.

–Proctor