Today’s Erlang Thursday continues the introduction to ETS. Picking up on the promise from last week, it looks at the keypos ETS table setting and the tweaks that can be set.
First, we will take a look at the keypos setting.
The keypos is the 1-based index of the element in each stored tuple that will be used as the key for the entry. If you remember part 3 of the introduction to ETS, which covered the different table types, the tables use this element for their key comparison to determine whether an item is unique or not.
If we create a new table without specifying the keypos option, it defaults to 1.
    Table = ets:new(some_name, []).
    % 20498
    ets:info(Table).
    % [{read_concurrency,false},
    %  {write_concurrency,false},
    %  {compressed,false},
    %  {memory,305},
    %  {owner,<0.50.0>},
    %  {heir,none},
    %  {name,some_name},
    %  {size,0},
    %  {node,nonode@nohost},
    %  {named_table,false},
    %  {type,set},
    %  {keypos,1},
    %  {protection,protected}]
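If only a single setting is of interest, ets:info/2 returns just that item instead of the whole list. A minimal sketch, where the table name demo is only an illustrative assumption:

```erlang
%% Create a table with default options; `demo` is a made-up name.
Demo = ets:new(demo, []).
%% Ask for one item at a time instead of the full info list.
ets:info(Demo, keypos).
% 1
ets:info(Demo, type).
% set
```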
To show the keypos in action, we will create a few items to insert into our ETS table.
    Item1 = {1, a}.
    % {1,a}
    Item2 = {1.0, "a"}.
    % {1.0,"a"}
    Item3 = {1, "one"}.
    % {1,"one"}
    Item4 = {a, "a"}.
    % {a,"a"}
    Item5 = {"a", a}.
    % {"a",a}
In the items above, some values are duplicated in both the first and the second element of the two-tuples.
We will go ahead and insert each one of these items in turn, keeping in mind that this table is a set, so any new insert with the same key will override the previous value for that key.
    ets:insert(Table, Item1).
    % true
    ets:tab2list(Table).
    % [{1,a}]
    ets:insert(Table, Item2).
    % true
    ets:tab2list(Table).
    % [{1,a},{1.0,"a"}]
    ets:insert(Table, Item3).
    % true
    ets:tab2list(Table).
    % [{1,"one"},{1.0,"a"}]
    ets:insert(Table, Item4).
    % true
    ets:tab2list(Table).
    % [{1,"one"},{a,"a"},{1.0,"a"}]
    ets:insert(Table, Item5).
    % true
    ets:tab2list(Table).
    % [{"a",a},{1,"one"},{a,"a"},{1.0,"a"}]
When we added Item3 above, it replaced Item1 in the table, since they both have a 1 for the first element in their two-tuple.
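Note that Item2, with the key 1.0, did not replace Item1, with the key 1: a set only treats keys as the same when they match exactly, while an ordered_set treats keys as the same when they compare equal. The following is a small sketch of that difference, with table names made up for illustration. It only checks the sizes, since which of the two objects survives a list insert with equal keys is not defined.

```erlang
%% Two tables that differ only in their type; the names are hypothetical.
Set = ets:new(as_set, [set]).
Ordered = ets:new(as_ordered_set, [ordered_set]).
%% Insert the same two objects, keyed by 1 and 1.0, into both tables.
ets:insert(Set, [{1, a}, {1.0, "a"}]).
ets:insert(Ordered, [{1, a}, {1.0, "a"}]).
%% 1 and 1.0 do not match (=:=), so the set keeps both entries.
ets:info(Set, size).
% 2
%% 1 and 1.0 compare equal (==), so the ordered_set keeps only one.
ets:info(Ordered, size).
% 1
```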
We will now create a new table with a keypos of 2, and see how the exact same sequence of inserts behaves with a different keypos value.
    KeyPosTwo = ets:new(key_pos_2, [{keypos, 2}]).
    % 24595
    ets:insert(KeyPosTwo, Item1).
    % true
    ets:tab2list(KeyPosTwo).
    % [{1,a}]
    ets:insert(KeyPosTwo, Item2).
    % true
    ets:tab2list(KeyPosTwo).
    % [{1.0,"a"},{1,a}]
    ets:insert(KeyPosTwo, Item3).
    % true
    ets:tab2list(KeyPosTwo).
    % [{1,"one"},{1.0,"a"},{1,a}]
    ets:insert(KeyPosTwo, Item4).
    % true
    ets:tab2list(KeyPosTwo).
    % [{1,"one"},{a,"a"},{1,a}]
    ets:insert(KeyPosTwo, Item5).
    % true
    ets:tab2list(KeyPosTwo).
    % [{1,"one"},{a,"a"},{"a",a}]
In this case, it wasn’t until we added Item4 that we had an override, as Item2 and Item4 both have an "a" as their second element. Then, when we add Item5, it overwrites Item1, as they both have the atom a as their second element.
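One place a keypos other than 1 tends to come in handy is with tagged tuples, where the first element is the same tag for every entry. Here is a short sketch under that assumption; the people table and the {person, Name, Age} shape are invented for this example:

```erlang
%% Key on the second element, since the first is always the tag `person`.
People = ets:new(people, [{keypos, 2}]).
ets:insert(People, {person, "alice", 30}).
ets:insert(People, {person, "bob", 25}).
%% Lookups go through the element at the keypos, here the name.
ets:lookup(People, "alice").
% [{person,"alice",30}]
%% The tag in position 1 is not the key, so nothing is found for it.
ets:lookup(People, person).
% []
```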
And if we set a keypos of some value, say three, and we try to insert a tuple that has fewer elements than that, we will get a bad argument (badarg) error.
    KeyPosThree = ets:new(key_pos_3, [{keypos, 3}]).
    % 28692
    ets:insert(KeyPosThree, Item1).
    % ** exception error: bad argument
    %      in function  ets:insert/2
    %         called as ets:insert(28692,{1,a})
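An uncaught exception in the shell normally restarts the shell process as well, and since that process owns the tables created above, they go away with it. One way to avoid the error is to check the tuple size against the table's keypos before inserting. The SafeInsert fun below is a hypothetical helper written for this sketch, not something ETS provides:

```erlang
%% A table keyed on the third element; the name is made up.
Short = ets:new(short_tuples, [{keypos, 3}]).
%% Hypothetical helper: only insert when the tuple reaches the keypos.
SafeInsert = fun(Tab, Tuple) ->
    KeyPos = ets:info(Tab, keypos),
    case tuple_size(Tuple) >= KeyPos of
        true -> ets:insert(Tab, Tuple);
        false -> {error, {tuple_too_short, Tuple}}
    end
end.
SafeInsert(Short, {1, a}).
% {error,{tuple_too_short,{1,a}}}
SafeInsert(Short, {1, a, "one"}).
% true
```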
Now it is time to look at the compressed option when creating a table.
When creating a new table, the default setting is for it to be uncompressed, as we can see in the table info since it shows {compressed,false}.
    UncompressedTable = ets:new(uc, []).
    % 32786
    ets:info(UncompressedTable).
    % [{read_concurrency,false},
    %  {write_concurrency,false},
    %  {compressed,false},
    %  {memory,305},
    %  {owner,<0.81.0>},
    %  {heir,none},
    %  {name,uc},
    %  {size,0},
    %  {node,nonode@nohost},
    %  {named_table,false},
    %  {type,set},
    %  {keypos,1},
    %  {protection,protected}]
We create a new table with the compressed option, and when we look at ets:info/1 for the table, we see that it shows {compressed,true}.
    CompressedTable = ets:new(uc, [compressed]).
    % 45074
    ets:info(CompressedTable).
    % [{read_concurrency,false},
    %  {write_concurrency,false},
    %  {compressed,true},
    %  {memory,305},
    %  {owner,<0.81.0>},
    %  {heir,none},
    %  {name,uc},
    %  {size,0},
    %  {node,nonode@nohost},
    %  {named_table,false},
    %  {type,set},
    %  {keypos,1},
    %  {protection,protected}]
According to the documentation, the compressed option stores the data in a “more compact format to consume less memory”. The documentation also warns that this can make operations that need to inspect the entire tuple slower, and that the key is not stored compressed, at least in the current implementation.
So let’s see what kind of memory difference compressed makes.
To start with, we will insert 100_000 items into our ETS tables and see what the resulting memory size becomes. We will insert a new tuple of {X, X} for all numbers from 1 to 100_000.
    lists:foreach(fun(X) -> ets:insert(CompressedTable, {X, X}) end, lists:seq(1, 100000)).
    % ok
    lists:foreach(fun(X) -> ets:insert(UncompressedTable, {X, X}) end, lists:seq(1, 100000)).
    % ok
    ets:info(UncompressedTable).
    % [{read_concurrency,false},
    %  {write_concurrency,false},
    %  {compressed,false},
    %  {memory,714643},
    %  {owner,<0.109.0>},
    %  {heir,none},
    %  {name,uc},
    %  {size,100000},
    %  {node,nonode@nohost},
    %  {named_table,false},
    %  {type,set},
    %  {keypos,1},
    %  {protection,protected}]
    ets:info(CompressedTable).
    % [{read_concurrency,false},
    %  {write_concurrency,false},
    %  {compressed,true},
    %  {memory,814643},
    %  {owner,<0.109.0>},
    %  {heir,none},
    %  {name,uc},
    %  {size,100000},
    %  {node,nonode@nohost},
    %  {named_table,false},
    %  {type,set},
    %  {keypos,1},
    %  {protection,protected}]
Interesting.
For the compressed table the memory, which ets:info/1 reports in words, is 814643, but the uncompressed table shows less than that, with 714643.
Maybe it doesn’t like to compact integer values very much, so let’s do the same thing, but use a string for the second item in the tuple.
    lists:foreach(fun(X) -> ets:insert(UncompressedTable, {X, integer_to_list(X)}) end, lists:seq(1, 100000)).
    % ok
    lists:foreach(fun(X) -> ets:insert(CompressedTable, {X, integer_to_list(X)}) end, lists:seq(1, 100000)).
    % ok
    ets:info(CompressedTable).
    % [{read_concurrency,false},
    %  {write_concurrency,false},
    %  {compressed,true},
    %  {memory,914644},
    %  {owner,<0.109.0>},
    %  {heir,none},
    %  {name,uc},
    %  {size,100000},
    %  {node,nonode@nohost},
    %  {named_table,false},
    %  {type,set},
    %  {keypos,1},
    %  {protection,protected}]
    ets:info(UncompressedTable).
    % [{read_concurrency,false},
    %  {write_concurrency,false},
    %  {compressed,false},
    %  {memory,1692433},
    %  {owner,<0.109.0>},
    %  {heir,none},
    %  {name,uc},
    %  {size,100000},
    %  {node,nonode@nohost},
    %  {named_table,false},
    %  {type,set},
    %  {keypos,1},
    %  {protection,protected}]
Now, using strings in our tuples instead of just integers, we can see that the compressed ETS table’s memory is 914644, whereas the uncompressed ETS table’s memory is 1692433.
So when deciding whether a table should be compressed, in addition to thinking about how you are going to be matching on the data, it looks like you also need to think about the type of data you are going to be putting into the ETS table.
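Since the memory item is reported in words rather than bytes, multiplying it by the word size of the emulator gives a number that is easier to reason about. Below is a rough sketch for trying the comparison with your own data; the table names and the Fill and Bytes funs are assumptions made for this example, and the actual numbers will vary by system and OTP version.

```erlang
%% Size of a word in bytes on this emulator (8 on a 64-bit system).
WordSize = erlang:system_info(wordsize).
%% Two otherwise identical tables; the names are hypothetical.
Plain = ets:new(plain, []).
Packed = ets:new(packed, [compressed]).
%% Fill a table with {Integer, String} tuples, as in the test above.
Fill = fun(Tab) ->
    lists:foreach(fun(X) -> ets:insert(Tab, {X, integer_to_list(X)}) end,
                  lists:seq(1, 1000))
end.
Fill(Plain).
Fill(Packed).
%% Convert the reported words into bytes.
Bytes = fun(Tab) -> ets:info(Tab, memory) * WordSize end.
{Bytes(Plain), Bytes(Packed)}.
```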
The last two options to be discussed are read_concurrency and write_concurrency.
read_concurrency is set to false by default and, according to the documentation, is best for when “read operations are much more frequent than write operations, or when concurrent reads and writes comes in large read and write bursts”.
So if you have a table that sees many reads with only infrequent writes interspersed between them, this is when you would want to enable read_concurrency. The trade-off, as the documentation states, is that with the option enabled, switching between reads and writes becomes more expensive.
The write_concurrency option is set to false by default, causing any additional concurrent writes to block while a write operation is in progress. When set to true, different tuples of the same table can be written to by concurrent processes. At the time of writing, the option does not affect tables of type ordered_set.
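To make the two options a bit more concrete, here is a sketch of a table with both enabled and a handful of processes writing to it at once. The table name, the number of writers, and the Writer fun are all invented for this example. The table is also created as public, since the default protected setting only lets the owner process write.

```erlang
%% A public table with both concurrency tweaks turned on.
T = ets:new(concurrent_demo, [set, public,
                              {read_concurrency, true},
                              {write_concurrency, true}]).
Parent = self().
%% Each writer inserts 1000 tuples under its own keys, then reports back.
Writer = fun(Id) ->
    lists:foreach(fun(N) -> ets:insert(T, {{Id, N}, N}) end,
                  lists:seq(1, 1000)),
    Parent ! {done, Id}
end.
Ids = lists:seq(1, 4).
lists:foreach(fun(Id) -> spawn(fun() -> Writer(Id) end) end, Ids).
%% Wait for every writer to finish before looking at the table.
lists:foreach(fun(Id) -> receive {done, Id} -> ok end end, Ids).
ets:info(T, size).
% 4000
```

Whether this is actually faster than the defaults depends on the workload, so it is worth measuring before turning either option on.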
This should be it as far as the introduction goes. Next week we will start looking at the different operations we can perform using ETS and ETS tables.
–Proctor