Category Archives: Erlang Thursday

Erlang Thursday – digraph:del_path/3

Today’s Erlang Thursday is on digraph:del_path/3.

We will continue working with the same graph we started with in the previous post on digraph:get_path/3.

Graph = digraph:new().
% {digraph,20498,24595,28692,true}
V1 = digraph:add_vertex(Graph).
% ['$v'|0]
V2 = digraph:add_vertex(Graph).
% ['$v'|1]
V3 = digraph:add_vertex(Graph).
% ['$v'|2]
V4 = digraph:add_vertex(Graph).
% ['$v'|3]
E1 = digraph:add_edge(Graph, V1, V2).
% ['$e'|0]
E2 = digraph:add_edge(Graph, V2, V3).
% ['$e'|1]
E3 = digraph:add_edge(Graph, V3, V4).
% ['$e'|2]
E4 = digraph:add_edge(Graph, V2, V4).
% ['$e'|3]
E5 = digraph:add_edge(Graph, V4, V1).
% ['$e'|4]

digraph:del_path/3 takes three arguments, a Graph, a source vertex, and a destination vertex, and removes all edges in a each path in the graph from the source vertex to the destination vertex, until no path exist between the source and destination vertices.

The return value of digraph:del_path/3 is always a return value of true.

Looking at the picture of the graph above as reference, we are going to call digraph:del_path/3 for the graph with a source vertex of V1, and a destination vertex of V4.

digraph:del_path(Graph, V1, V4).
% true
digraph:vertices(Graph).
% [['$v'|2],['$v'|1],['$v'|0],['$v'|3]]
digraph:edges(Graph).
% [['$e'|1],['$e'|2],['$e'|4]]

Translating the edge names, we see that the edge from V1 to V2 has been removed, as well as the edge from V2 to V4 has been removed.

So how did Erlang come up with this result?

This puzzled me at first, as it wasn’t one of the two scenarios I was expecting to see, which were either: remove all edges but the edge from V4 to V1, or remove only the edge from V1 to V2.

I then opened the Erlang source code on Github for digraph module to clarify my thinking, and looking at the code it then made sense what was happening.

First digraph:del_path/3 calls digraph:get_path/3, and removes all edges in that path, and then recurses until no path is found.

This is when it clicked as to why Erlang was removing only those edges.

If we call digraph:get_path/3 on a fresh version of the graph, we see that it returns the path of V1 -> V2 -> V4.

digraph:get_path(Graph, V1, V4).
[['$v'|0],['$v'|1],['$v'|3]]

Erlang then removes the edges in that path, and the will call digraph:del_path/3, which calls digraph:get_path/3 again, but as we removed the edge between V1 and V2, no path is found so the process is finished.

This is why we see more edges removed if we reset the Graph again (by exiting the shell and recreating it from scratch by pasting the initialization into the shell), and call digraph:del_path/3 between V2 and V4.

digraph:del_path(Graph, V2, V4).
% true
digraph:edges(Graph).
% [['$e'|0],['$e'|4]]

This case there are the paths V2 -> V4 and V2 -> V3 -> V4, and if we remove the path V2 -> V4, the removal of all the edges associated with that path doesn’t break the path of V2 -> V3 -> V4, so it can remove all edges in that path as well.

So we have a win in the case where the documentation wasn’t quite as clear as it could be, but having the Erlang standard library be open source gets us a win because we can always go in and check out what the code is really doing.

–Proctor

Erlang Thursday – digraph:get_cycle/2

Today’s Erlang Thursday is on digraph:get_cycle/2.

We will continue working with the graph from the previous post on digraph:get_path/3.

Graph = digraph:new().
% {digraph,20498,24595,28692,true}
V1 = digraph:add_vertex(Graph).
% ['$v'|0]
V2 = digraph:add_vertex(Graph).
% ['$v'|1]
V3 = digraph:add_vertex(Graph).
% ['$v'|2]
V4 = digraph:add_vertex(Graph).
% ['$v'|3]
E1 = digraph:add_edge(Graph, V1, V2).
% ['$e'|0]
E2 = digraph:add_edge(Graph, V2, V3).
% ['$e'|1]
E3 = digraph:add_edge(Graph, V3, V4).
% ['$e'|2]
E4 = digraph:add_edge(Graph, V2, V4).
% ['$e'|3]
E5 = digraph:add_edge(Graph, V4, V1).
% ['$e'|4]

digraph:get_cycle/2 takes a graph G, and an vertex V, and tries to find a path that creates a cycle between the vertex V in graph G.

digraph:get_cycle(Graph, V1).
% [['$v'|0],['$v'|1],['$v'|3],['$v'|0]]
digraph:get_cycle(Graph, V2).
% [['$v'|1],['$v'|3],['$v'|0],['$v'|1]]

Next, we add a new vertex V5, and a new edge originating from V4 and ending on V5

We then call digraph:get_cycle/2 on V5, and we get back a false as no cyle exists in the graph with vertex V5 in it.

V5 = digraph:add_vertex(Graph).
% ['$v'|4]
E6 = digraph:add_edge(Graph, V4, V5).
% ['$e'|5]
digraph:get_cycle(Graph, V5).
% false

The digraph module also contains the function digraph:get_short_cycle/2.

digraph:get_short_cycle/2 attempts to find the shortest cycle in the graph G for vertex V.

The documentation for digraph:get_short_cycle/2 exact phrasing is:

Tries to find an as short as possible simple cycle through the vertex V of the digraph G.

So depending on how you read that, the shortest cycle might not be guaranteed to be returned, but simply a shorter cycle, which may depend on the overall size and complexity of the graph.

digraph:get_short_cycle(Graph, V1).
% [['$v'|0],['$v'|1],['$v'|3],['$v'|0]]
digraph:get_short_cycle(Graph, V5).
% false

–Proctor

Erlang Thursday – digraph:in_neighbors/2

Today’s Erlang Thursday is on digraph:in_neighbors/2.

digraph:in_neighbors/2 takes a graph G, and a vertex V, and will return a list of all the vertices that have edges originating from them that are directed toward the vertex V.

We will continue working with the graph from last week’s post on digraph:get_path/3.

Graph = digraph:new().
% {digraph,20498,24595,28692,true}
V1 = digraph:add_vertex(Graph).
% ['$v'|0]
V2 = digraph:add_vertex(Graph).
% ['$v'|1]
V3 = digraph:add_vertex(Graph).
% ['$v'|2]
V4 = digraph:add_vertex(Graph).
% ['$v'|3]
E1 = digraph:add_edge(Graph, V1, V2).
% ['$e'|0]
E2 = digraph:add_edge(Graph, V2, V3).
% ['$e'|1]
E3 = digraph:add_edge(Graph, V3, V4).
% ['$e'|2]
E4 = digraph:add_edge(Graph, V2, V4).
% ['$e'|3]
E5 = digraph:add_edge(Graph, V4, V1).
% ['$e'|4]

With that graph setup again, we can now find the in_neighbors of different vertices in our graph.

digraph:in_neighbours(Graph, V4).
% [['$v'|1],['$v'|2]]
digraph:in_neighbours(Graph, V1).
% [['$v'|3]]
digraph:in_neighbours(Graph, V2).
% [['$v'|0]]

So for vertex V4 we see the return value of [['$v'|1],['$v'|2]], which are the vertices V2 and V3. For V1 we have an inbound neighbor of V4, and for V2 we have the inbound neighbor of V1.

digraph:out_neighbors/2

The digraph module also contains the function digraph:out_neighbors/2, which returns a list of the vertices that a the given vertex “points to” with its edges in the directed graph.

digraph:out_neighbours(Graph, V2).
% [['$v'|3],['$v'|2]]
digraph:out_neighbours(Graph, V4).
% [['$v'|0]]
digraph:out_neighbours(Graph, V1).
% [['$v'|1]]

We can see from the picture of our graph that V2 has edges that “point to” the vertices V3 and V4, and if we look at the result of digraph:out_neighbors/2, we get the result of the vertices V3 and V4.

In this case we get the list of vertices where V4 is first and V3 is second, but that may not be the case, as the documentation states that the the edges are “in some unspecified order”, which holds true of digraph:in_neighbors/2 as well.

–Proctor

Erlang Thursday – digraph:get_path/3

Today’s Erlang Thursday is on digraph:get_path/3.

digraph:get_path/3 takes a graph, a starting vertex, and an ending vertex and will attempt to find some path through the graph of length greater than zero, where all vertices in the path are distinct, except allowing for the first and last vertices to be the same.

If a path is found, it returns a list of the vertices visited (in order) to complete the path, if no path is found, false is returned.

First we will setup a new graph that we can traverse.

Graph = digraph:new().
% {digraph,20498,24595,28692,true}
V1 = digraph:add_vertex(Graph).
% ['$v'|0]
V2 = digraph:add_vertex(Graph).
% ['$v'|1]
V3 = digraph:add_vertex(Graph).
% ['$v'|2]
V4 = digraph:add_vertex(Graph).
% ['$v'|3]
E1 = digraph:add_edge(Graph, V1, V2).
% ['$e'|0]
E2 = digraph:add_edge(Graph, V2, V3).
% ['$e'|1]
E3 = digraph:add_edge(Graph, V3, V4).
% ['$e'|2]
E4 = digraph:add_edge(Graph, V2, V4).
% ['$e'|3]
E5 = digraph:add_edge(Graph, V4, V1).
% ['$e'|4]

This will give us a graph that looks like the following:

Now we can get to playing with digraph:get_path/3 and see what the paths are from any sets of nodes.

digraph:get_path(Graph, V2, V3).
% [['$v'|1],['$v'|2]]
digraph:get_path(Graph, V2, V4).
% [['$v'|1],['$v'|3]]
digraph:get_path(Graph, V2, V1).
% [['$v'|1],['$v'|3],['$v'|0]]
digraph:get_path(Graph, V3, V1).
% [['$v'|2],['$v'|3],['$v'|0]]
digraph:get_path(Graph, V1, V4).
% [['$v'|0],['$v'|1],['$v'|3]]
digraph:get_path(Graph, V1, V1).
% [['$v'|0],['$v'|1],['$v'|3],['$v'|0]]

Note that these just happen to be the shortest paths, but this is not guaranteed to return the shortest path, but just the first path found.

And if we add a new vertex, and don’t connect it to any other node in the graph, and we call digraph:get_path/3, we can see it returns false.

V5 = digraph:add_vertex(Graph).
% ['$v'|4]
digraph:get_path(Graph, V1, V5).
% false

–Proctor

Erlang Thursday – digraph:add_edge/4

Today’s Erlang Thursday is on digraph:add_edge/4.

digraph:add_edge/4 takes a graph as its first argument, the originating (eminating) vertex as its second arugment, the destination (incident) vertex as its third argument, and a label.

Graph = digraph:new().
% {digraph,20498,24595,28692,true}
Vertex1 = digraph:add_vertex(Graph, foo).
% foo
Vertex2 = digraph:add_vertex(Graph, bar).
% bar
Edge1 = digraph:add_edge(Graph, Vertex1, Vertex2, {foo, bar}).
% ['$e'|0]
digraph:edges(Graph).
% [['$e'|0]]
Edge2 = digraph:add_edge(Graph, Vertex2, Vertex1, {bar, foo}).
% ['$e'|1]
digraph:edges(Graph).
% [['$e'|1],['$e'|0]]

The digraph module also contains digraph:add_edge/5 which allows you to specify the edge identifier, in this case we want the edge to be myEdge.

digraph:add_edge(Graph, myEdge, Vertex2, Vertex1, myLabel).
% myEdge
digraph:edges(Graph).
% [['$e'|1],['$e'|2],['$e'|3],myEdge,['$e'|0]]

As well as digraph:add_edge/3 which allows you to not specify the edge or the label.

digraph:add_edge(Graph, Vertex2, Vertex1).
% ['$e'|2]
digraph:add_edge(Graph, Vertex2, Vertex1).
% ['$e'|3]
digraph:edges(Graph).
% [['$e'|1],['$e'|2],['$e'|3],['$e'|0]]

And if you note in the examples for digraph:add_edge/3 and digraph:add_edge/5 we added a number of edges with the same eminate and incident vertices, and it was happy to create those edges for us.

We can also create acyclic digraphs by using digraph:new/1, and specifying that we want the digraph() to be acyclic.

Graph2 = digraph:new([acyclic]).
% {digraph,20498,24595,28692,false}
VertexA = digraph:add_vertex(Graph2, foo).
% foo
VertexB = digraph:add_vertex(Graph2, bar).
% bar
EdgeAB = digraph:add_edge(Graph2, VertexA, VertexB, {foo, bar}).
% ['$e'|0]
EdgeBA = digraph:add_edge(Graph2, VertexB, VertexA, {bar, foo}).
% {error,{bad_edge,[foo,bar]}}

When we try to add an edge that will create a cycle in an acyclic directed graph, we get a return of a bad_edge error with the two edges specified.

–Proctor

Erlang Thursday – digraph:add_vertex/1

Today’s Erlang Thursday starts to dig into the digraph module, as promised last week, and takes a look at digraph:add_vertex/1.

First we create a new directed graph, so we have something we can add vertices to.

Graph = digraph:new().
% {digraph,20498,24595,28692,true}

We then add some vertices to the graph by using digraph:add_vertex/1.

digraph:add_vertex(Graph).
% ['$v'|0]
digraph:add_vertex(Graph).
% ['$v'|1]
digraph:add_vertex(Graph).
% ['$v'|2]

As we don’t specify any information about the vertex we want to add, Erlang will create a new vertex for us of the format ['$v', I], with an empty list as the label where I is a non-negative integer.

We can also use digraph:add_vertex/2 to add a vertex if we wish to provide the vertex identifer, or provide vertex identifier and label in the case of digraph:add_vertex/3. As with digraph:add_vertex/1, digraph:add_vertex/2 uses the empty list as the label as well.

digraph:add_vertex(Graph, vertex1).
% vertex1
digraph:add_vertex(Graph, vertex2, "Vertex 2").
% vertex2

We have now added 5 vertices, and can check what vertices we have in the digraph() by using digraph:vertices/1.

digraph:vertices(Graph).
% [['$v'|2],['$v'|1],['$v'|0],vertex2,vertex1]

If we decide we want to try to add a vertex ourselves of the format ['$v' | I], we can run into trouble if you call digraph:add_vertex/1 after it.

digraph:add_vertex(Graph, ['$v' | 3]).
% ['$v'|3]
digraph:add_vertex(Graph).
% ['$v'|3]
digraph:vertices(Graph).
% [['$v'|2],['$v'|1],['$v'|0],['$v'|3],vertex2,vertex1]
digraph:add_vertex(Graph, ['$v' | 4]).
% ['$v'|4]
digraph:vertices(Graph).
% [['$v'|4],
%  ['$v'|2],
%  ['$v'|1],
%  ['$v'|0],
%  ['$v'|3],
%  vertex2,vertex1]

So we add a vertex by specifying the vertex() we want to add, and then add a new vertex and let Erlang take care of creating that vertex, and we wind up “losing” a vertex, as one essentially gets overridden when we look at the end state of the digraph().

–Proctor

Erlang Thursday – The digraph module

Today’s Erlang Thursday kicks of taking a look at the digraph module.

As I was looking into it, this module didn’t line up with my expectations of Erlang behavior, so I want to focus on that difference before taking a look at the functions in the module.

If we start by browsing through the function signatures in the digraph module, we can see that the only function that returns a digraph() are digraph:new/0 and digraph:new/1.

Thinking this was odd for a Erlang API, I went to the Erlang shell, and added a vertex to the digraph(), and then inpsected the result of that operation.

G = digraph:new().
% {digraph,69651,73748,77845,true}
G2 = digraph:add_vertex(G, foo, bar).
% foo

The return value of calling digraph:add_vertex/3 was foo, which was the second argument, and doesn’t match up with what the representation of a graph looks like.

Okay, time to look at the digraph() in G again then to see if that changed.

G.
% {digraph,69651,73748,77845,true}

That tuple result of the digraph() looks the same, so let’s see if that vertex we added is in the graph, since we did get the return value of foo.

digraph:vertices(G).
% [foo]
digraph:vertex(G, foo).
% {foo,bar}

Hrmm… Okay, looks like that vertex is in there.

Let’s add another vertex to the digraph() bound to G.

V = digraph:add_vertex(G).
% ['$v'|0]
digraph:vertices(G).
% [['$v'|0],foo]

That one is added as well.

HC SVNT DRACONES (Here Are Dragons)

So the behavior I want to call out in this post before we start looking at the functions in this module is that these functions exhibit observably mutable behavior on a digraph().

I say it is observably mutable, becuase while if it is not being changed under the covers of the implementation, the structure can be changed while the binding of the variable to the reference stays the same.

digraph:vertices(G).
% [['$v'|0],foo]
Copy = G.
% {digraph,69651,73748,77845,true}
V2 = digraph:add_vertex(G, wat).
% wat
digraph:vertices(Copy).
% [['$v'|0],foo,wat]
digraph:vertices(G).
% [['$v'|0],foo,wat]

This even mutates other varaible references as well, so this breaks any convention that I have seen in the Erlang ecosystem about keeping all data immutable.

We will continue looking at the digraph module in future Erlang Thursday posts, but I wanted to spend some time calling out the mutability inherent in the digraph()s, so that when you need to use a one, you can be aware that this is not something you want to use in your concurrent parts of your application without great caution.

Updated (October 18th)

As part of the translation into Lisp Flavoured Erlang as part of LFE Fridays, Robert Virding updated me with the reasoning of the digraph()‘s mutability, which he included in the translation as well.

The Dragons slain

The reason behind the dragons is how a digraph() is implemented. A digraph is built of 3 ETS tables, with, in this case, the table ids 8207, 12304 and 16401. You can see this by calling ets:i/0 which lists information about all the current tables. You can see that the 3 tables are owned by the LFE shell process:

> self().
<0.28.0>
> ets:i().
 id              name              type  size   mem      owner
 ----------------------------------------------------------------------------
 1               code              set   282    10393    code_server
 4098            code_names        set   64     7713     code_server
 8207            vertices          set   3      328      <0.28.0>
 12304           edges             set   0      305      <0.28.0>
 16401           neighbours        bag   2      319      <0.28.0>
 ac_tab          ac_tab            set   6      839      application_controller
 file_io_servers file_io_servers   set   0      305      file_server_2
 global_locks    global_locks      set   0      305      global_name_server
 global_names    global_names      set   0      305      global_name_server
...

ok

The digraph() structure itself is just a tagged tuple containing the table ids. As all changes are made to the ETS tables the structure itself never changes. Data about the tables and their contents can be read with ets:info/1 and ets:i/1.

–Proctor

Erlang Thursday – erl_tar:table/1

Today’s Erlang Thursday is on erl_tar:table/1.

erl_tar:table/1 returns a list of filenames included in the tar file.

erl_tar:table("animal_sounds.tar").
% {ok,["dog.txt","cat.txt","pony.txt","bear.txt"]}

There is also a version erl_tar:table/2 that takes a options list as well.

erl_tar:table("animal_sounds.tar.gz", [compressed]).
% {ok,["dog.txt","cat.txt","pony.txt","bear.txt"]}
erl_tar:table("animal_sounds.tar.gz", [compressed,verbose]).
% {ok,[{"dog.txt",regular,5,
%       {{2015,9,23},{22,18,47}},
%       420,501,20},
%      {"cat.txt",regular,5,{{2015,9,23},{22,18,56}},420,501,20},
%      {"pony.txt",regular,8,{{2015,9,23},{22,19,10}},420,501,20},
%      {"bear.txt",regular,19,
%       {{2015,9,23},{22,21,16}},
%       420,501,20}]}

With the verbose option, instead of just getting a list of filenames, we get a list of tuples when we pass verbose.

The tuple is: Filename, Filetype (regular file/directory or a symbolic link), Bytes of the file, Timestamp tuple, Permissions (expressed in decimal instead of octal), UserId, and GroupId.

The documentation does not specify any of the information about the return type, and the credit for clarification of what the 420,501,20 items represent is all from Robert Virding, from emailing him this post to be translated as part of LFE Fridays.

–Proctor

Erlang Thursday – erl_tar:extract/1

Today’s Erlang Thursday cover’s erl_tar:extract/1.

erl_tar:extract/1 takes a file, either as a binary tuple, file descriptor tuple, or filename, and extracts the contents of the tar out to the current directory.

Since we will need to have a tar file to extract, let’s create some files and add them to a new tar file.

$ echo "woof" > dog.txt
$ echo "meow" > cat.txt
$ echo "sparkle" > pony.txt
$ echo 'Wocka Wocka Wocka!' > bear.txt
$ tar -cvf animal_sounds.tar dog.txt cat.txt pony.txt bear.txt
a dog.txt
a cat.txt
a pony.txt
a bear.txt

And while we are at it, lets create a compressed version as well.

$ tar -cvzf animal_sounds.tar.gz dog.txt cat.txt pony.txt bear.txt
a dog.txt
a cat.txt
a pony.txt
a bear.txt

Since we are going to test out extracting the tar, we will go ahead and clean up the files that we put in the tar.

$ rm dog.txt cat.txt pony.txt bear.txt

With all the ceremony of making sure we have a tar file to experiment with out of the way, it is time to fire up our Erlang shell, and call erl_tar:extract/1.

erl_tar:extract("animal_sounds.tar").
% ok

That seemed straight forward enough, so let’s see if we have our files extracted back out at the command prompt.

$ ls dog.txt cat.txt pony.txt bear.txt
bear.txt cat.txt  dog.txt  pony.txt
$ rm dog.txt cat.txt pony.txt bear.txt

And since we saw them, we will go ahead and remove them to get back to a clean state.

erl_tar:extract/2

Erlang also has a erl_tar:extract/2, which allows us to give options to the extraction process, by passing a list as its second argument.

We can have erl_tar:extract/2 extract the files and tell it to be verbose, and then follow that up with another extraction, where we specify that we not only want it to be verbose, but don’t overwrite any files that are already there.

erl_tar:extract("animal_sounds.tar", [verbose]).
% x /Users/proctor/tmp/dog.txt
%
% x /Users/proctor/tmp/cat.txt
%
% x /Users/proctor/tmp/pony.txt
%
% x /Users/proctor/tmp/bear.txt
%
% ok
erl_tar:extract("animal_sounds.tar", [verbose, keep_old_files]).
% x /Users/proctor/tmp/dog.txt - exists, not created
%
% x /Users/proctor/tmp/cat.txt - exists, not created
%
% x /Users/proctor/tmp/pony.txt - exists, not created
%
% x /Users/proctor/tmp/bear.txt - exists, not created
%
% ok

And yet again, we swing back to the command prompt to remove the extracted files.

$ rm dog.txt cat.txt pony.txt bear.txt

Next we extract animal_sounds.tar.gz by passing the atom compressed in the list of options.

erl_tar:extract("animal_sounds.tar.gz", [verbose, compressed, keep_old_files]).
% x /Users/proctor/tmp/dog.txt
%
% x /Users/proctor/tmp/cat.txt
%
% x /Users/proctor/tmp/pony.txt
%
% x /Users/proctor/tmp/bear.txt
%
% ok

And sometimes when working with a tar file in your program, you don’t want to have to do all the management of the files on the filesystem just to read the contents of a tar file, so there is even an option to keep it all in memory.

erl_tar:extract("animal_sounds.tar.gz", [verbose, compressed, keep_old_files, memory]).
% {ok,[{"dog.txt",<<"woofn">>},
%      {"cat.txt",<<"meown">>},
%      {"pony.txt",<<"sparklen">>},
%      {"bear.txt",<<"Wocka Wocka Wocka!n">>}]}

When passing the memory option, the return value of erl_tar:extract/2 becomes an tuple of the status, and a list of tuples composed of the filename, and the contents of the file as a Binary for each file in the tar that was extracted.

If an error occurs on extraction to memory, for example we forget to pass the compressed option to a compressed tar file, it returns an error tuple.

erl_tar:extract("animal_sounds.tar.gz", [verbose, memory]).
% {error,eof}

There are quite a bit more options that erl_tar:extract/2 can take as well, so I highly recommend checking out the documentation for the full list of options.

–Proctor

Erlang Thursday – erl_tar:create/2

Today’s Erlang Thursday is on erl_tar:create/2.

erl_tar:create/2 creates a tar file with a given filename and adds the given list of filenames to the archive.

erl_tar:create/2 takes two arguments, the first is a filename to write to, and the second argument is a list of filenames to add to the tar file.

First, we will open up a new OS shell session and create some files to add to a new tar file.

$ echo "foo" > foo.txt
$ echo "bar" > bar.txt
$ echo "baz" > baz.txt
$ ls
bar.txt  baz.txt  foo.txt  test.tar

Now that we have some files to archive, we can open up a new erl session, and create a new tar file named test.tar.

erl_tar:create("test.tar", ["foo.txt", "bar.txt", "baz.txt"]).
% ok

That looks like it worked; so let’s go to a OS shell, and inspect the resulting file for the filename we gave to erl_tar:create/3.

$ tar -tf test.tar
foo.txt
bar.txt
baz.txt

And yes, tar can read that file and tells us that the three files we added are indeed part of the tar file.

erl_tar:create/3

Erlang also provides erl_tar:create/3 that takes a options list as it’s last argument.

We will create a new file, with the same contents, and pass in that we want this tar file to be compressed, and to be verbose with what it is doing as well.

erl_tar:create("options.tar.gz", 
               ["foo.txt", "bar.txt", "baz.txt"],
               [compressed, verbose]).
% a foo.txt
% a bar.txt
% a baz.txt
% ok

Again, let’s switch back to our OS shell, and inspect the resulting file.

$ tar -tf options.tar.gz
foo.txt
bar.txt
baz.txt

And let’s test it to see if it was considered compressed by gzip.

$ gzip --test options.tar.gz
$

And there we go, gzip considers this a compressed file with integrity. So let’s take a look at the size difference between the two tar files we created.

$ ls -l test.tar options.tar.gz
-rw-r--r--  1 -------  -----    154 Sep XX HH:MM options.tar.gz
-rw-r--r--  1 -------  -----  10240 Sep XX HH:MM test.tar

And looking at the filesize we can see that it is definately compressed, as options.tar.gz is two orders of magnitude smaller than test.tar.

Creating a file that already exists

As we just created test.tar and saw it had the contents, let’s see what happens when we call create on a file that already exists, by passing the same filename with a empty list of files.

erl_tar:create("test.tar", []).
% ok

And we take a look at the contents, we can see the original tar has been replaced.

$ tar -tf test.tar
$

This tells us that erl_tar:create/2 will create a tar file and overwrite the existing file, and doesn’t error out if the file already exists (assuming the user the shell is running has access to write to that file/directory).

Creating a tar for a path that doesn’t exist

If we give a bad path for a file, we can see that erl_tar:create/2 will return a error tuple, with the filename and reason for the failure.

erl_tar:create('/path/does/not/exist.tar', []).
% {error,{'/path/does/not/exist.tar',enoent}}

Other Potential Gotchas

First, the documentation states that it takes filename()s as arguments, but the documentation page for erl_tar does not specify on that page what a filename data type is.

If you use atom()s for the filename, you are going to get an error like the one below that I was getting at first, before using string()s for the filenames.

erl_tar:create('test.tar', ['foo.txt', 'bar.txt', 'baz.txt']).
** exception error: no function clause matching filename:join([]) (filename.erl, line 392)
     in function  erl_tar:split_filename/4 (erl_tar.erl, line 423)
     in call from erl_tar:create_header/3 (erl_tar.erl, line 352)
     in call from erl_tar:add1/4 (erl_tar.erl, line 305)
     in call from erl_tar:foreach_while_ok/2 (erl_tar.erl, line 940)
     in call from erl_tar:create/3 (erl_tar.erl, line 114)

Second, according to the Limitations section of the erl_tar documentation page, filenames should be less than 100 characters for maximum compatability across different systems and version of the tar program.

Lastly, it is on us the user to include the file extension when specifing the filename, as erl_tar:compress/2 does not manage the extension for us.

–Proctor