RabbitFarm

2019-05-09

Perl Weekly Challenge 007

As with the previous weekly challenges the problem statements are short and are included in the first comment of the code. The code blocks shown link to GitHub Gists.

Part 1

I think this is straightforward enough, with the exception of the use of pack and unpack.

Sample Run

$ perl ch-1.pl
1 2 3 4 5 6 7 8 9 10 12 18 20 21 24 27 30 36 40 42 45 48 50 54 60 63 70 72 80 81 84 90 100 102 108 110 111 112 114 117 120 126 132 133 135 140 144 150 152 153

pack/unpack

The line unpack("%32C*", pack("C*", @digits)) is a bit mysterious looking at first so let's unpack it (pun intended!) .

We are first, in the inner pack() call, creating a binary representation of the @digits array according to a template.

The 'C' means to break the input into unsigned chars
the '*' means repeat the 'C' for the rest of the input.

The template "C*", in other words, means "treat every item in this array as an unsigned char" and then pack them together. The result is a binary representation which, to be quite honest, means just about nothing unless you know, or are able to guess, the template.

So then we use unpack()'s checksum function to sum the digits, which is really all we are after here anyway.

The '%' is just telling unpack to use the special checksum function
The 'C' means to break the input into unsigned chars (because that is how they were packed to begin with)
the '*' means repeat the 'C' for the rest of the input (because that is how they were packed to begin with)
The 32 means to compute a 32-bit checksum

The two functions share the same template syntax which is documented on the pack manpage.

More on binary formats

If you are curious just what the raw binary created by pack() looks like you can use the xxd command with the -b option. For example, suppose I ran a command like the following:

perl -e '$i="2019";@d=split(//,$i); print pack("C*", @d);' > /tmp/pack

can you guess what A is based on this output from xxd?

$ xxd -b /tmp/pack
00000000: 00000010 00000000 00000001 00001001 ....

If you find this figuring out of the meaning behind binary strings interesting than perhaps you have a future as a reverse engineer! If not, than you are a normal well adjusted person. :D

Part 2

Sample Run

$ perl ch-2.pl
cold -> cord -> card -> ward -> warm

Overview

To compute a word ladder we can calculate the shortest path between the two words, as represented by vertices, on a graph. To create the graph sub build_graph() creates a vertex in the graph for each word and does a pairwise comparison between all words. If they differ only by one letter then an edge is drawn between the two vertices. Once the graph is created we perform a Dijkstra Single Source Shortest Path computation which will compute all shortest paths from a given source vertex to all other vertices. This may seem wasteful, we are only concerned with one such path after all, right? Interestingly, the worst case time complexity is the same whether we abort the algorithm after finding the path we want or keep going and compute them all. There is some wasted calculation of course but, asymptotically speaking, we don't need to be too anxious about it.

The code shown above gets a little dense in place although I think at a high level the approach is not too hard to comprehend.

Create the graph. I use the Graph module from CPAN.
Select a source word (vertex). Compute all shortest paths from this source.
return the path we are interested in for our word ladder.

Details

As mentioned before some parts of the code get a bit dense. This is not out of some intent to be purposefully obfuscated! As always with Perl There Is More Than One Way to Do It and the way I chose fits a balance between readability and still not being overly verbose. I think.

In sub build_graph these lines

my $length_w0 = do{
$w0 =~ tr/[a-z]//;
};

compute the string length of $w0 by using the return value from tr which is the number of matches made. The same technique is used a few lines down with

my $differences = eval "\$w1 =~ tr/$w0//";

but here we need to wrap the tr in an eval because tr does not perform any variable interpolation since its transliteration table is built at compile time and not run time.

In sub dijkstra_sssp the lines

local *by_total_edges = sub {
   return 1 if $total_edges{$a} eq OO;
   return -1 if $total_edges{$b} eq OO;
   return $total_edges{$a} <=> $total_edges{$b};
};

define a nested subroutine. The need for the use of local is to avoid creating a closure. The perlref manpage describes the situation: ... named subroutines do not nest properly and should only be declared in the main package scope. This is because named subroutines are created at compile time so their lexical variables get assigned to the parent lexicals from the first execution of the parent block. If a parent scope is entered a second time, its lexicals are created again, while the nested subs still reference the old ones.

Anonymous subroutines get to capture each time you execute the sub operator, as they are created on the fly. If you are accustomed to using nested subroutines in other programming languages with their own private variables, you'll have to work at it a bit in Perl. The intuitive coding of this type of thing incurs mysterious warnings about "will not stay shared" due to the reasons explained above.

So by using local what we are doing is creating a temporary (i.e. re-created for each call to the enclosing subroutine) assignment of the anonymous subroutine. This has normal access to the lexical variables from the enclosing scope at the time it is is invoked.

This has the interesting effect of creating a function local to another function, something not normally supported in Perl. I'll freely admit this is definitely not idiomatic Perl. I'd argue that stylistically, in any language, this should be the preferred way of organizing this code. The small function used for sorting vertices has no use outside of sub dijkstra_sssp. Indeed I could imagine many programmers just inlining the code in the call to sort! This use of local is essentially the same thing, but cleaner looking than inlining. The only complication is that perl will complain with

Name "main::by_total_edges" used only once: possible typo at perl5/ch-2.pl line 41.

unless we add the line no warnings "once"; I am still investigating this. At face value the warning makes no sense: by_total_edges is called from within a loop. Is this a slight defect in the interpreter to not recognize this? I'll write up what I learn about this another time.

Notes

If you thought the use of the Unicode character '∞' was unnecessary you are correct! I was hoping that it would be an acceptable identifier and that line could have read use constant ∞ => "INF"; or something like that where I could then have used ∞ in the code almost like a numeric literal. Sadly this is not an acceptable identifier. (Although I think it should be!)
The Graph module has an interesting history, it was originally written as part of the code samples for the book Mastering Algorithms with Perl. That book has served as an excellent reference for many years.

posted at: 20:05 by: Adam Russell | path: /perl | permanent link to this entry