This Week on perl5-porters (23-29 June 2003)

rafael on 2003-06-30T21:42:45

This week's p5p summary is going to be a bit unusual: a few very long threads will be summarized (logically) in longer paragraphs. Read about hashing algorithm vulnerabilities, new proposed syntax, CHECK and INIT blocks, and other unlittle things.

Algorithmic Complexity Attack, continued

The discussion that started last week about the possibility of algorithmic attacks on perl hashes (bug #22371) reached its conclusion. Jarkko Hietaniemi proposed several successive iterations of a patch.

Here's the final state: unless the -DUSE_HASH_SEED_EXPLICIT or -DNO_HASH_SEED preprocessor symbols are given to the Configure script, perl's internal hashing algorithm will use a random start value (called the hash seed). This value can be overridden via the PERL_HASH_SEED environment variable. That means that if you have a program that uses a hash, with a constant set of keys, inserted in the same order, the ordering of the output of keys() will differ between different executions of your program.
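In practice, this means code should not depend on the order in which keys() returns its results. Here is a minimal sketch, with a made-up %colour hash, of the usual way to get a repeatable order whatever the hash seed is:

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my %colour = (apple => 'red', banana => 'yellow', cherry => 'dark red');

# The order of keys() is unspecified and may change from run to run.
my @unordered = keys %colour;

# Sorting the keys gives an order that does not depend on the hash seed.
my @ordered = sort keys %colour;

print join(',', @ordered), "\n";   # prints "apple,banana,cherry"
```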

At some point Nick Ing-Simmons suggested introducing a new attribute to declare "repeatable" hashes:

    my %hash : predictable;

or

    my %hash : lock_order;

Jarkko made a failed attempt at implementing it, but as anyway people seemed to dislike it, or to not like it enough (finding it overkill), it wasn't kept.

The thread starts at http://xrl.us/j7m -- and it counts more than 100 messages.

Now what would be interesting is to have some benchmark results, on different platforms. Using SpamAssassin (per Nicholas Clark's suggestion) seems to be a good idea.

Hash mysteries

Jarkko finally applied his randomized hash seed patch, and two failures appeared randomly in the smoke test reports. The first one was caused by a piece of code that used to iterate wrongly over the keys of readonly hashes. The second one, in a test for Encode::Alias, was more difficult to hunt down. It was caused by an inconsistent definition of an encoding name alias, making the test effectively dependent on hash key ordering.

My slice

Jarkko proposed a new syntax: the ability to declare a hash and initialize a slice of it at the same time:

    my @hash{@keys} = (values...);

Mark-Jason Dominus approves, but wonders why my($h{$k}) shouldn't be accepted either. Rafael Garcia-Suarez proposes a one-line proof-of-concept patch to allow the syntax my(@hash{qw/keys/}) (with parentheses -- otherwise modifying the parser is necessary). Hugo van der Sanden worries that this may mislead users into expecting that local @hash{@keys} works analogously, by emptying %hash first, and thus pronounces himself against this idea.

    http://xrl.us/ks4 
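For comparison, here is a small sketch of what can be written today: the two-step equivalent of the proposed declaration, followed by the existing behaviour of local on a hash slice, which only saves and restores the named elements and does not empty the hash. The %config hash and the show_config() helper are invented for this example.

```perl
use strict;
use warnings;

my @keys   = qw(one two three);
my @values = (1, 2, 3);

# Two-step equivalent of the proposed "my @hash{@keys} = (values...)".
my %hash;
@hash{@keys} = @values;

# Hypothetical package hash, used to show what local does to a slice.
our %config = (debug => 0, verbose => 0, colour => 1);

sub show_config {
    return join ',', map { "$_=$config{$_}" } sort keys %config;
}

my $inside;
{
    # Only these two elements are localized; 'colour' stays visible.
    local @config{qw(debug verbose)} = (1, 1);
    $inside = show_config();    # colour=1,debug=1,verbose=1
}
my $outside = show_config();    # colour=1,debug=0,verbose=0

print "$inside\n$outside\n";
```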

CHECK and INIT blocks

Marc Lehmann reports (bug filed under number 22826) that neither CHECK nor INIT blocks are executed when require() is used to load a module. Rafael provides the explanation (and sheds some light on the unclear points of the documentation): CHECK and INIT blocks are executed during the transition between the compilation phase of the main program and its execution. Thus, they're executed at only one point during the lifetime of the perl interpreter, and once the main program has been compiled, they're never executed again. (This explains the existence of the warning Too late to run CHECK/INIT block.)
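The sketch below illustrates that ordering. It records when each kind of block runs in a made-up @order array, then compiles an INIT block at run time with a string eval, which arrives after the transition in the same way as code loaded by require().

```perl
use strict;
use warnings;

our @order;

push @order, 'main';             # ordinary statement: runs at execution time
INIT  { push @order, 'INIT' }    # runs just before execution starts
CHECK { push @order, 'CHECK' }   # runs once the main program is compiled
BEGIN { push @order, 'BEGIN' }   # runs as soon as it has been parsed

# Code compiled at run time comes after the compile/run transition,
# so its INIT block is never run.
{
    local $SIG{__WARN__} = sub { };   # hide the "Too late to run INIT block" warning
    eval q{ INIT { push @order, 'late INIT' } 1 } or die $@;
}

print join(' ', @order), "\n";
```

Run directly as the main program, this prints BEGIN CHECK INIT main, and the 'late INIT' entry never shows up.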

Rafael points out that the B compiler suite actually relies on the current behaviour, and that it seems to be well adapted to other profiling or analysis tools, such as Devel::Cover. Gurusamy Sarathy recalls that Larry wished the CHECK/INIT blocks be run once per optree. Apparently that would be a job for a new kind of special block.

The state of the equivalents of CHECK, INIT, BEGIN and END blocks in Perl 6 seems to be mostly unclear to the P5P crowd, although Arthur Bergman crafted a module that provides PRE and POST to Perl 5 (Hook::Scope).

    http://xrl.us/ks5
    http://xrl.us/ks6 

Perldoc and Cygwin

Gerrit P. Haase sent a patch for Pod::Perldoc on Cygwin, to disable the default use of man(1) to format output, because man is apparently broken. Perhaps something to do with groff. Merijn applied the patch. John Peacock found a workaround: setting the environment variable LESS=-R (or --RAW-CONTROL-CHARS) instead.

    http://xrl.us/ks7 

In brief

Johan Vromans posted an RFC to choose a default way to handle multiple-valued options in Getopt::Long. Nobody commented.

    http://xrl.us/ks8 

Alan Burlison had a look at the build process (on Solaris), and noticed that perl is linked against libraries it doesn't actually use. He's investigating it.

    http://xrl.us/ks9 

Gurusamy Sarathy and Ed Avis talked about releasing Data::Dumper on CPAN. This raises the question of the test suite, which should work across all perl versions, and thus must not rely on hash key order. But Jarkko did just fix that test suite to enable the Sortkeys option to Data::Dumper (related to his hash seed patches). Brian Ingerson points out that YAML might be a better tool than Data::Dumper for a large number of uses of Data::Dumper.

    http://xrl.us/kta 
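As an illustration of why the Sortkeys option matters for tests, here is a minimal sketch with two made-up hashes that hold the same data but were filled in a different order. With $Data::Dumper::Sortkeys enabled, both dumps come out identical regardless of the hash seed.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys in the output, so it no longer depends on the hash seed.
$Data::Dumper::Sortkeys = 1;

# Hypothetical data: same content, different insertion order.
my %first  = (alpha => 1, beta => 2, gamma => 3);
my %second = (gamma => 3, beta => 2, alpha => 1);

my $dump_first  = Dumper(\%first);
my $dump_second = Dumper(\%second);

print $dump_first;
print "identical\n" if $dump_first eq $dump_second;
```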

A new Perl Repository Browser has been put online on ActiveState's side. The URL hasn't changed.

About this summary

This summary was brought to you by Rafael Garcia-Suarez. Weekly summaries are available on http://use.perl.org/ and/or via a mailing list, whose subscription address is perl5-summary-subscribe@perl.org . Comments, additions, multiplications, corrections, and suggestions are welcome.


my %hash : predictable

nicholas on 2003-06-30T22:28:28

anyway people seemed to dislike it, or to not like it enough

Maybe I'm biased, but I thought that the most important reason for dropping it was that it simply didn't work.

Re:my %hash : predictable

rafael on 2003-06-30T22:45:03

Things that don't work might, eventually, be made to work. Michael Schwern had stronger arguments than you.

Re:my %hash : predictable

nicholas on 2003-06-30T22:54:40

True. I was thinking more about the particular implementation, and I did suggest ways to kludge it into working. And Schwern's arguments were independent of the implementation, and unanswerable.