This Week on perl5-porters - 30 March-5 April 2008

grinder on 2008-04-11T19:44:00

The extent that map/grep go to to keep the calling overhead of the block is horrendous and getting that to work for reduce in List::Util was difficult. Doing it with multiple blocks is going to be potentially very difficult. -- Graham Barr, not exaggerating how hard it is to work on the parser and optree generator.

Topics of Interest

Dual-lifing `Pod::Html`

Steffen Müller gave David Landgren a commit bit last week to take over the maintenance of Pod::Html. After looking around the blead directory tree, David wondered where the tests were. Jan Dubois pointed out their hiding place.

  mmm, hand-rolled test harnesses
  http://xrl.us/bi956

Lack of 5.10.x smoking

Dave Mitchell looked through the smoke results from March and saw fewer than half a dozen smokes for 5.10.1-tobe. This led him to ask if some of the regular smokers could schedule a smoke or two more regularly (especially after 5.8.9 is released).

Bram asked for some help on how to start smoking, such as what the most desirable combinations are for smoking. One important point to come out of the discussion was how useful ccache can be to cut down smoking time.

  chained smoking
  http://ccache.samba.org/
  http://xrl.us/bi958

Make built-in list functions continuous

Nicholas Clark noticed that one of the Google Summer of Code projects was to improve the performance of built-in functions, by getting them to skip the construction of intermediate lists. Nicholas was curious as to what was meant by this, since it isn't part of the current TODO list.

Wren Ng Thornton replied that it is an optimisation known as "deforestation" in Haskell parlance, and comes into play when you have a series of chained maps or greps, and a pipeline of SVs between each step. The answer to this is to use continuous functions, which is just a fancy way of saying that they operate on input and output streams.
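To make the idea concrete, here is a minimal sketch of what deforestation would amount to if done by hand. The data and the three stages are invented for illustration; nothing here is taken from the thread. The chained form hands a complete list from each stage to the next, while the fused loop pushes one element at a time through all three steps.

```perl
use strict;
use warnings;

my @input = (1 .. 10);

# Chained form: each map/grep produces its whole result list
# before the next stage starts consuming it.
my @chained = map { $_ * 2 } grep { $_ % 3 == 0 } map { $_ + 1 } @input;

# Hand-fused form: one pass, no intermediate lists.
my @fused;
for my $x (@input) {
    my $y = $x + 1;             # first map
    next unless $y % 3 == 0;    # grep
    push @fused, $y * 2;        # second map
}

print "@chained\n";
print "@fused\n";
```

Both forms produce the same list; the point of the proposed optimisation would be to get the second behaviour from the first spelling.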

Wren offered some rewriting strategies that he thought would speed things up. It all began to fall apart when Nicholas explained that during compilation there was never at any point a usable abstract syntax tree (or AST) that could be used as a basis for such manipulations, since the tokeniser and lexer emit what is more or less the final optree directly. Some additional obligatory fixups are then performed on the tree, as well as some peep-hole optimisations, but both of these operations are hopelessly intertwined. A distinct, pluggable optimiser for Perl 5 remains an elusive dream.

It gets worse. Nicholas said it took him a full-time week's worth of work, just to create opcode optimisations for reverse sort @pig_pen and foreach (reverse @recusandae). It took him a day or so to remove the srefgen and ex-list ops from the creation of arrayrefs (like [1, 3, 7]) and hashrefs.

He wasn't sure how long it took for Dave Mitchell to teach the optimiser to perform in-place sorts for @schlip = sort @schlip, or for Yves Orton to achieve a faster if (%hash) {...}, but these are the only known examples of optree optimisations in the past three years. Dave admitted that it was "quite hard".
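For reference, the source-level constructs mentioned above look like this. The optimisations themselves happen in the optree and are invisible here; the sketch only shows which spellings are affected, using made-up variable names.

```perl
use strict;
use warnings;

my @names = qw(pear apple fig);

# Sorting an array back into itself (the in-place sort case).
@names = sort @names;

# reverse applied directly to sort.
my @descending = reverse sort @names;

# foreach over a reversed array.
my @seen;
foreach my $name (reverse @names) {
    push @seen, $name;
}

# A hash tested in boolean context.
my %stock = (fig => 3);
my $has_stock = %stock ? 1 : 0;
```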

Dave explained that the naive approach of "look for a long string of ops and replace them by a shorter string" is hard to do and very fragile: such optimisations are either easily broken, or they break other things. And Rafael chipped in to say that it is difficult to write regression tests for them to boot.

Nicholas thought a better approach would be to get B::Generate and co. into the state where one could write optree rewriters in Perl and start to explore where the real wins lie. And it just so happens that Steffen Müller has been playing around with B and B::Utils to manipulate the optree and was beginning to make progress towards doing just that.

The other alternative that Nicholas came up with was to investigate Larry Wall's MAD work, which purportedly allows one to recover the original source after compilation (although I believe no-one has actually managed to achieve this in the general case).

  deforestation
  http://www.cse.unsw.edu.au/user/dons/papers/CSL06.html

  for de trees
  http://xrl.us/bi96a

Expose `ptr-table` funcs, add `ptr-table-delete`, and benchmark them

Jim Cromie wrote a patch to expose the underlying hashing mechanisms used by the internals, so that XS code could use it directly. He wasn't entirely convinced that it was wise to do so, but a factor of 5 speed-up was nothing to sneeze at. The fact that it might help Devel::Size caught Tels's attention, but he wasn't sure he understood what the patch offered.

  the street finds its own use for things
  http://xrl.us/bi96c

Stupid Transaction Idea?

Curtis "Ovid" Poe wanted to know if anyone had ever thought about using forks or threads to create a poor man's transactional memory. Robin Barker pointed to a talk made by Simon Wistow on the subject.

Mark-Jason Dominus made the connection between this question and a thread from June 2006 regarding reversible debugging. This revived the discussion about reversible debuggers and missile launches, until Abigail dragged things back on track, pointing out that rolling back transactions is a much simpler proposition than rolling back the universe. For instance, a fire_missile() appears really to fire a missile, except that in reality it doesn't, not until the commit() is issued.
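One way to picture the fire_missile()/commit() behaviour is a queue of deferred side effects. The sketch below is an assumption about how such a thing might look, not anything proposed in the thread: the Transaction class and its defer, commit and rollback methods are hypothetical names, and the "launch" is just a push onto an array.

```perl
use strict;
use warnings;

package Transaction;    # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    return bless { pending => [] }, $class;
}

# Record a side effect instead of performing it.
sub defer {
    my ($self, $action) = @_;
    push @{ $self->{pending} }, $action;
    return;
}

# Perform every recorded side effect, in order.
sub commit {
    my ($self) = @_;
    my @actions = @{ $self->{pending} };
    $self->{pending} = [];
    $_->() for @actions;
    return;
}

# Forget the recorded side effects without running them.
sub rollback {
    my ($self) = @_;
    $self->{pending} = [];
    return;
}

package main;

my @launched;
my $txn = Transaction->new;

# Looks like it fires, but only queues the launch.
sub fire_missile {
    my ($transaction, $target) = @_;
    $transaction->defer(sub { push @launched, $target });
    return 1;
}

fire_missile($txn, 'test range');
$txn->rollback;
my $after_rollback = scalar @launched;    # nothing was launched

fire_missile($txn, 'test range');
$txn->commit;                             # now it really happens
```

This only works for operations that agree to go through the queue, which is where the next suggestion comes in.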

Paul Fenwick thought that if anyone was brave enough to pursue the idea, they could do worse than use a Safe compartment to ensure that no operations that could not be rolled back were performed.
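As a rough illustration of Paul's suggestion, the Safe module compiles code under an opcode mask and refuses anything outside it. The sketch below relies on Safe's default mask only; deciding which opcodes count as "cannot be rolled back" is left open, and the two code strings are made up for the example.

```perl
use strict;
use warnings;
use Safe;

my $compartment = Safe->new;

# Pure computation is permitted by the default opcode mask.
my $sum = $compartment->reval('2 + 3');

# system() is outside the default mask, so the code is rejected
# at compile time and never runs; the reason ends up in $@.
my $result = $compartment->reval('system("true")');
my $error  = $@;

print "sum: $sum\n";
print "refused: $error";
```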

  Simon says
  http://london.pm.org/lpw-2004/talks/simon_wistow-perl_voodoo.ppt

  the p5p thread
  http://xrl.us/bi96e

TodoTracker - get money for fixing TODO tests

Thomas Klausner and the Vienna.pm crew announced the grand opening of their TODO bounty hunter scheme, whereby people who write patches to solve TODO problems earn real money (that is, Euros).

What exactly counts as a TODO, and what it is worth, is still a work in progress. You can find out more about it on their wiki:

  http://socialtext.useperl.at/woc/index.cgi?todo_test_bounties

  make money fast
  http://xrl.us/bi96g

Leopard has more standard `/etc/passwd` files than previous

Back in October 2007, Rafael Garcia-Suarez committed change #32200 to resolve a problem on an older OS/X. In newer OS/X versions, a utility crucial to the test suite, nidump, is no longer available, and thus the test suite fails. Jan Dubois suggested that scraping the output of dscl might do the job instead. Unfortunately he lacked the tuits to do so.

Nicholas Clark said that Jan should refile it as a bug report so that it isn't left behind.

  http://xrl.us/bi96i

Unicode 5.1.0

The latest Unicode specification was released by the Unicode Consortium. Of particular interest was the inclusion of uppercase ß (eszett). Tels made a cogent argument for the gradual disappearance of such characters: they are really fiddly to text via SMS. In any event, Perl now does 5.1.0, which is going to simplify the task of people who wish to write domino servers (the game, not the Lotus kind).

  http://xrl.us/bi96k

TODO of the week

(here, this should be an easy one).

`perlmodlib.PL` rewrite

Currently perlmodlib.PL needs to be run from a source directory where perl has been built, or some modules won't be found, and others will be skipped. Make it run from a clean perl source tree (so it's reproducible).

Patches of Interest

Double magic with `substr`

Vincent Pit had been sufficiently annoyed by magic in substr being triggered twice, when once was enough, that he sat down and crafted an elegant patch to fix it up. He had a couple of doubts about how to deal with the API change.

Nicholas explained how to resolve that by having the old implementation shuffle off to mathoms.c, and writing a macro that exposes the old name in terms of the new.

  old functions never die
  http://xrl.us/bi96n

  they just mathom
  http://xrl.us/bi96p

Double magic with `\&$x`

In his continuing quest to rid the core of twice-invoked magic, Vincent also delivered a patch to fix up the magic associated with \&$x. He knew there was another possibility of magic being triggered, but questioned the wisdom of invoking magic for something as tedious as creating an error message.

  a surfeit of magic
  http://xrl.us/bi96r

Make `PL_AMG_names` and `PL_AMG_namelens` static

Jan Dubois noticed that a couple of new symbols were being exported for 5.8.9-tobe. Since they really should be private, he made them static in blead. Steve Hay applied the patch, and tweaked regen.pl to get it to keep track of overload.c and overload.h.

Nicholas Clark thought that since 5.10 was out in the wild, it would not be possible to hide them, since someone might already have discovered a way of using them, and thus removing their public visibility would cause such code to break (or at least, become unlinkable).

  http://xrl.us/bi96t

perlfunc.pod: `atan2(0,0)` returns 0, not `undef`

Paul Fenwick noticed a small error in the documentation concerning atan2(0,0), as the result of those arguments is undefined. Paul felt that perl should return undef, but in fact it returns 0.

Mark-Jason Dominus wondered if it would be better to have it throw an exception, like the logarithm of a negative number, or dividing by zero. Unfortunately that would be almost certain to break a lot of code in the wild. Paul felt that a warning would be sufficient, since people would be free to use Fatal and thus obtain an exception in due form.

Rafael Garcia-Suarez invited interested parties to look at the atan2 manpage on FreeBSD, which put forward some reasons why returning 0 can make sense.

Dave Mitchell then looked at the source and discovered that perl just returns whatever the underlying C library does. Andy Dougherty investigated further and determined that some platforms do indeed return 0 (as dictated by the C89 standard) and some will also set errno to EDOM.

Nicholas Clark was of the opinion that CORE::atan2 should return 0, and that leaves POSIX::atan2 free to call the underlying library.
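The behaviour under discussion is easy to probe. In the sketch below the built-in call simply reports what the local platform gives (0 on many systems, per the thread), while checked_atan2 is a hypothetical wrapper showing the warn-and-return-undef behaviour that was floated; it is not part of perl.

```perl
use strict;
use warnings;

# Whatever the underlying C library hands back on this platform.
my $builtin = atan2(0, 0);
print "atan2(0,0) = $builtin\n";

# Hypothetical wrapper: warn about the undefined case and return
# undef (in scalar context) rather than a number.
sub checked_atan2 {
    my ($y, $x) = @_;
    if ($y == 0 && $x == 0) {
        warn "atan2(0,0) is mathematically undefined\n";
        return;
    }
    return atan2($y, $x);
}
```

Callers who want an exception instead could promote the warning to a die in their own code.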

  getting atan
  http://xrl.us/bi96v

New and old bugs from RT

possible fd bug in `PerlIOStdio_close` (#46173)

Late last year, Steve Peters outlined a scenario where duping a file descriptor during a close could cause a file descriptor to be leaked.

Nicholas Clark admitted this week that since Nick Ing-Simmons's passing, probably no-one understood how PerlIO works deep down. In any event, he thought the code as it stood appeared to be sufficiently wrong to merit a fix.

This in turn prompted Craig Berry to ask why PerlIOUnix_open hard-wires the opened file to 0666 wide-open permissions, and why the code didn't honour the current umask setting. Dave Mitchell explained that the kernel took care of that.
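Dave's explanation can be checked from Perl on a POSIX system. The sketch below asks for mode 0666 explicitly and lets the kernel apply the umask; the file name and the choice of 0022 as the umask are assumptions made for the example.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_WRONLY);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/example.txt";

# Set a typical umask, remembering the old one so it can be restored.
my $old_umask = umask 0022;

# Request rw-rw-rw-; the kernel clears the bits set in the umask.
sysopen(my $fh, $path, O_CREAT | O_WRONLY, 0666) or die "sysopen: $!";
close $fh;

umask $old_umask;

my $mode = (stat $path)[2] & 07777;
printf "created with mode %04o\n", $mode;
```

With a umask of 0022 the file ends up as 0644, so requesting 0666 does not by itself produce a world-writable file.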

  http://xrl.us/bi96x

`[[:print:]]` versus `\p{Print}` (#49302)

Given that no-one had been able to reconcile the differences between these two syntaxes (for example, that the former fails to match some things that the latter does), Robin Barker chose to document the differences.
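Anyone curious about where the two syntaxes disagree on their own perl can run a scan along these lines. This is only a sketch: it checks single characters in the range 0 to 0xFF, and the result depends on the perl version and on how the string is stored internally, so no particular output is claimed.

```perl
use strict;
use warnings;

# Collect the code points on which the two character classes disagree.
my @differ = grep {
    my $char  = chr $_;
    my $posix = $char =~ /[[:print:]]/ ? 1 : 0;
    my $prop  = $char =~ /\p{Print}/   ? 1 : 0;
    $posix != $prop;
} 0 .. 0xFF;

printf "%d code points differ\n", scalar @differ;
printf "U+%04X\n", $_ for @differ;
```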

  if you can't beat 'em
  http://xrl.us/bi96z

`utf8::valid` rejects characters in `\x14_FFFF - \x1F_FFFF` (#51710)

Steve Peters wondered whether the patch included in bug #43294 would fix this problem. It didn't, but that left him asking why \x14ffff was considered to be a valid character.

Chris Hall thought that it was, but noted that utf8::valid was also happy with 0x000000 through 0x13ffff and 0x150000 through 0x7fffffff, which left him puzzled as to why utf8::valid was singling out the 0x14xxxx range.

Chris wondered if the patch Steve was looking at was causing utf8::valid to reject both 'ill-formed' byte sequences and 'non-characters'. Either way, it seemed to be sitting on the fence and not to have a clear purpose.

After that I lost it a bit.

  we need a unicode-porters list
  http://xrl.us/bi963

  http://xrl.us/bi965

Segfault in `B::SVOP::sv` (#52284)

"Inferno" filed a bug report for code that actually works correctly on a threaded perl; only non-threaded perls have problems. Reini Urban thought that the best solution was for B::Size to die a quick, painless death, and to use Devel::Size instead, as it is so much nicer.

  bug in march, answer in april
  http://xrl.us/bi967
  http://xrl.us/bi969

Attempt to free temp prematurely (perl 5.8.8) (#52386)

Frank v Waveren reported a bug in 5.8.8 that Nicholas Clark determined had been fixed in 5.8.9-tobe, although he didn't know off-hand what change was responsible for the fix. Frank tracked it down via the git repository, and identified change #30166 as being the fix.

  http://xrl.us/bi97b

`lc`/`uc` have unexpected side effects inside for loop (#52412)

Mike Wver discovered that the following snippet

  my $foo = 'A';
  for my $bar (uc($foo)) {
    my $lower_bar = lc $bar;
    print "$foo $bar\n"; # $bar should still be 'A'
  }

prints A a instead of A A. No-one knew why, but Abigail pointed out that it was fixed in 5.10.0.

  http://xrl.us/bi97d

`map` isn't context aware in some cases (#52452)

Stefan Wehinger wondered why slightly different nested map constructs use some, a lot, or all available memory. David Nicol made a decent stab at explaining it in terms of lists being reclaimed sufficiently early or not.

Nicholas Clark suggested that the desired behaviour described in the report can be achieved, along with a sane level of memory consumption, by rewriting the loops with foreach instead of map.
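The rewrite Nicholas had in mind follows a familiar pattern. The code from the bug report is not reproduced here; the sketch uses an invented multiplication-table sum to show the shape of the change. The nested map builds the full flattened list before anything consumes it, whereas the nested foreach keeps only a running total.

```perl
use strict;
use warnings;

my @rows = (1 .. 3);
my @cols = (1 .. 4);

# Nested map: the complete list of products exists in memory
# before the summing loop sees its first element.
my $total_map = 0;
$total_map += $_ for map { my $r = $_; map { $r * $_ } @cols } @rows;

# Nested foreach: each product is consumed as soon as it is computed.
my $total_loop = 0;
foreach my $r (@rows) {
    foreach my $c (@cols) {
        $total_loop += $r * $c;
    }
}

print "$total_map $total_loop\n";
```

On inputs this small the difference is invisible; it only matters when the flattened list would be large.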

  http://xrl.us/bi97f

Perl5 Bug Summary

  1807 (+7 -3)
  http://xrl.us/bi97h
  http://rt.perl.org/rt3/NoAuth/perl5/Overview.html

New Core Modules

Math::BigInt 1.88

Tels announced the release of a brand new Math::BigInt, along with an updated bignum pragma, Math::BigInt::FastCalc and Math::BigRat. This release closes out nearly all the existing bugs; only two remain, at the bottom of the barrel. In the meantime, Tels is sitting back and waiting to see what the CPAN Testers make of them.

  http://xrl.us/bi97j

In Brief

Tels wondered if Reini Urban had had time to check out his patch for Devel::Size and bleadperl, but Reini was moving house this week.

  http://xrl.us/bi97m

Robin Barker's verbosity tweaks to regen.pl and friends made it in.

  http://xrl.us/bi97o

Jan Dubois felt that PL_bincompat_opt should be exported on AIX and Windows. Steve Hay thought so too, but realised that Jan was really talking about PL_bincompat_options. Applied.

  http://xrl.us/bi97q

Jarkko Hietaniemi got H.Merijn Brand to tweak Configure in order to align floating point policies of gcc and cc on Tru64.

  http://xrl.us/bi97s

Jan Dubois thought that change #23984 should be integrated into 5.8.x, as it gets corelist installed on Win32. Nicholas Clark said that it was already in, the reason being that it helps perlbug go about its business.

  http://xrl.us/bi97u

Andreas König warned that lib/CGI/t/upload_post_text.txt was checked in as binary and wanted to know if it could be changed. Rafael said that it was binary for a reason; it was in fact a GIF file.

  and patent-free
  http://xrl.us/bi97w

Jerry D. Hedden ran into trouble with the above file, and Nicholas Clark straightened things out.

  all packed up
  http://xrl.us/bi97y

Paul Fenwick issued an RFC for Fatal/autodie exception handling naming and structures.

  http://xrl.us/bi972

Last week's summary

Tels clarified a point regarding the use of POD for wiki markup, explaining that his MediaWiki-Pod distribution on CPAN was a subclass of Pod::Simple::HTML that fixes up a lot of the problems that people encounter when using Pod::Simple::HTML.

  This Week on perl5-porters - 23-29 March 2008
  http://xrl.us/bi974

About this summary

This summary was written by David Landgren.

Weekly summaries are published on http://use.perl.org/ and posted on a mailing list (subscription: perl5-summary-subscribe@perl.org ). The archive is at http://dev.perl.org/perl5/list-summaries/ . Corrections and comments are welcome.

If you found this summary useful, please consider contributing to the Perl Foundation or attending a YAPC to help support the development of Perl.

Corrections

wren ng thornton on 2008-04-12T01:38:58

The name's "wren ng thornton" I'm not sure where that ancient alias crept in from...

(And it's not just haskell parlance, it's common parlance for optimization in general :)

Sorry about that

grinder on 2008-04-16T07:03:30

Damn. Please accept my sincere apologies. I'm pretty sure I just cut'n'pasted your name from the e-mail message I was reading. Fixed.

I can now type Jarkko Hietaniemi and Aristotle Pagaltzis without a second thought, but I don't trust myself with much else :)