This Week on perl5-porters (2-8 October 2006)

grinder on 2006-10-12T12:16:00

"I do not know what crazy things people have done in code I cannot see. But people are creative in strange ways, and I cause them enough trouble already by incorporating changes with unintended behaviour changes." -- Nicholas Clark, on why it pays to be cautious.

Topics of Interest

Outdated modules in blead

Andreas König noted that there were eight modules in blead that had more recent counterparts on CPAN, mentioning that one of them, Text::Soundex, was intentionally neglected.

The reason for the neglect is that nobody cares about it any more, but since it is in core, it cannot be removed. Those people who do need it know enough to go to their local CPAN mirror and pick up the speedier XS-based version. H.Merijn Brand thought that some effort should nonetheless be made to keep blead and CPAN synchronised.

Steve Peters brought most of the listed modules up to date.

  http://xrl.us/r7u3 

Upgrade (or not) to threads-shared 1.03

Dave Mitchell went ballistic over the application of this patch because of its gratuitous whitespace changes, which make it harder to see what the true changes are. Similarly, people chasing down bugs in the future will stumble each time they have to traverse it.

  Stop fiddling with the bloody whitespace
  http://xrl.us/r7u4 

Overloading thread stringification

Jerry D. Hedden asked what the porters thought of overloading "" in the threads module to return the thread identifier, since it would make some constructs prettier.

John Peacock and Nicholas Clark didn't like the idea, as it seemed to admit the possibility of breaking programs that relied on the existing behaviour (getting back a string like threads=SCALAR(0x209e270c)).

Jerry, still not quite getting it, asked if that meant that the possibility of incompatibility ruled out any chance of improvement. He floated the idea of an addition to the CPAN module that would never be backported to maint, but that would be akin to a code fork.

Gisle Aas disliked it as well. He had been down this track many years ago, and has since come to the conclusion that this sort of "feature" eventually winds up causing more problems than the ones it was supposed to solve.

Johnathon Stowe summed up the problem neatly, and it is this: no one can guarantee that there is no code anywhere that does something like

  if ("$thread" =~ /^threads/)

regardless of whether this is a desirable way to test for an object being a thread. With that in mind (and this is really what the backwards compatibility game is all about), there was little chance that the patch would be applied. Since behaviour is being changed, regardless of what merit or value it has, new behaviour can only be activated by asking for it.

  http://xrl.us/r7u5 
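
To make the compatibility concern concrete, here is a minimal sketch. The two Toy:: classes and the looks_like_toy() check are hypothetical stand-ins, not the threads module or any real code; they only show how a string test of the kind quoted above stops matching once "" is overloaded.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a class without overloading.
package Toy::Plain;
sub new { my ($class, $id) = @_; return bless { id => $id }, $class }

# Hypothetical stand-in for the same class after "" is overloaded.
package Toy::Stringy;
use overload '""' => sub { $_[0]{id} };   # "" now yields the identifier
sub new { my ($class, $id) = @_; return bless { id => $id }, $class }

package main;

# The kind of check that might exist in code nobody can see.
sub looks_like_toy {
    my ($obj) = @_;
    return "$obj" =~ /^Toy::/ ? 1 : 0;
}

my $plain   = Toy::Plain->new(7);
my $stringy = Toy::Stringy->new(7);

print "plain:   $plain\n";     # something like Toy::Plain=HASH(0x...)
print "stringy: $stringy\n";   # the identifier, 7
print "check on plain:   ", looks_like_toy($plain),   "\n";   # 1
print "check on stringy: ", looks_like_toy($stringy), "\n";   # 0
```

The check was never a good way to identify an object, but it worked until the stringification changed underneath it.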

Despite the arguments against the proposal, Jerry went ahead and made the change. Rafaël declined to integrate it into blead. Jerry finally admitted defeat, and said he'd add an import() verb to activate the change.

  http://xrl.us/r7u6 

Which in fact, he did. Applied as change #28958.

  And they all lived happily ever after, kthnx
  http://xrl.us/r7u7 
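
One way such an opt-in could be wired up is sketched below. This is an assumption-laden illustration rather than the code that went into change #28958: Toy::Handle and its 'stringify' option name are hypothetical, and the overloading is installed for the whole class at the moment of the request, not per caller.

```perl
use strict;
use warnings;

package Toy::Handle;   # hypothetical class, not the real threads module
use overload ();       # load overload.pm without installing anything yet

sub new { my ($class, $id) = @_; return bless { id => $id }, $class }
sub id  { return $_[0]{id} }

# Overloading is installed only when the caller asks for it by name.
sub import {
    my ($class, @options) = @_;
    overload->import('""' => \&id) if grep { $_ eq 'stringify' } @options;
    return;
}

package main;

my $before       = Toy::Handle->new(1);
my $default_text = "$before";          # captured before opting in

Toy::Handle->import('stringify');      # what "use Toy::Handle 'stringify'" would call
my $after    = Toy::Handle->new(2);
my $opt_text = "$after";

print "default: $default_text\n";      # something like Toy::Handle=HASH(0x...)
print "opt-in:  $opt_text\n";          # the identifier, 2
```

Code that never passes the option keeps seeing the old string, which is the property the porters were asking for.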

regmatch() restructuring complete

Dave "Leakbane" Mitchell announced that his restructuring work on regmatch() was now complete. No more recursion fakery is needed: all backtracking is now done via a single backtracking state stack. Dave was quite pleased to see that the amount of state required had been shrunk from 72 bytes down to 44 bytes (on 32-bit systems).

Dave posted the large comment that explains the matching strategy for the rest of us mere mortals to ponder. Yves Orton was pleased as Punch, since he realised this would simplify the things he wants to implement.

  Affairs of state
  http://xrl.us/r7u8 
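
For readers who have not met the idea before, the following toy matcher shows what backtracking through an explicit stack of saved states looks like, as opposed to recursing. It is emphatically not regmatch(): the pattern language (literal characters, '.', and a trailing '*'), the function name toy_match() and the state layout are all invented for illustration.

```perl
use strict;
use warnings;

# Match $text in full against a toy pattern made of literal characters,
# '.' (any character) and an optional '*' after either (zero or more).
sub toy_match {
    my ($pattern, $text) = @_;

    # Split the pattern into [character, is_starred] tokens.
    my @tokens;
    while ($pattern =~ /\G(.)(\*?)/gs) {
        push @tokens, [$1, $2 eq '*'];
    }

    # Each saved state is [token index, text position].
    my @stack = ([0, 0]);
    while (@stack) {
        my ($ti, $pos) = @{ pop @stack };   # popping a saved state is the backtrack

        if ($ti == @tokens) {
            return 1 if $pos == length $text;
            next;                           # pattern exhausted too early
        }

        my ($char, $starred) = @{ $tokens[$ti] };
        my $fits = $pos < length($text)
            && ($char eq '.' || $char eq substr($text, $pos, 1));

        if ($starred) {
            push @stack, [$ti + 1, $pos];           # fallback: stop repeating here
            push @stack, [$ti, $pos + 1] if $fits;  # greedy choice, tried first
        }
        elsif ($fits) {
            push @stack, [$ti + 1, $pos + 1];
        }
        # Otherwise this state is a dead end and the loop moves on.
    }
    return 0;
}

print toy_match('a*b',  'aaab')  ? "match\n" : "no match\n";
print toy_match('a.*c', 'abcbc') ? "match\n" : "no match\n";
print toy_match('a*b',  'aaac')  ? "match\n" : "no match\n";
```

Because every alternative lives on one stack, the cost of a saved state is simply the size of one stack entry, which is why shrinking the per-state size matters.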

patch 28900 breaks libwww

Andreas König noticed that change #28900 causes HTTP::Daemon to fail: specifically, a slice of stat _ gets garbled.

The odd thing was that change #28900 was one of Yves's patches for the regular expression engine. Neither Yves nor Rafaël Garcia-Suarez could figure out why that would be, but Andreas was certain, postulating some weird side-effect.

It turns out to be a poor assumption in the HTTP::Daemon code, which assumed that _ (the underscore filehandle) is lexically scoped (which in fact, it never was). So if you call a subroutine and then access it afterwards, it's possible that the subroutine will have used it for its own purpose. And that's exactly what happened here.
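
The effect is easy to reproduce in isolation. The sketch below is not the HTTP::Daemon code; make_file() and helper() are hypothetical names, and the two temporary files exist only so that the sizes differ visibly.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a temporary file holding $bytes bytes and return its name.
sub make_file {
    my ($bytes) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} 'x' x $bytes;
    close $fh or die "close failed: $!";
    return $name;
}

# Hypothetical helper that does its own file test, as any called code might.
sub helper {
    my ($path) = @_;
    return -e $path ? 1 : 0;
}

my $small = make_file(10);
my $large = make_file(5000);

stat($small) or die "stat failed: $!";
my $size_before = (stat _)[7];   # size of $small

helper($large);                  # silently refills the shared stat buffer

my $size_after = (stat _)[7];    # now the size of $large
print "before: $size_before, after: $size_after\n";   # before: 10, after: 5000
```

The safe habit is to copy whatever is needed out of stat _ before calling anything else, or to stat the file again afterwards.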

Yves was relieved, but thought about what he could do to minimise the impact in the future.

  Action at a distance
  http://xrl.us/r7u9 

Locale-dependent testing

John Peacock reported having worked on version objects in the presence of locales, where commas may be used for decimal points instead of the anglo-saxon period. The code was easy (since Rafaël had done most of the heavy lifting), but John was wondering how to test it. He asked for a show of hands (off-list) to see who had the 'de_DE' locale present on their machine.

H.Merijn Brand noted that he tended to delete all non-crucial locales from his machines, since localised software tended to enjoy installing their documentation in all the locales on the machine for which it had a corresponding version. This can lead to rampant disk space consumption, and even require reboots, should the software in question be kernel-related.

Jarkko Hietaniemi suggested setting up a list of all the locales in the world that used non-period decimal points, and iterating over them until one pops up on the machine running the tests. John already had similar code that does this, so he promised to whip up a development version of version, and get people to smoke it for a while and see what comes out.

  http://xrl.us/r7va 
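
Jarkko's suggestion might look something like the sketch below. It is not John's code: the candidate list is a short illustrative sample, find_non_period_locale() is a hypothetical name, and the result depends entirely on which locales are installed on the machine running it.

```perl
use strict;
use warnings;
use POSIX qw(setlocale localeconv LC_NUMERIC);

# Illustrative candidates only; a real test would carry a much longer list.
my @candidates = qw(de_DE de_DE.UTF-8 fr_FR fr_FR.UTF-8 nl_NL nl_NL.UTF-8);

# Return the first candidate that is installed and whose decimal point is
# not a period, or undef if none qualifies on this machine.
sub find_non_period_locale {
    my @names    = @_;
    my $original = setlocale(LC_NUMERIC);
    my $found;
    for my $name (@names) {
        next unless defined setlocale(LC_NUMERIC, $name);   # not installed
        my $point = localeconv()->{decimal_point};
        if (defined $point && $point ne '.') {
            $found = $name;
            last;
        }
    }
    setlocale(LC_NUMERIC, $original) if defined $original;  # restore
    return $found;
}

my $locale = find_non_period_locale(@candidates);
print defined $locale
    ? "can test with $locale\n"
    : "no suitable locale installed, skipping\n";
```

A test script built this way can skip cleanly on machines like H.Merijn's, where the extra locales have been removed.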

Data::Dumper might be faster now

Nicholas Clark tweaked Data::Dumper slightly to reduce the number of realloc()s being performed. The results were quite surprising. Andy Lester acquired the necessary commit bits to update the CPAN version.

  Dumpin' goodness
  http://xrl.us/r7vb 


Patches of Interest

cflags.SH: rethink of the gcc -std=c89 and -pedantic

Jarkko and H.Merijn continued to work on this, getting to the stage where it was good enough to be applied.

  http://xrl.us/r7vc 

Named captures in regular expressions

Yves Orton delivered a first cut at implementing named captures in Perl. This allows the following:

  'falange' =~ /(?<end>....)$/ and print $+{end} # prints "ange"

This comes in very handy when you have lots of nested capturing groups, since you no longer have to count parentheses, nor do you have to renumber your $1, $2, $3, ... when you insert a new capturing group somewhere in the middle.
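
The renumbering point can be seen in a slightly larger sketch. The log line and the group names here are made up for illustration, and the code needs a perl that supports named captures (they shipped in 5.10).

```perl
use strict;
use warnings;
use 5.010;   # named captures need 5.10 or later

my $line = '2006-10-08 regmatch() restructuring complete';

# Numbered version: the message is the fourth group.
my $numbered = qr/^(\d{4})-(\d\d)-(\d\d)\s+(.*)$/;

# Named version with an extra group wrapped around the whole date.
# Every number after the new group shifts by one; the names do not.
my $named = qr/^(?<date>(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d))\s+(?<msg>.*)$/;

my $fourth_numbered = ($line =~ $numbered)[3];   # the message
my $fourth_named    = ($line =~ $named)[3];      # now the day, not the message

my %got;
%got = %+ if $line =~ $named;   # copy the named captures while they are valid

print "fourth group, numbered pattern: $fourth_numbered\n";
print "fourth group, named pattern:    $fourth_named\n";
print "date: $got{date}, msg: $got{msg}\n";
```

Code that reads $+{msg} keeps working however many groups are added in front of it, whereas code that reads $4 silently starts getting the day.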

H.Merijn Brand thought that this was so cool that it was nearly sufficient reason, all by itself, to upgrade Perl.

The thread became sidetracked into improvements for Johan Vromans's makepatch program. After a number of suggestions from Yves and Dr. Ruud, Johan promised to release a new version.

  http://xrl.us/r7vd 

Yves merged his and Dr. Ruud's patch into a single metapatch for makepatch.

  http://xrl.us/r7ve 


Watching the smoke signals

Smoke [5.9.4] 28908 FAIL(XF) linux 2.6.17-1.2187_FC5 [fedora] (i686/1 cpu)

Steve Peters smoked a build with a recent gcc compiler and gouts of oily black smoke poured out of the vents. Jarkko interpreted this as meaning that gcc has become more aggressive in its ANSI conformance checks. That's the good news. The bad news is that the code base uses the non-ANSI long long datatype, for 64 bits of goodness. Jarkko was hoping that there was some way to placate gcc over this minor transgression.

  Maybe if we slaughtered a chicken
  http://xrl.us/r7vf 


New and old bugs from RT

perl v5.7.0 +DEVEL8481 on i86pc-solaris-64int 2.9 (#5281)

Steve Peters reasoned that this bug had been fixed, but looking through the patches, could not determine which one was responsible.

  http://xrl.us/r7vg 

read, fork and exit mismatch file positions (#5999)

The Solaris man page says it all: if you do I/O in a child process with the file descriptors inherited from the parent, you have to be very careful how the child exits, otherwise strange surprises await you.

Aaron Sherman wondered if there was a way of squirrelling the necessary information away somewhere, so that the child would do the right thing in this event. Steve Peters cautioned that no one had sat down and worked through all the delicate intricacies of fork-related issues, especially when one considers threads at the same time.

  http://xrl.us/r7vh 

fairly large regex optimization bug with 5.7.3 (#8835)

Steve then returned to a regexp bug filed by Jeffrey Friedl, no less, and noted that the poor observed performance had been corrected due to Yves Orton's trie work.

Yves confirmed this, adding that Jeffrey's analysis of the problem was nonetheless correct. There are a number of ways in which an alternation can be dealt with by the engine, and these work more or less efficiently depending on the pattern and the target (a bit like an SQL query planner). It turns out that when non-capturing parentheses (?:...) were added to the engine, the optimiser started making bad decisions, such as the one noted by Jeffrey.

At first, Yves couldn't see an easy way to make this work without introducing additional overhead. Hugo van der Sanden thought that it might be worth examining end anchors and finite maximum match widths as well. He thought that a number of anchor-related optimisations have broken since the \A, \Z and \z anchors were introduced, and by the same token that other new optimisations are now possible.

Yves then discovered a cheap, fast, reliable solution to the problem.

  Pick all three
  http://xrl.us/r7vi 

File::Find has issues with symlinks (#40417)

Rafaël tried out the supplied patch for this problem, but noted that it broke three tests in blead's regression tests. He asked for an updated patch (with more tests) to take this into account.

  http://xrl.us/r7vj 

Halting perl with control-C sometimes causes crash on windows (#40445)

Alex Davies filed a bug on the problem of Control-C causing core dumps on Windows. No takers for the moment, but Yves Orton mentioned this behaviour recently.

  http://xrl.us/r7vk 

Possible pointer corruption? (#40450)

Benjamin Carter showed how a small script with redo, eval and @_ would crash regularly. Dave Mitchell believes the underlying problem was fixed by change #24384.

  next and redo didn't restore PL_curcop
  http://xrl.us/r7vm 

Installation of perl 5.8.8 on Linux RH 9 (#40453)

Tasnim was running into problems building perl. Andy Dougherty suspected that there might be a problem with the config.sh file, and suggested that Tasnim delete the source tree and start afresh.

  http://xrl.us/r7vn 

Tasnim followed up with another bug report (#40454), but the porters didn't hear back from him after their suggestions were given.

  So maybe it works
  http://xrl.us/r7vo 

Memory leak with MULTICALL (#40469)

Tassilo von Parseval found a memory leak, and thought that MULTICALL was responsible. No takers.

  http://xrl.us/r7vp 

glob misbehaviour regarding special characters and spaces (#40470)

David Serrano found what he thought was a problem in glob with files or directories that contain spaces. No takers, possibly because everyone hates files and directories that contain spaces.

  Curse those Redmond boys
  http://xrl.us/r7vq 

Perl5 Bug Summary

  8 more this week
  http://xrl.us/r7vr 
  http://rt.perl.org/rt3/NoAuth/perl5/Overview.html 


In Brief

Terry Glanfield demonstrated a nifty bug in Storable to show that Storable::freeze([$r, $r]) doesn't work. Adam Kennedy asked him to open a ticket on Storable's CPAN RT queue.

  http://xrl.us/r7vs 

Steve Hay noticed that threads tests were hanging in Win32 smokes again, and wondered if changes #28922 and #28923 had anything to do with it.

  http://xrl.us/r7vt 

Dave Mitchell explained to Dave Rolsky that the 5.8 code for closures has a number of subtle bugs, and will probably always remain that way.

  But wait until 5.10
  http://xrl.us/r7vu 

Paul Green updated the Stratus VOS files.

  http://xrl.us/r7vv 

Now that it has been ANSIfied, the zlib library can be built with g++.

  http://xrl.us/r7vw 

Alan Olsen pointed to a couple of threads on Perl Monks that dealt with Exporter problems, suggesting that the problems raised there, and the attendant solutions, should be included in the Exporter documentation.

  http://xrl.us/r7vx 

Paul Marquess added some C++ goodness to List::Util.

  http://xrl.us/r7vy 

And Jarkko continued his own peculiar brand of compiler gymnastics.

  http://xrl.us/r7vz 

The One Laptop Per Child project no longer requires Perl. When you only have 500MB of Flash memory to play with...

  ... every byte counts
  http://xrl.us/r7v2 

About this summary

This summary was written by David Landgren.

Weekly summaries are published on http://use.perl.org/ and posted on a mailing list (subscription: perl5-summary-subscribe@perl.org ). The archive is at http://dev.perl.org/perl5/list-summaries/ . Corrections and comments are welcome.

If you found this summary useful, please consider contributing to the Perl Foundation to help support the development of Perl.


On whitespace changes

nik on 2006-10-12T15:06:53

While I agree that trying to avoid whitespace changes at the same time as functional changes is good practice, and to be encouraged, note that all is not lost if a change includes whitespace changes. Various diff implementations include an "--ignore-whitespace" or equivalent option that can be used in just this situation.

Re:On whitespace changes

kwilliams on 2006-10-12T23:30:03

Or, if the changelist number that did the reformatting is known, just skip over that changelist number when doing diffs. That helps when the formatting changes cross line boundaries (like moving braces to the next line, etc.) and the line-oriented diff algorithms are still too chatty.