This Week on perl5-porters - 9-15 October 2006

grinder on 2006-10-19T12:29:00

"May I remind everybody my personal opinion, which is that I see little to null utility to style patches to C or perl code, but that they clobber version control history, waste my time, and that in most cases I will not apply them." -- Rafaël Garcia-Suarez, current pumpking, on why people should not fiddle with the bloody whitespace.

Topics of Interest

Coding styles and ancient perls

Last week, Dr. Ruud and Yves Orton worked on bringing makepatch up to speed on the Windows platform. For some reason this sparked a long discussion about the merits of keeping code compatible with ancient versions of Perl, and about $_ topicalisation.

  http://xrl.us/sm5z 

5.8.8+ invalid free() in Perl_do_exec3

Alexey Tourbin spotted a problem with the current 5.8 snapshot that doesn't show up in the 5.8.8 release. Dave Mitchell thanked him for picking it up, and fixed it.

Alexey went on to mention that he runs a git repository that tracks the 5.8 and 5.9 sources, and lamented that he could not quite figure out which change corresponded to the 5.9.0 release.

  ah say git down
  http://xrl.us/sm52 

Resolving downstream breakages from version.pm changes

Adam Kennedy uses only.pm to lock down the versions of modules used in the code he delivers to clients. He noticed that only no longer works correctly with the changes that have since been made to version, and that development on only appears to have ground to a halt.

John Peacock took a look at only's code, saw what was wrong with it, and at the same time asked Adam whether version::Limit might do the trick instead.

  http://xrl.us/sm53 

g++ compile and make test 100%

After weeks of patching and tweaking the source, Jarkko Hietaniemi announced that he had succeeded in performing a C++ compilation of the perl source, and a completely successful test suite run. Rafaël Garcia-Suarez noted that NDBM_File still had trouble; Jarkko hadn't noticed because it wasn't configured to be built on his platform. Robin Barker provided a patch to fix it up.

Paul Marquess, Steve Peters and Jesse Vincent reported successes with other C++ compilers and platforms. H.Merijn Brand found a problem on SuSE Linux, as did Dominic Dunlop on Mac OS X.

  http://xrl.us/sm54 
  Robin's patches
  http://xrl.us/sm55
  http://xrl.us/sm56
  http://xrl.us/sm57 

The curious case of the last close parenthesis

Yves Orton surfaced briefly to discuss some oddities he had found in the regular expression engine concerning how opening and closing capturing parentheses are managed. One optimisation in the engine throws away the regops that denote the parentheses and tracks them itself... but doesn't track them all that well.

Over time, this has manifested itself through a variety of bugs, all of which have been solved in ad hoc ways, as no one really understood the big picture.

Yves thinks that his hypothesis is right, and wanted to know what Dave Mitchell thought of it. Ilya Zakharevich remembered that Yves's two variables, PL_reglastparen and PL_reglastcloseparen, were once upon a time a single variable, but were split into two, probably when non-capturing parentheses were introduced. As such, it's quite possible that the code was not correctly updated in all the required places. In a patch elsewhere this week, Yves dropped some comment markers here and there, where he thought the variables should be updated.


  http://xrl.us/sm58 

Will the real Perl_reg_named_buff_sv please stand up?

Craig A. Berry was having trouble building the re extension on VMS, since the VMS linker didn't like the fact that the Perl_reg_named_buff_sv symbol was defined twice, in two different object files.

Yves explained that one was derived from the other in order to generate the debugging version. His own linker, being slightly more lax, had not complained about this, and he fixed the problem in a subsequent patch.

  http://xrl.us/sm59 

Status of Perl 5.10 TODO list

Yves then looked at the TODO list for 5.10 and counted four issues outstanding. He wanted to know what their status was, and what needed to be done. Rafaël said three of them were issues he planned to deal with, and one was less important (lexical encoding pragmas). He also mentioned that aliases were probably important enough to be taken on board, as were UNITCHECK blocks.

Juerd was very keen to see work done on the encoding pragma. He has released a work-around, encoding::split, to get things working to his satisfaction, but would like to see the functionality appear in the core.

Adriano Ferreira reminded the porters of bug #38945, which notes that Configure is not x.10 version aware. Andreas König mentioned that DBI doesn't compile on blead (and someone needs to open a ticket on that).

  http://xrl.us/sm6a 
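Bug #38945 itself is not quoted here. A small illustration still shows why a two-digit minor version trips up naive comparisons. Compared as strings, "5.10.0" sorts before "5.9.5" because '1' < '9'. Read as a decimal number, 5.10 is less than 5.9. The C sketch below compares dotted versions one numeric component at a time instead. compare_versions is a hypothetical helper, not code from Configure.

```c
#include <stdlib.h>

/* Compare two dotted version strings numerically, one component at a time.
   Returns <0, 0 or >0, like strcmp. Missing trailing components count as 0,
   so "5.10" and "5.10.0" compare equal. */
int compare_versions(const char *a, const char *b)
{
    while (*a != '\0' || *b != '\0') {
        char *end_a, *end_b;
        unsigned long x = strtoul(a, &end_a, 10);   /* an empty string parses as 0 */
        unsigned long y = strtoul(b, &end_b, 10);

        if (x != y)
            return x < y ? -1 : 1;
        /* Step over the separator (normally '.'); stay put at the end. */
        a = (*end_a != '\0') ? end_a + 1 : end_a;
        b = (*end_b != '\0') ? end_b + 1 : end_b;
    }
    return 0;
}
```

With this rule compare_versions("5.10.0", "5.9.5") is positive, which is the order a human expects.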

Philip M. Gollucci, almost left behind, pointed out that there was an icky attribute/printf problem that needed to be sorted out as well.

  http://xrl.us/sm6b 

SVpvs vs SVpvn

Jim Cromie was looking through the source when he noticed a number of places where a newSVpvn call could profitably be replaced by newSVpvs. The benefit is that this removes the need to keep hard-coded string lengths lying around, which is one way for bit-rot to creep in.

Nicholas Clark went ahead and made the change. Aaron Sherman pointed out that the hard-coded length might be a slightly sick way of taking a substring of a constant, such as

  SV *s = newSVpvn("pear",3);

Nicholas had already looked for such cases in the code, and the only occurrences he found were deliberate tests.

Vadim Konovalov wondered whether the real reason for the hard-coded lengths was to avoid possibly costly length computations. Jim Cromie believed that this was not a problem, since the length is worked out when the C source is compiled and boils down to a constant in the resulting object code.

  http://xrl.us/sm6c 
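A sketch of the trick follows. It assumes newSVpvs works the way I understand perl's handy.h to implement it: newSVpvn is called through a macro that expands to both the literal and sizeof(literal) - 1. LIT_WITH_LEN, copy_pvn and copy_pvs are hypothetical stand-ins, so the example compiles without the perl headers. Because sizeof is evaluated at compile time, there is no run-time strlen to pay for.

```c
#include <string.h>

/* Hypothetical macro in the spirit of perl's STR_WITH_LEN: it expands to two
   arguments, the literal and its length. Putting "" on both sides makes the
   compiler reject anything that is not a string literal, and sizeof of a
   literal is a compile-time constant that includes the trailing NUL. */
#define LIT_WITH_LEN(s) ("" s ""), (sizeof(s) - 1)

/* Hypothetical stand-in for newSVpvn: copies exactly len bytes into dst
   (truncating to fit) and NUL-terminates. Returns the number of bytes copied. */
size_t copy_pvn(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (dstsize == 0)
        return 0;
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

/* Hypothetical counterpart of newSVpvs: the length comes from the literal
   itself, so it cannot drift when someone edits the string. */
#define copy_pvs(dst, dstsize, lit) copy_pvn((dst), (dstsize), LIT_WITH_LEN(lit))
```

copy_pvn(buf, sizeof buf, "pear", 3) still yields "pea", which is Aaron's substring case. copy_pvs cannot express that, so those are the calls to leave alone.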

New version diagnostic breaks a bunch of modules

Michael G. Schwern wished to register his disgruntledness at the fact that a diagnostic change broke a number of his modules (Exporter::Lite, UNIVERSAL::exports and UNIVERSAL::require for starters), and wondered how many other modules were in the same boat.

John Peacock was loath to give up the more informative new error message. He could get close to matching the old message if certain conditions were met, but not completely. Michael remarked that future-proofing code that checks perl's error messages is hard.

  Never mind localisation
  http://xrl.us/sm6d 

Off by one in the trie code?

Nicholas Clark wrote a psychotic regular expression to provoke a leak that was picked up by valgrind, and he wondered if there was an off-by-one error. Yves promised to have a look at it later.

  http://xrl.us/sm6e 


Patches of Interest

What Jarkko Hietaniemi did this week

Jarkko tweaked some of Yves's code to get it to compile cleanly with the more fussy compilers that he has at his disposal.

  Do not cross the initialisation boundary, do not collect $200
  http://xrl.us/sm6f 

Jarkko transformed Digest::SHA from K&R C to ANSI, and described the delicate footwork required to ensure that blead and CPAN play nicely together.

Mark Shelor, Digest::SHA's author, thanked Jarkko for the patch, and planned to integrate it for the upcoming 5.44 release, so with luck no one will have to take extra care for very long.

  http://xrl.us/sm6g 
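For readers who have not met K&R C: the old style declared parameter types in a separate list between the function header and its body. C++ compilers reject that form, and C23 removed it from C as well. The sketch below contrasts the two styles. rotr32 is a hypothetical example, not a function lifted from Digest::SHA.

```c
#include <stdint.h>

/* Old-style (K&R) definition, shown only as a comment because C++ and C23
 * compilers will not accept it:
 *
 *     uint32_t rotr32(x, n)
 *     uint32_t x;
 *     unsigned n;
 *     { ... }
 */

/* ANSI/ISO definition: the parameter types are part of the prototype, so
   every call is type-checked, and a C++ compiler accepts it unchanged. */
uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;                                  /* keep the shift count in range */
    return n ? (x >> n) | (x << (32 - n)) : x;
}
```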

Jarkko performed similar tweaks on Encode.xs, with an eye on C++.

  http://xrl.us/sm6h
  http://xrl.us/sm6i 

He then added hints for Linux and Solaris that tell C++ builds how to deal with dlerror. He noted that if a third platform needs this trickery, it will be time to push it off into Configure.

  http://xrl.us/sm6j 

Moving on through the modules, he then performed a classy act on ExtUtils::ParseXS.

  http://xrl.us/sm6k 

And then ANSIfied the Text::Soundex module.

  A530 T500 A521 T000 T232 M340
  http://xrl.us/sm6m 
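The caption is, as far as I can tell, the Soundex encoding of "And then ANSIfied the Text::Soundex module". The short C sketch below applies the classic Soundex rules and reproduces those six codes. The rules are: keep the first letter, code the following consonants as digits, drop repeats, then pad or truncate to four characters. This is not the module's code, and Soundex implementations differ on edge cases. This one follows the common rule that H and W do not separate letters with the same code, while vowels do.

```c
#include <ctype.h>
#include <string.h>

/* Soundex digit for an uppercase letter. Vowels (and Y) map to '0', which
   separates letters with equal codes; H and W map to 0, which does not. */
static char soundex_digit(char c)
{
    switch (c) {
    case 'B': case 'F': case 'P': case 'V':
        return '1';
    case 'C': case 'G': case 'J': case 'K':
    case 'Q': case 'S': case 'X': case 'Z':
        return '2';
    case 'D': case 'T':
        return '3';
    case 'L':
        return '4';
    case 'M': case 'N':
        return '5';
    case 'R':
        return '6';
    case 'H': case 'W':
        return 0;
    default:                                  /* A, E, I, O, U, Y */
        return '0';
    }
}

/* Writes a 4-character Soundex code plus NUL into out. Non-letters (such as
   the "::" in a module name) are skipped. Returns 0, or -1 if the input
   contains no letters at all. */
int soundex(const char *in, char out[5])
{
    size_t len = 0;
    char prev = 0;

    memset(out, '0', 4);                      /* unused positions stay '0' */
    out[4] = '\0';
    for (; *in != '\0'; in++) {
        if (!isalpha((unsigned char)*in))
            continue;
        char c = (char)toupper((unsigned char)*in);
        char d = soundex_digit(c);

        if (len == 0) {
            out[len++] = c;                   /* first letter is kept, not coded */
            prev = d;
        } else if (d != 0) {                  /* H and W leave prev untouched */
            if (d != '0' && d != prev && len < 4)
                out[len++] = d;
            prev = d;
        }
    }
    return len == 0 ? -1 : 0;
}
```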

And added an EXTERN_C wrapper for something in re.pm.


  http://xrl.us/sm6n 
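EXTERN_C exists because a C++ compiler mangles symbol names unless a declaration asks for C linkage. Below is a minimal sketch. It assumes perl.h's EXTERN_C follows the usual pattern: extern "C" under C++, plain extern under C. MY_EXTERN_C and add_ints are hypothetical names used only for illustration.

```c
/* Hypothetical macro modelled on the usual EXTERN_C pattern. Under a C++
   compiler it requests C linkage, so the symbol keeps its plain C name
   rather than a mangled C++ one; a C compiler simply sees "extern". */
#ifdef __cplusplus
#  define MY_EXTERN_C extern "C"
#else
#  define MY_EXTERN_C extern
#endif

/* A declaration that C and C++ translation units can share, for example
   from a common header. */
MY_EXTERN_C int add_ints(int a, int b);

int add_ints(int a, int b)
{
    return a + b;
}
```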

What Yves Orton did to the regular expression engine this week

Yves Orton, the unstoppable feature-adder to the regular expression engine, added named recursion this week, whereby (?&NAME) recurses to the capture buffer defined by (?<NAME>...).

  No, my brane hurts
  http://xrl.us/sm6o 

He landed another patch that fixes bugs, adds tests and adds more functionality for dealing with conditionals in regular expressions.

  http://xrl.us/sm6p 

He then added even more documentation, notably an update to perlreguts.

  http://xrl.us/sm6q 

And last but not least, Yves announced that he had added possessive quantifiers to the engine. He also floated the idea of removing the dire warnings about the experimental status of the existing regular expression extensions, since they have been around for a long time and have been well documented and well explored by large numbers of people.

  http://xrl.us/sm6r 

Sadahiro-san delivered a patch to remove leaveit from toke.c:scan_const, which deals with escaped characters in patterns. The existing code had a number of shortcomings, not to mention breakage. Now it's better.

  http://xrl.us/sm6s 


New and old bugs from RT

sprintf width+precision fails on wide chars (#40473)

Anatoly Vorobey found a bug in the way sprintf handles UTF-8 characters and supplied a patch against blead to correct it. Applied by Rafaël.

  Why aren't all bug reports like this?
  http://xrl.us/sm6t 
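The report's details are not reproduced here, but the general trap is easy to show. Byte-oriented formatting counts bytes. In C, printf("%5.2s", ...) on the UTF-8 bytes of "été" keeps only the 2-byte "é" and adds three spaces, which gives four visible columns instead of five. The C sketch below counts both width and precision in characters for UTF-8 input. format_utf8 and its helpers are hypothetical, and this is not the code perl's sprintf uses.

```c
#include <stddef.h>
#include <string.h>

/* Number of characters in the first nbytes bytes of a UTF-8 string:
   every byte that is not a continuation byte (10xxxxxx) starts a character. */
size_t utf8_length(const char *s, size_t nbytes)
{
    size_t n = 0;
    for (size_t i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Number of bytes occupied by the first nchars characters of s. */
size_t utf8_prefix_bytes(const char *s, size_t nchars)
{
    size_t i = 0;
    for (; s[i] != '\0'; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80) {   /* a character starts here */
            if (nchars == 0)
                break;
            nchars--;
        }
    }
    return i;
}

/* Like "%*.*s" (right-aligned), except that width and precision count
   characters rather than bytes. Assumes s is well-formed UTF-8.
   Returns the number of bytes written, or 0 if out is too small. */
size_t format_utf8(char *out, size_t outsize, const char *s,
                   size_t width, size_t precision)
{
    size_t nbytes = utf8_prefix_bytes(s, precision);  /* truncate by characters */
    size_t nchars = utf8_length(s, nbytes);
    size_t pad = width > nchars ? width - nchars : 0;  /* pad by characters */

    if (pad + nbytes + 1 > outsize)
        return 0;
    memset(out, ' ', pad);
    memcpy(out + pad, s, nbytes);
    out[pad + nbytes] = '\0';
    return pad + nbytes;
}
```

When no precision is wanted, pass SIZE_MAX (from stdint.h) as the precision.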

Regular expression recursion (#40495)

Matthias Urlichs demonstrated a bug in the regular expression engine: it showed up with x{2} but went away when that was replaced by xx. Dave Mitchell explained that the optimiser sometimes fails to recognise such equivalences. In any event, with the latest work on the engine in blead, the offending pattern works just fine.

  Even if it does consume quite a bit of memory
  http://xrl.us/sm6u 

Can't locate Foo/Bar.pm isn't a helpful message (#40516)

Tina thought that this message exposed too much of how things are implemented, and a beginner might not necessarily realise that there was a problem loading the Foo::Bar module. She thought that adding a "perhaps Foo::Bar is not installed" to the end of the message would help improve matters.

Juerd wondered whether this would encourage newbies to hunt down the Foo/Bar.pm file and dump it in an @INC directory. People already tend to do this, which leads to problems when modules have XS components and the result is traffic on mailing lists and other fora.

  http://xrl.us/sm6v 
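The message names a file because a file is what require actually searches for. Foo::Bar is turned into the relative path Foo/Bar.pm, which is then looked for in each @INC directory. Here is a simplified C sketch of that translation. module_to_path is a hypothetical helper, not perl's own code.

```c
#include <stddef.h>
#include <string.h>

/* Turn a module name such as "Foo::Bar" into the relative path "Foo/Bar.pm":
   each "::" becomes "/" and ".pm" is appended. Returns 0 on success, or -1
   if out (outsize bytes) is too small. */
int module_to_path(const char *module, char *out, size_t outsize)
{
    size_t j = 0;

    for (size_t i = 0; module[i] != '\0'; ) {
        if (j >= outsize)
            return -1;
        if (module[i] == ':' && module[i + 1] == ':') {
            out[j++] = '/';
            i += 2;
        } else {
            out[j++] = module[i++];
        }
    }
    if (j + sizeof ".pm" > outsize)          /* room for ".pm" and its NUL */
        return -1;
    memcpy(out + j, ".pm", sizeof ".pm");
    return 0;
}
```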

Perl5 Bug Summary


  1525, bugometric pressure steady
  http://xrl.us/sm6w 
  http://rt.perl.org/rt3/NoAuth/perl5/Overview.html 


New Core Modules


In Brief

Tels made a few suggestions about the code in makepatch that received a work-over last week.

  http://xrl.us/sm63 

Dave Nicol suggested that the source code be formatted according to a fixed set of rules.


  http://xrl.us/sm64 

Johan Vromans looked at Carp::Clan and thought that its functionality was so useful that it should be subsumed into the core Carp module, arguing that it is better to extend an existing module than to introduce a new one that duplicates a lot of the functionality of the previous one.

  http://xrl.us/sm65 

Gabor Szabo asked about running Devel::Cover on core modules in blead and Paul Johnson set him straight.

  http://xrl.us/sm66 

Andreas König found that change #28976 breaks YAML, but Rafaël was unable to reproduce the problem. Andreas later discovered that #28997 resolved the problem in any event.

  http://xrl.us/sm67 

Similarly, David Landgren compiled Template Toolkit on a recent build of blead (change #28998) and watched it blow up. Rafaël noted that the code in Template.xs relied on a macro producing a parenthesised result, which was used directly in a C switch statement. At some point the blead version of the macro lost its enclosing parentheses, so Template.xs needs to add the parentheses itself.

  relying on the core to do it for you
  http://xrl.us/sm68 
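To make the Template.xs problem concrete, here is a C sketch with two hypothetical versions of an accessor macro. The real macro is not named above. Code written as switch MACRO(x) { ... } compiles only while the macro's expansion happens to be parenthesised. Writing switch (MACRO(x)) works with either definition.

```c
/* Two hypothetical definitions of the same accessor macro. */
#define KIND_WRAPPED(x)   ((x) & 0x0f)   /* result is parenthesised */
#define KIND_UNWRAPPED(x) (x) & 0x0f     /* outer parentheses lost */

/* Fragile: compiles only because KIND_WRAPPED supplies the parentheses that
   switch requires. With KIND_UNWRAPPED this would expand to
   "switch (v) & 0x0f {", which is a syntax error. */
int kind_fragile(unsigned v)
{
    switch KIND_WRAPPED(v) {
    case 0:  return 0;
    case 1:  return 1;
    default: return -1;
    }
}

/* Robust: the switch statement provides its own parentheses, so either
   definition of the macro works. */
int kind_robust(unsigned v)
{
    switch (KIND_UNWRAPPED(v)) {
    case 0:  return 0;
    case 1:  return 1;
    default: return -1;
    }
}
```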

Jerry D. Hedden raised threads to version 1.44, and threads::shared to version 1.04.

  http://xrl.us/sm69
  http://xrl.us/sm7a 

Juerd Waalboer wondered if he had hit a B::Deparse bug, since a nested hashref/listref structure deparses into a different structure that in turn deparses to something else...

  ... and so on to viscosity
  http://xrl.us/sm7b 

Yves shed some light on why he chose \k<...> over \k{...} for named captures in regular expressions.

  Because
  http://xrl.us/sm7c 

Jos I. Boumans proposed adding Log::Message and Log::Message::Simple to the core.

  http://xrl.us/sm7d 

Mark J. Reed sought more information on the way File::Find orders directories and files, since he could not see a way to process directories first and then files.

  lots of "HACK:" comments
  http://xrl.us/sm7e 
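In Perl terms, File::Find's preprocess option receives each directory's list of names after readdir() and may return the list reordered. The C sketch below shows only the ordering rule: directories first, then everything else, with each group sorted by name. struct entry and order_entries are hypothetical. Whether reordering alone gives the traversal Mark wanted depends on how File::Find walks the tree.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical directory entry: a name plus a flag filled in from stat(). */
struct entry {
    const char *name;
    int is_dir;                  /* 1 for a directory, 0 otherwise */
};

/* Directories sort before everything else; ties are broken by name. */
static int dirs_first(const void *pa, const void *pb)
{
    const struct entry *a = pa, *b = pb;

    if (a->is_dir != b->is_dir)
        return b->is_dir - a->is_dir;
    return strcmp(a->name, b->name);
}

void order_entries(struct entry *entries, size_t n)
{
    qsort(entries, n, sizeof *entries, dirs_first);
}
```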

Jim Cromie wanted to know how Configure -O works.

  delete config.sh and Policy.sh beforehand
  http://xrl.us/sm7f 

Leopold Tötsch was encountering some ambiguous behaviour with sprintf and wondered who was right. Sadahiro Tomoyuki straightened it out with a sharp patch to the right.

  http://xrl.us/sm7g 

About this summary

This summary was written by David Landgren.

Weekly summaries are published on http://use.perl.org/ and posted on a mailing list (subscription: perl5-summary-subscribe@perl.org ). The archive is at http://dev.perl.org/perl5/list-summaries/ . Corrections and comments are welcome.

If you found this summary useful, please consider contributing to the Perl Foundation to help support the development of Perl.


About this musery

kwilliams on 2006-10-19T16:09:22

Great summary again as usual, David. Thanks for doing this.

  -Ken

Curses, whitespace!

DAxelrod on 2006-10-19T22:38:53

The quote at the top of this excellent summary inspired me to write this response.