This Week on perl5-porters - 18-24 May 2008

grinder on 2008-05-30T21:18:00

"Ah, more details about filenames. Well, this sounds positively weird. Octet strings are not particularly user-friendly if you can't interpret them as characters reliably.

From what you say, and what I think I've heard elsewhere, Unix filename interpretation is a mess. Seems like the only bigger mess I've heard about is VMS file handling, where they seem to have a choice of several messes." -- Glenn Linderman, deep in the heart of Unicode, case conversion, filenames, encodings, character sets, ß and other exciting issues.

Topics of Interest

Another perldoc shortcut

Tom Christiansen commented on Gisle Aas's perldoc shortcut (that perldoc ipc would redirect to perlipc, assuming no ipc.{pod,pm} existed), saying that in pre-5.8 times he had been working on a technique to make perlipc itself, run from the command-line, do the same thing. Somewhere along the line, things went astray and the work never made it into the core.

  not bitter, not really
  http://xrl.us/bk9mc 

File::Path::mkpath() incompatibility in perl-5.10

I had expected to make some progress on this issue, this week, but Real Life is eating my tuits like popcorn at the moment.

  next week, cross my heart
  http://xrl.us/bk9me 

On the almost impossibility to write correct XS modules

I might preface this thread "on the almost impossibility to write a correct summary of a complex subject". Marc Lehmann had written a few weeks ago that a bare char * through an XS API is fraught with peril, because there is no metadata available to tell you if it's Latin-1, KOI8-R, UTF-8 or something else.

The thread blossomed this week, with a long-running debate about what is broken (and when, and how). One point that was made is that Win32 encodes filenames in a particular way that doesn't really jibe with the rest of the internals. Unfortunately, it is only with hindsight that the problem really became apparent, hence the dilemma is that fixing it would break everything that has tried, with various degrees of success, to work around it.

The utf8 flag on SVs was again singled out as being responsible for world hunger and other assorted ills, with a number of examples demonstrating the problems.

Rafael Garcia-Suarez outlined an approach that just may be a way forward out of the mess. After listening to Juerd Waalboer, he thought that marking an SV as "binary" and thereby disqualified from being upgraded to Unicode would be quite useful.

Glenn Linderman invented "blorf" as an opaque token for discussing the issues without people getting sidetracked over definitions of bytes, strings, characters, numbers and codepoints.

  hard core
  http://xrl.us/bk9mg 
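
For readers who have not been bitten by the flag yet, here is a minimal sketch (mine, not taken from the thread) of two scalars that Perl treats as the same string even though their internal flag differs. It uses only the built-in utf8:: functions; the variable names are purely illustrative.

```perl
use strict;
use warnings;

# One character, U+00E9, stored internally as a single byte.
my $plain = "\xE9";

# The same string, upgraded to Perl's internal UTF-8 representation.
my $upgraded = $plain;
utf8::upgrade($upgraded);

# At the Perl level the two strings compare equal...
my $same_string = ($plain eq $upgraded) ? 1 : 0;

# ...but the internal flag differs, and a bare char * handed to XS
# code carries no such flag at all.
my $flag_plain    = utf8::is_utf8($plain)    ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

print "same string: $same_string, flags: $flag_plain vs $flag_upgraded\n";
```

The buffers behind the two scalars differ, so XS code that reads the bytes without consulting the flag sees different data for what Perl considers one string.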

It's wafer thin!

David Nicol's tiny patch to document the empty pattern (m//) more clearly sparked a fairly intense technical debate over how to get rid of the empty pattern altogether.

One point of particular interest was when Aristotle Pagaltzis suggested a s///R modifier which would return a modified copy of the original string, instead of modifying the contents and returning the number of matches made.

As it turns out, this would solve a number of problems very nicely, not the least being the elegantly succinct

  my @changed = map { s/$this/$that/R } @list;

  so let's have it already
  http://xrl.us/bk9mi 
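
For comparison, this is roughly what one has to write without such a modifier. It is only a sketch; the sample data and the values of $this and $that are made up for illustration.

```perl
use strict;
use warnings;

my @list = ('red apple', 'green pear', 'red plum');
my ($this, $that) = ('red', 'blue');

# Copy each element, substitute on the copy, and return the copy, so
# that the elements of @list are left untouched.
my @changed = map { my $copy = $_; $copy =~ s/$this/$that/; $copy } @list;

print "$_\n" for @changed;
```

Forgetting the copy is the classic mistake: a bare s/// inside map modifies @list in place and returns the substitution count rather than the string.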

Compiling 5.10 with g++ 4.3.0

Not content with compiling perl with old gcc compilers, Bram took a very new one for a spin to see how things worked out.

It did of course go boom (otherwise you probably wouldn't be reading about it). Bram traced the problems down to typedefs and enums in system headers, and wondered how in Configure this could be sorted out.


  duty now for the future
  http://xrl.us/bk9mk 

Getopt::Long, + options, installperl and +v

Nicholas Clark was looking at how to factor out the common code in installman and installperl and noticed that the main sticking point regarding installperl was that it admitted a +v switch (which does something different from -v), using hand-rolled @ARGV processing.

This precludes it from using Getopt::Long because, while Getopt::Long can be taught to accept -x and +x, it offers no way of discriminating between the two.

Johan Vromans said that as it turns out, with a bit of hand-holding, it is possible to coax the information out as things stand, and he plans to improve support for - and + switches in a future release.

Nicholas thought that a middle path might be to keep the hand-rolled code, but adjust it to dump its results into an %opts hash, which would allow a drop-in replacement when Getopt::Long gets updated with the needed functionality.

This brought forth a long discourse from Tom Christiansen, who admitted to the wrong kind of laziness regarding command-line switches by resorting to hand-rolling code to deal with a solitary switch when in fact it would have been better to rely on a module. When he quizzed Larry Wall about it during the first decade of Perl's development, Larry admitted to rolling his own frequently, since it seemed a bit of a waste in his eyes to pull in a module for just one or two switches for a program little more than a one-liner. As a peace offering for his own hand-rolling sins, Tom offered the list the ultimate file renaming Perl program.

  bespoke options
  http://xrl.us/bk9mn 
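
To give an idea of the middle path Nicholas described, here is a hypothetical sketch of hand-rolled processing that drops its results into a hash and remembers whether each switch arrived with - or +. The sub name, the hash layout and the sample arguments are my own assumptions and have nothing to do with the real installperl code.

```perl
use strict;
use warnings;

# Hypothetical hand-rolled parser: for each single-letter switch,
# record which prefix ('-' or '+') it was given with.
sub parse_switches {
    my @args = @_;
    my (%opts, @rest);
    while (@args) {
        my $arg = shift @args;
        if ($arg =~ /^([-+])([A-Za-z])$/) {
            $opts{$2} = $1;        # e.g. v => '+' or v => '-'
        }
        else {
            push @rest, $arg;      # not a switch: keep it as an operand
        }
    }
    return (\%opts, \@rest);
}

my ($opts, $rest) = parse_switches('+v', '-n', 'destdir');
print "$_ => $opts->{$_}\n" for sort keys %$opts;
print "operands: @$rest\n";
```

The rest of the program only ever looks at the hash, so the parsing sub could later be swapped for a module without touching anything else.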

On broken manpages, trolling, inconsistent implementation and the difficulty to fix bugs

Marc Lehmann wrote a long response to Jan Dubois as a spin-off from the "On the impossibility of writing XS correctly" thread, stating that Perl's Unicode handling is broken because some parts of the core deal with Unicode one way, and other parts another way. This leads to annoying bugs, in that they are hard to identify, and hard to fix.

Tom Christiansen called him out for excessive use of rhetoric and asked him to clarify a couple of points. Several messages later Yves Orton offered a nice summary of the situation that showed where things break down. Then people started to speak about encodings, bytes, characters and character sets and as usual my eyes began to acquire that dead fish look.

  see also
  http://xrl.us/bk9mp 

On the problem of strings and binary data in Perl

On the subject of subjects on the problem of things, Yves Orton broke out into a new thread to discuss the schizophrenic attitude that Perl has when dealing with strings. He put forward a proposal for identifying and processing Unicode strings and asked people to point out where he was wrong. Rafael Garcia-Suarez made a decent effort at doing just that.

Juerd Waalboer provided a contrarian argument, suggesting that Unicode works pretty well in Perl, insofar as one can have strings containing Unicode, and other strings containing binary data, because in a correct program, one usually doesn't have the two appearing in the same string (such as having the Thai-encoded name of a Thai person concatenated with the slurped contents of a PNG file representing his signature in the same Perl scalar). In Juerd's eyes, the main problems come about when dealing with pure binary data and hoping that it doesn't wind up being treated as Unicode when it shouldn't.

  more recommended reading
  http://xrl.us/bk9mr 
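
As a small illustration of keeping text strings and binary strings apart, the sketch below decodes octets on the way in and encodes them again on the way out, using the core Encode module. The sample octets are my own choice; nothing here comes from the thread itself.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Two octets as they might arrive from a file or a socket: the UTF-8
# encoding of U+00DF (the German sharp s).
my $octets = "\xC3\x9F";

# Decode at the input boundary: a text string of one character.
my $text = decode('UTF-8', $octets);

# Encode at the output boundary: a binary string of two octets again.
my $out = encode('UTF-8', $text);

printf "octets: %d, characters: %d, re-encoded octets: %d\n",
    length($octets), length($text), length($out);
```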

As a followup to the above discussion, Juerd announced that he had released BLOB to CPAN.

  http://xrl.us/bk9mt 

English.pm alias for %+

Amir Elisha Aharoni ventured for the first time into the waters of p5p, suggesting that %NAMED_CAPTURE would be a nice English name for the new 5.10 %+ variable. Yves Orton thought the idea was worthy of consideration, but one also needed to deal with %- at the same time, which could be named %NAMED_CAPTURE_LIST.

  updating the babelfish
  http://xrl.us/bk9mv 

07arith.t failing on _strptime('2001-2-29 12:34:56','%Y-%m-%d %H:%M:%S')

2001 was not a leap year, so February 29, 2001 is not a valid date and trying to parse it is an error. Apparently there is a test in Time::Piece to ensure it fails in the correct manner. Unfortunately, on some of the more exotic platforms like VMS and OS/X, the call also correctly fails, but does so in a way that fools the test suite.

  at the third stroke it will be the 32nd of february
  http://xrl.us/bk9mx 
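
For the record, the Gregorian leap year rule that makes this date invalid can be sketched as follows. The helper names are hypothetical, and this is not how Time::Piece itself goes about it.

```perl
use strict;
use warnings;

# Gregorian rule: every fourth year is a leap year, except century
# years, which must also be divisible by 400.
sub is_leap_year {
    my ($year) = @_;
    return 0 if $year % 4;         # not divisible by 4
    return 1 if $year % 100;       # divisible by 4 but not by 100
    return $year % 400 ? 0 : 1;    # century year: needs to divide by 400
}

sub days_in_february {
    my ($year) = @_;
    return is_leap_year($year) ? 29 : 28;
}

printf "%d: February has %d days\n", $_, days_in_february($_)
    for 1900, 2000, 2001, 2008;
```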

Gisle Aas gave some additional background regarding Time-Piece-1.13 test failures on HP-UX, by forwarding a message he sent to Matt Sergeant, the author of Time::Piece.

  http://xrl.us/bk9mz 

Some smoke digging (HP-UX failures)

H.Merijn Brand delved into HP-UX smoke reports to figure out what was going wrong. Time::Piece was already under control (see above), but Math::Trig was failing (and the only recent change has been an upgrade to Math::Complex). Tests for readdir were also turning black, which suggested subtler problems.

Half way through the conversation, Craig Berry announced the integration of Gisle Aas's fix for Time::Piece which addressed the VMS problems, and H.Merijn reported that it did the trick for HP-UX as well. Using the power of CPAN, H.Merijn was able to go through previous Math::Complex versions, and this allowed him to resolve that problem.

I think the readdir problem was solved by upgrading the smoke harness.

The remaining failure appeared to be caused by use blib hoisting in an errant directory into @INC. Bram showed him how to fix that, which should nail down the last error.

  going for O O O O
  http://xrl.us/bk9m3 

But then H.Merijn reported a problem with a failing blib test, and everyone pretended to pay attention to the character encoding debates.

  war knocked
  http://xrl.us/bk9m5 


TODO of the week

Improve the coverage of the core tests

Use Devel::Cover to ascertain the core modules' test coverage, then add tests that are currently missing.

Just to help budding testers along, here is a non-exhaustive list of suggestions to get you going (suggested by sorting out the biggest .pm files in lib/):

AutoLoader
AutoSplit
Benchmark
Cwd
DB
Dumpvalue
Exporter
Memoize
NEXT
SelfLoader
charnames
diagnostics
overload
warnings

Even concentrating on a single module would be helpful.


Patches of Interest

ExtUtils::ParseXS - Error reporting problem with INTERFACE and ALIAS keywords

About a year ago, Ken Williams explained that, while he was the maintainer of this module, he didn't know what was the best way to address the problem that Robert May had brought up regarding error reporting.

  then
  http://xrl.us/bk9m7 

Of the two approaches supplied by Robert as a solution, Ken liked the second one back then, and Nicholas Clark, reviving the conversation, agreed that it seemed to make more sense.

He had a look at how things work currently, and realised that with a new function, he could effect a small saving of space. As a result, both the core and EU::PXS could rely on the function.

Nicholas wrote the function, and felt that it would make it into 5.8.9 and 5.10.1. For older releases, ExtUtils::ParseXS would need to bundle the function, and emit it as required if the core didn't supply it.

Rob thought that this sounded reasonable, except that if ever a bug is found in the function that Nicholas just wrote, it would need to be fixed both in the core and EU::PXS. Since this would be less than desirable, Robert said that he would try to come up with an alternate patch at some point.

  now
  http://xrl.us/bk9m9 

lib.pm should not warn about loading .par files

Paul Fenwick noted that a use lib 'Foo.par' will issue a warning, but load the damned thing anyway. Since someone pulling in a library in this way probably has a pretty good idea what they're doing anyway, Paul thought it would be a good idea to suppress the warning, just for .par files.

Rafael Garcia-Suarez felt that this made sense, so he applied the patch. Steffen Müller wanted to know if this meant that lib.pm would be dual-lifed, so that 5.8.8 could benefit from the improvements.

  dual-life pragma on par
  http://xrl.us/bk9nb 

Indented preprocessor directives in sv.c

Jerry D. Hedden noticed that some preprocessor directives in sv.c were not flush left, and thought that some compilers would choke on them. H.Merijn Brand explained that it was perfectly legal according to ANSI, although he admitted that some older compilers, such as on AIX, would likely get into trouble over this.

Both Robin May and Andy Dougherty explained that something that does work is to leave the # in the first column, and then indent the macro preprocessor directive as appropriate.

  hash hard left
  http://xrl.us/bk9nd 


New and old bugs from RT

*x{IO} bizarre copying (#3314)

Steve Peters discovered that some bizarre code that used to emit a bizarre error message now emits a more prosaic error message. He noticed that the change occurred way back in change #27179 and asked if anyone had objections to backporting it to 5.8.

  a leap into the unknown
  http://xrl.us/bk9nf 

exists(): error message on wrong argument type is incorrect (#38955)

A couple of years ago, Jeremy Hetzler noted that exists may be applied to a HASH, an ARRAY and also a subroutine name. The documentation even admits as much.

On the other hand, for incorrect use, such as applying it to a scalar, the error message makes mention of only HASH and ARRAY, not of subroutines.

Bram patched the source to bring the error message into line with the documentation and implementation, and Rafael Garcia-Suarez applied it.

  language lawyers rejoice
  http://xrl.us/bk9nh 
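
A quick sketch of the three legitimate uses, for anyone who has never applied exists to a subroutine name. The variable and sub names are made up for the occasion.

```perl
use strict;
use warnings;

my %hash  = (key => undef);
my @array = (1, 2);
sub declared_sub { 1 }

# The key is present, even though its value is undef.
my $in_hash  = exists($hash{key})     ? 1 : 0;

# Index 5 is beyond the end of the array.
my $in_array = exists($array[5])      ? 1 : 0;

# exists on a subroutine name checks whether it has been declared,
# without calling it.
my $has_sub  = exists(&declared_sub)  ? 1 : 0;
my $no_sub   = exists(&no_such_sub)   ? 1 : 0;

print "hash: $in_hash, array: $in_array, subs: $has_sub and $no_sub\n";
```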

No complaint about bareword (#53806)

Rafael Garcia-Suarez supplied a fix for the print Does::Not::Exist, '' problem, so that the bareword is correctly identified as such, and not stringified. Despite all the magic surrounding print's first argument, all that Rafael needed to do was to hoist a goto label four lines higher in the source.

H.Merijn Brand applied the correction, along with Bram's tests.

  http://xrl.us/bk9nj 

pod2man loses =head2 starting ' or . (#53910)

Bram correctly identified Pod::Man as a dual-life module. This means that the best place to fix this particular problem is in the CPAN distribution, which can then be synched with blead when the problem is fixed.

  SEP
  http://xrl.us/bk9nm 

IO::Seekable + POSIX = constant subroutines redefined (#54186)

Part of the fallout from Nicholas Clark's corrections for this bug is that calls with the wrong number of arguments cause the program to croak. Rafael Garcia-Suarez felt it was safe enough to inflict on the world. As a point of confirmation, Sébastien Aperghis-Tramoni ran a code search and didn't find any examples of such usage.


  safe to break
  http://xrl.us/bk9no 

perlipc problems

Andrew at Sundale noted a problem in the documentation in perlipc concerning the signalling of negative process IDs. Steve Peters tweaked the example to show more clearly what was happening.

  perlipc and negative pids (#54412)
  http://xrl.us/bk9nq 

Andrew found another problem with setsid, in that the documentation suggests a setsid or die idiom, except that, if one reads the manpage for setsid, one learns that it returns -1 on error (as do many other system calls). As such, if the setsid call fails, the die won't be triggered.

  perlipc and negative truth (#54422)
  http://xrl.us/bk9ns 

While we're on the subject, Andrew found one final problem concerning the documentation for safe pipe opens.

  perlipc unclear on the concept (#54424)
  http://xrl.us/bk9nu 

Faulty select() in Activestate perl (#54544)

Marc Lehmann noted that select returns "Unknown Error (10022)" instead of simply timing out.

  just no it
  http://xrl.us/bk9nw 

Assertion failure fiddling with @ISA (#54566)

Niko Tyni discovered a way of abusing @ISA that would result in an assertion failure. Rafael Garcia-Suarez figured out what was going wrong in mg.c and provided a patch, which H.Merijn Brand applied.

  out through the smtp tunnel
  http://xrl.us/bk9ny 

"Can't take log of 0" error in perl 5.8.8. 64 bit (#54590)

Lourdes Peña Castillo reported that on some versions of perl, but not others, the number 2.5e-310 gets rounded down to 0, and the log of 0 is negative infinity.

Various porters reported similar behaviour on a variety of perls, platforms and Configure options, but no clear reasons why.

  now you see it, now you don't
  http://xrl.us/bk9n2 
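
Until the cause is pinned down, code that takes logarithms of very small numbers can at least avoid dying. The helper below is a hypothetical sketch (the name safe_log is mine): it returns undef instead of croaking when its argument is not positive, whether that argument started out as zero or was rounded down to it.

```perl
use strict;
use warnings;

# Hypothetical helper: return undef rather than die when the value is
# zero or negative.
sub safe_log {
    my ($x) = @_;
    return unless $x > 0;
    return log($x);
}

for my $value ('2.5e-310', '1', '0') {
    my $log = safe_log($value + 0);
    print "$value: ",
        (defined($log) ? $log : 'no logarithm (value is not positive)'),
        "\n";
}
```

What gets printed for 2.5e-310 depends on the perl and the platform, which is precisely the point of the bug report.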

PerlIO::via free unrefed scalar on certain dodgy code (#54686)

Kevin Ryde wrote some slightly broken code that managed to make the perl interpreter complain about memory problems. He wasn't especially worried about a fix any time soon, but wondered if it was a symptom of an underlying problem that needed to be addressed.

  need to know
  http://xrl.us/bk9n4 

Regexp modifier to disable interpolation like m'' (#54702)

Ed Avis filed a feature enhancement request, to allow the /n flag on a regular expression to indicate that no interpolation should be performed.

Currently, only m'300 $US' (with single quotes as a pattern delimiter) does no interpolation. Ed thought that /300 $US/n might be clearer.

  we'll get the whole alphabet in some day
  http://xrl.us/bk9n6 
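
A short sketch of how the single-quote delimiter behaves today. The strings are my own examples, not Ed's.

```perl
use strict;
use warnings;

my $US = 'dollars';

# Ordinary delimiters: $US is interpolated, so the pattern is
# "300 dollars".
my $interpolated = ('300 dollars' =~ /300 $US/) ? 1 : 0;

# Single-quote delimiters: nothing is interpolated, so an @ can
# appear in the pattern without being escaped.
my $literal_at = ('mail user@example.org' =~ m'user@example\.org') ? 1 : 0;

# The variable is not interpolated here either. The $ is still a
# regex metacharacter (an end-of-string anchor), though, so this
# pattern cannot match the text.
my $not_interpolated = ('300 dollars' =~ m'300 $US') ? 1 : 0;

print "interpolated: $interpolated, literal \@: $literal_at, ",
      "uninterpolated: $not_interpolated\n";
```

Turning off interpolation does not turn off the regex metacharacters, so matching a literal dollar sign still needs a backslash.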

PathTools-3.27 triggers a bug in Perl (#54728)

Jan Dubois isolated a problem in File::Spec::Win32's catfile function. The fix from the client side is to stringify a $1 passed as a parameter (a variation on the "better to be paranoid than sorry" theme), since catfile appears to clobber it with some other action before getting around to using it. Ideally, catfile should stringify its arguments itself, although Jan wondered if there was a more general way of solving the problem.

  match point
  http://xrl.us/bk9n8 
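
The general shape of the problem can be reproduced without File::Spec at all. In the sketch below, join_path is a hypothetical stand-in for any function that runs a regular expression of its own before reading its arguments; it is not the real catfile code.

```perl
use strict;
use warnings;

# Hypothetical stand-in: performs a successful match of its own, then
# reads its arguments through @_, which aliases the caller's variables.
sub join_path {
    'internal' =~ /(intern)/;
    return join '/', @_;
}

'lib/File' =~ m{^(\w+)/} or die "no match";

# Passing $1 itself: the callee reads it after its own match.
my $aliased = join_path($1, 'Spec.pm');

# Passing a stringified copy: the value is fixed before the call.
my $stringified = join_path("$1", 'Spec.pm');

print "aliased:     $aliased\n";
print "stringified: $stringified\n";
```

Quoting the variable is the paranoid client-side fix; copying @_ into lexicals before doing anything else would be the equivalent fix inside the function.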

Perl5 Bug Summary

  278 new + 1345 open = 1623 (+13 -43)
  http://xrl.us/bk9oa

  http://rt.perl.org/rt3/NoAuth/perl5/Overview.html 


New Core Modules

Thread::Semaphore

Jerry D. Hedden released 2.08, which adds a few checks for undefined parameters.

  http://xrl.us/bk9oc 


In Brief

Ricardo Signes wondered why delete local $hash{elem} didn't work when local $hash{elem}; delete $hash{elem} did. After boggling briefly over the syntax, Rafael Garcia-Suarez thought it wouldn't be too hard to make it work.

  http://xrl.us/bk9oe 
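
For reference, the two-statement form that does work looks like this. It is a minimal sketch with made-up hash contents.

```perl
use strict;
use warnings;

our %hash = (elem => 'original', other => 'untouched');

sub elem_exists { return exists($hash{elem}) ? 1 : 0 }

my $inside;
{
    local $hash{elem};     # save the element; it is restored at scope exit
    delete $hash{elem};    # remove it for the duration of the block
    $inside = elem_exists();
}
my $after = elem_exists();

print "inside the block: $inside, after the block: $after\n";
```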

Ricardo Signes looked at the documentation in perlobj and corrected errors and omissions in DOES. He hinted that he would take the axe to the documentation for UNIVERSAL.

  less is more
  http://xrl.us/bk9og 

Jerry D. Hedden corrected a typo in perlop.pod that H.Merijn Brand estimated as being a difference of about 3 pixels, thus possibly qualifying for the smallest patch ever.

  http://xrl.us/bk9oi 

He also silenced build warnings in universal.c.

  http://xrl.us/bk9ok 

Nicholas Clark discovered what he thought was a usage error in XS subs with the ALIAS keyword. This reminded Robert May that he had written about a similar problem with INTERFACE last year, and that the message had gone nowhere.

  http://xrl.us/bk9on 

Florian Ragwitz also managed what was roughly a seven pixel change to fix a documentation typo in Attribute::Handlers.

  http://xrl.us/bk9op 

Artur Bergman handed over maintenance of Attribute::Handlers to Rafael Garcia-Suarez.

  http://xrl.us/bk9or 

Moritz Lenz saw that Memoize.pm refers to the old title of "Higher Order Perl" and changed the wording. There was some discussion as to whether the full text of HOP was available on the web, and if so, where?

  http://xrl.us/bk9ot 

After Steve Peters performed an upgrade to AutoLoader to bring it to 5.66, Nicholas Clark bumped it up to 5.66_01 to be on the safe side.

  for the record
  http://xrl.us/bk9ov 

Craig Berry returned to the File::Copy & permission bits issue, saying that changes were unlikely to fly on VMS. Aristotle Pagaltzis pointed out that on Windows, files tend to inherit their permission bits from the directory in which they reside, and that the only important bit to honour on Unix systems is the execute bit.

  http://xrl.us/bk9ox 

Renée Bäcker was Warnocked over a patch to add more documentation to attributes.pm.

  http://xrl.us/bk9oz 

About this summary

This summary was written by David Landgren.

Weekly summaries are published on http://use.perl.org/ and posted on a mailing list (subscription: perl5-summary-subscribe@perl.org ). The archive is at http://dev.perl.org/perl5/list-summaries/ . Corrections and comments are welcome.

If you found this summary useful, please consider contributing to the Perl Foundation or attending a YAPC to help support the development of Perl.