This Week on perl5-porters - 27 March-2 April 2006

grinder on 2006-04-07T09:13:00

Be the first kid on the block to have your very own pragma.


Topics of Interest

[Part of last week's summary vanished into a worm-hole of the space-time continuum, and reappeared this week. The missing portion is reproduced below.]

Fixing the "Your Makefile has been rebuilt" tedium

Dave Mitchell grew tired of the fact that the tiniest change to the perl source causes all the extensions to be rebuilt as well. This is because they all have a dependency on lib/Config.pm, which itself is rebuilt each time miniperl is rebuilt.

So Dave changed things around so that it (and lib/Config_heavy.pl) is only updated if it actually changes during a rebuild.

Nicholas thought that this might break parallel makes. He recalled that the mv-if-diff hack had been removed because it operated on a similar principle: it caused make to consider that the Unicode tables were perpetually out of date, which in turn caused Encode to be needlessly rebuilt many times over in a single run.

  Works on my box
  http://xrl.us/kqps 

Why sv_mortalcopy(&sv_no)?

Nicholas Clark saw that the source code uses the idiom

  PUSHs(sv = sv_mortalcopy(&PL_sv_no));

which dates back as far as 5.003, and wondered why the more reasonable construct

  PUSHs(sv = sv_newmortal());

wasn't used instead. Dave Mitchell thought of a couple of reasons why, such as the getpw* functions returning empty scalars for unsupported features, but in the general case it was probably quite unnecessary. Yitzchak Scott-Thoennes realised that there was a subtle difference between the two constructs.

This all came about because of change #27612, in which Nicholas changed when and where sv_mortalcopy was used in pp_sys.c.

  Mortal peril
  http://xrl.us/kqpt 

pthread_attr_setstacksize failure

Jerry D. Hedden mentioned that he had received a couple of bug reports about the new API for thread stack sizes, involving threads allocating a stack of exactly 2Gb. Jerry wondered whether this was some sort of 32/64-bit conversion failure, and was otherwise stuck as to where to go from here.

  2147483648 bytes and counting
  http://xrl.us/kqpu 

Replacing S_new_HE with Perl_new_body

Jim Cromie delivered an exploratory patch to simplify HE allocations in hv.c, to see what effect it would have, and asked for comments.

  Something for hash-heads
  http://xrl.us/kqpv 

Arena-based allocations for ops

Nicholas Clark delivered a long thoughtful analysis of Jim Cromie's other patch that allocated ops from arenas, saying that the complexity it adds probably outweighs the benefits.

  But there is a way forward
  http://xrl.us/kqpw 

Redundant $Config{d_sitearch} paths

Gisle Aas noted that if you configure perl and set $sitearch to be the same as $archlib then the same directory appears twice in the @INC path, which is silly.

Gisle wanted to patch this, but wasn't sure whether he could dive straight in, or whether it required hacking on metaconfig.

To make matters worse, he found that SITELIB_STEM could add yet a third copy of the same directory to @INC, and came up with one patch to rule them all.

  One is enough
  http://xrl.us/kqpx 

A working CLONE for Tie::RefHash

Yuval Kogman patched Tie::RefHash so that it would work correctly with threads. Rafael tidied it a bit as he put it into blead.

  It works!
  http://xrl.us/kqpy 

Multiple Perl version support quandary

John Peacock admitted to having been exceedingly naughty in releasing new versions of version and not testing them on older perls. Some of the code contained 5.6-isms (notably dealing with warnings), and he wondered what to do to make it work again on 5.005.

John thought that the easiest way would be to fake up an ersatz warnings module, but then, he wasn't sure how to pull off something like no warnings qw(redefine).

Nick Ing-Simmons provided an elegantly devious lightweight solution that pleased John no end. Yitzchak thought of a problem that John's final implementation might have, and provided another improvement.

  Revis(?:it)?ing compatibility
  http://xrl.us/kqpz 

Do PERL_FLEXIBLE_EXCEPTIONS work?

Nicholas was working (or not) with PERL_FLEXIBLE_EXCEPTIONS and came to the conclusion that due to a compilation error, it didn't work, couldn't work, and could never have worked in over six years and twelve stable releases of Perl. Given that no bug reports have been received to date, Nicholas concluded that they could be scrapped without causing any harm.

Nick Ing-Simmons cautioned about being too hasty, saying that this mechanism was there to allow Perl to be compiled natively by a C++ compiler, to map C's longjmp to C++'s throw.

So Nicholas went off and tried to compile perl with a C++ compiler (g++) on Linux and FreeBSD. Everything failed, usually due to prototypes not being sufficiently precise (mainly char * versus const char *). From which one may conclude that it may be possible to compile Perl with a C++ compiler out of the box, but certainly not with a common C++ compiler on two of the most common Unix platforms.

  Probably not
  http://xrl.us/kqp2 

Making the IO::Socket tests pass on Win32

Yves Orton sent in a patch to get IO::Socket to run its test suite correctly on a threaded Windows build. He had a look at IO::Pipe, but couldn't think of a sane enough approach to make it work.

Steve Hay couldn't get it to smoke cleanly on a non-threaded build, and Yves asked for advice on a better indicator to decide whether or not to skip some of the tests (that deal with forking).

Steve Hay showed how various configure-time switches can be combined in different ways to make all sorts of threadish behaviour in Windows. Andy Dougherty said that the right way of seeing whether fork was implemented was to look at $Config{d_fork}. And if that gave bogus results, well by golly it ought to be fixed up so that it gave a useful result.

Yves thought that this was a marvellous idea... except that it didn't solve the problems for all the perls out there in the field today.

  http://xrl.us/kqp3 

Yves followed up with a patch that fixed just about everything, which was applied by Steve Hay.

  Socket to me
  http://xrl.us/kqp4 
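
Andy's suggestion of consulting $Config{d_fork} might look something like the sketch below. It is only an illustration, not the patch that was applied: the helper name can_fork is made up for the occasion, and it assumes that d_fork tells the truth on the platform at hand, which (as the thread shows) is precisely what could not be taken for granted on Windows.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: decide whether fork-dependent tests can run.
# Assumption: $Config{d_fork} is accurate for this build of perl.
sub can_fork {
    return $Config{d_fork} ? 1 : 0;
}

if (can_fork()) {
    print "fork is available: running the forking tests\n";
}
else {
    # The conventional way for a test script to skip everything.
    print "1..0 # Skip: no fork on this platform\n";
}
```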

Perl memory management and documentation

Tom Schindl thought that the documentation was unclear on the concept of explicitly setting lexical variables to undef to release memory inside a scope. Pointers were given to various book references, and Yitzchak Scott-Thoennes summed up Perl's memory strategy nicely: "allocate as early as possible and for as long as possible".

Sadahiro Tomoyuki observed that while there are techniques for pre-extending arrays and hashes, no Perl-level technique is available for scalars (although SvGROW can be used from XS code). This could be useful for strings that are expected to become very large. He suggested that making length($string) lvalue-able, as in

  length($str) = 700000000

to tell perl to allocate that many bytes for a scalar could be very helpful (notwithstanding the usual provisos about characters not equal to bytes in Unicode strings).

  Let my bytes go
  http://xrl.us/kqp5 

As usual, Dave Mitchell explained what was happening behind the scenes in a clear and concise manner.

  The algorithm
  http://xrl.us/kqp6 
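
For the record, the existing pre-extension techniques for arrays and hashes that Sadahiro alluded to look roughly like this. It is a minimal sketch with made-up variable names and sizes; there is no scalar equivalent shown because, as noted above, none exists at the Perl level.

```perl
use strict;
use warnings;

# Pre-extend an array by assigning to its last index.
my @rows;
$#rows = 9_999;                      # the array now has 10,000 slots
$rows[$_] = $_ * 2 for 0 .. 9_999;   # fill them without further growth

# Pre-extend a hash by assigning to keys(): perl allocates at least
# that many buckets up front.
my %seen;
keys(%seen) = 1_024;
$seen{$_} = 1 for 1 .. 1_000;

printf "%d rows, %d keys\n", scalar @rows, scalar keys %seen;
```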

How should %^H work with lexical pragmata

One of Robin Houston's many contributions to blead last year was a somewhat arcane improvement that concerned %^H (the hints variable) being made available to eval blocks.

Nicholas Clark used Robin's insight to help finish lexical pragmata, and was running into conceptual difficulties over the state of the contents of %^H at compile time and run time, and when to make it readable or writable.

The fundamental problem that needs to be addressed is that the state of the hints gets compiled into the op-tree, which means that changing the setting of a hint has no effect (hence the "read-only" nature of the beast) once the code has been compiled.

Rafael pointed out that that was just it: %^H and $^H are for affecting compilation and, as they stand, are just not useful at run time.

warnings.pm, which does have an effect at run time, uses its own variable, a bitfield, which is stored in the op-tree in its own right; that is how warnings can come and go during run time.

Nicholas had a look at the definition of struct cop, as that seemed to be the logical place from which to hang hints, but then realised that since op-trees are shared between threads there's no sane way to make it read-write as well (since that would mean all threads would inherit the change of hinting in any thread).

Nicholas finished up doing an extreme programming number: writing the test to prove that lexical pragmata work. And then subsequently committed a patch that made the test succeed.

Then Hugo admitted to being slightly confused. That's good. I was confused all along.

  I'll give you a hint
  http://xrl.us/kqp7 
  Continued in the new month
  http://xrl.us/kqp8 

Rafael Garcia-Suarez played around with the pragma stuff and came up with a user-level pragma example. David Nicol and Nicholas played around with that, and the feedback from the exercise resulted in a couple of other code tweaks.

(In case you're wondering, a pragma is a module with a lower case name that can be turned on and off through the code. Two prime examples are use strict/no strict and use warnings/no warnings).

  Your very own pragma
  http://xrl.us/kqp9 
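
To give a flavour of what a user-level pragma looks like, here is a small sketch. It is not Rafael's example (follow the link for that); it follows the mechanism as it was later documented in perlpragma and assumes perl 5.10 or better. The package name shout, the hint key shout/on and the function say_it are all invented for the illustration, and the BEGIN blocks stand in for "use shout" and "no shout" so that everything fits in one file.

```perl
use strict;
use warnings;

# Hypothetical pragma, defined inline rather than in shout.pm.
package shout;

# import/unimport run while the caller is being compiled and record
# the setting in %^H, which is scoped to the enclosing block.
sub import   { $^H{'shout/on'} = 1 }
sub unimport { $^H{'shout/on'} = 0 }

# At run time the compiled-in hints are read back through caller().
# Level 1 is the code that called say_it().
sub in_effect {
    my $hints = (caller(1))[10];
    return $hints && $hints->{'shout/on'} ? 1 : 0;
}

sub say_it {
    my ($text) = @_;
    return in_effect() ? uc $text : $text;
}

package main;

my ($loud, $quiet, $outer);
{
    BEGIN { shout->import }          # what "use shout;" would do
    $loud = shout::say_it('hello');
    {
        BEGIN { shout->unimport }    # what "no shout;" would do
        $quiet = shout::say_it('hello');
    }
}
$outer = shout::say_it('hello');     # outside the block: pragma not in effect

print "$loud $quiet $outer\n";
```

The point to notice is that say_it never receives a flag: it discovers the setting from the lexical scope of whoever called it.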

At the same time, Rafael thought that it ought to be possible to make encoding lexical as well, and set about trying to find out what was still needed to get it to work. Nicholas and he thrashed out the details.

  http://xrl.us/kqqa 

Combining UTF-16 output with :crlf is awkward

In a parallel universe (read: another mailing list), Jan Dubois discovered that stacking the :crlf layer on top of a Unicode layer causes "Wide character in print" warnings to be issued. The work-around is to use the (in Jan's words) "non-intuitive" :raw:encoding(UTF-16LE):crlf:utf8 layers together, to turn on the PERLIO_F_UTF8 bit in the :crlf layer.

Jan wondered whether it would be possible for PerlIOCrlf_pushed() to inherit the flag from the previous layer, or whether PerlIO_isutf8() should walk the layer stack in order to determine what it should do.

Nick Ing-Simmons preferred the first approach, going as far as saying that that should actually be the default behaviour for a layer. The second solution has the problem of a layer having to determine whether some other arbitrary layer affects UTF-8 or not.

  Layer upon layer upon layer
  http://xrl.us/kqqb 
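
Jan's work-around, spelled out as a complete script, might look like the following sketch. The temporary file and the sample string are my own inventions; the layer stack is the one quoted above, with the trailing :utf8 there for the sake of the PERLIO_F_UTF8 flag on the :crlf layer. The script writes one line and then reads the file back as raw bytes to see what actually landed on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration (removed at exit).
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# The layer stack from the thread: newline translation sits above
# the UTF-16LE encoding layer.
open my $out, '>:raw:encoding(UTF-16LE):crlf:utf8', $path
    or die "Cannot open $path for writing: $!";
print {$out} "caf\x{e9} \x{263a}\n";
close $out or die "Cannot close $path: $!";

# Read the result back without any translation.
open my $in, '<:raw', $path or die "Cannot open $path for reading: $!";
my $bytes = do { local $/; <$in> };
close $in;

printf "%d bytes on disk\n", length $bytes;
```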

Thread non-safety in sv_setsv

Nicholas was rather distressed to discover that a problem with sv_setsv_flags may put an end to a workable copy-on-write scheme in threaded builds. This came about from looking at the hints implementation, and the fact that threads share op-trees.

  Nice ASCII art, Nick
  http://xrl.us/kqqc 


Patches of Interest

Devel::DProf consting

Andy Lester had a look at Devel::DProf to see about the bugs Jarkko Hietaniemi raised a while back. He wasn't able to fix anything yet, but did clean the code up somewhat, and added some lovely consts in the process.

  It's better than nothing
  http://xrl.us/kqqd 

Poisoning memory

Following on from John Malmberg's plea to have allocated and deallocated memory filled with garbage values (and thus poisoned), to cause errant dereferences to be noticed earlier, Jarkko added a patch to do just that.

Now allocations can be initialised with 0xAB (also known as strawberry cyanide) and freed memory can be overwritten with 0xEF (or blueberry lithium). Andy wondered how we'd gone for so long without it.

  Two exciting new flavours!
  http://xrl.us/kqqe 

VMS pool corruption fix for _NLA0:

In turn, John delivered a patch to make stat work correctly on NLA0:, which is very important if you're doing VMS work.

  http://xrl.us/kqqf 

Long file path support for VMS

Having finished with the preliminaries, John then got down to business with a patch to add long path support to all versions and platforms of OpenVMS that support it.

  http://xrl.us/kqqg 

And rejigged the stat structure used when largefile support was enabled.

  http://xrl.us/kqqh 

Tidying up regexec.c

Andy Lester set his sights on regexec.c, now that Dave has finished with it, and zapped numerous unused macros, inlined a couple of small static functions and sprinkled the magic wand of constness over the lot.

  http://xrl.us/kqqi 

Random accumulated patches from Andy

Andy then shipped out all his patches that had been piling up: consting and NULL tweaks (NULL instead of 0 when dealing with pointers, and removing casts on NULL assignments).

  http://xrl.us/kqqj 

and redid the PERL_UNUSED_DECL macro, eliminating a grumpy comment at the same time. [News flash: this was eventually reverted, as there is code Out There which relied on the previous behaviour].

  http://xrl.us/kqqk 

and removed some unnecessary pointer checks

  http://xrl.us/kqqm 

and found some more appropriate versions of the SvREFCNT_inc macro to use.

  http://xrl.us/kqqn 

Add V.pm to the core

Abe Timmerman posted a patch to add V.pm, which was originally written back in 2002 in answer to a question from Tels. John Peacock thought that Module::Info would be more useful.

  "Why?" said Profane. "Why not?" said Stencil.
  http://xrl.us/kqqo 


New and old bugs from RT

$qr = qr/^a$/m; $x =~ $qr fails (#3038)

Nicholas Clark beat everyone else in closing out this bug from 2004.

  http://xrl.us/kqqp 

He also fixed up the "hash assignment to a tied hash erroneously stores data in the real hash too (#36267)" bug.

  http://xrl.us/kqqq 

Perl segfaults; test case available (#32332)

  http://xrl.us/kqqr 

no sendmsg/recvmsg support (#38808)

Nicholas noted that neither the core nor the Socket module provides the sendmsg and recvmsg functions. Gisle Aas thought that POSIX would be a suitable place in which to have them.

  http://xrl.us/kqqs 

Bad return value from a block with variable localization (#38809)

Vincent Pit filed a bug that showed some code using if(@_), do and return picking up undef in an unexpected manner.

  http://xrl.us/kqqt 

Encoding error in UTF-8 locales (#38812)

Vincent Lefevre posted an encoding bug. Nicholas stripped down the example code and highlighted the error. He wasn't sure whether it was a problem of the documentation not being sufficiently clear, or of the core not dealing with the issue adequately.

  Maybe a bit of both
  http://xrl.us/kqqu 

local $h{$unicode} doesn't work (#38815)

Nicholas Clark noticed that local $a{"\x{100}"} = 1 doesn't behave correctly (the way a non-Unicode key like local $a{"N"} = 1 does), and promised to come up with a way to fix it, and did.

  All part of a day's work
  http://xrl.us/kqqv 
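
The behaviour one expects here can be sketched as follows, assuming a perl in which the bug has been fixed. The hash, its contents and the helper peek are invented for the example: both keys should take the localised value inside the block and get their old value back on the way out.

```perl
use strict;
use warnings;

# One plain key and one key that needs Unicode.
our %h = ("N" => 'outer', "\x{100}" => 'outer');

# Hypothetical helper: show the values of both keys, in order.
sub peek { return join ',', map { $h{$_} } "N", "\x{100}" }

my $inside;
{
    local $h{"N"}       = 'inner';
    local $h{"\x{100}"} = 'inner';
    $inside = peek();
}
my $after = peek();     # both elements restored at block exit

print "$inside / $after\n";
```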

Segment fault when using Sockets (#38817)

  http://xrl.us/kqqw 

"use sort 'stable'" sorts backwards with perl5.9.3 (#38831)

Stefan Lidman discovered that stable sorting in blead sorts descending instead of ascending by default. Rafael and Robin Houston had it sorted out in a jiffy.

  I hope they added a test case
  http://xrl.us/kqqx 

What Steve Peters did this week

After Dave Mitchell landed his impressive iterative pattern match patch, Steve equally impressively trawled RT to resolve, like, a jazillion bugs, each resulting in a new message to the list.

An interesting case of seeing how different people explain in their own words what is in fact the same thing.

  • Regexp causes SIGSEGV (stack overflow?) (#1760)

  • Core dump using a Perl regular expression (#6844)

  • Segmentation fault in regmatch() (#6987)

  • Perl Segmentation Fault using /((\w+ )+)/ on long strings (#8685)

  • Recently-introduced regex segfault (#8870)

  • 5.6.0, 5.6.1, 5.8.0 regexp core on (\@\@|.)* (#17611)

  • perl 5.8.0 segfaults (#18489)

  • perl SIGSEGV when applying regular expression to a long string (#21298)

  • Regexp segfault --> ("X"x3529) =~ /( (?: \\. | [^\$] ){1,4000} )/gx; (#21333)

  • Regexp segfault (#21922)

  • Seg fault on long input to re (#21940)

  • Segfault (deep recursion?) in regex match (#22051)

  • Core dump on big regex (#23666)

  • Segmentation fault caused by capturing regex (#24271)

  • METABUG - regex stack overflow issues (#24274)

  • Segmentation fault at m// regexp (#28999)

  • Regexp /^([^f]|f.)+/ Bus error (#31887)

  • SEGV with complicated regexp and long string (#32041)

  • Long strings causes segmentation fault (#32465)

  • Regular expression segfaults perl (#32803)

  • SIGSEGV in S_regmatch (#34349)

  • Simple regexp causes segfault (#36020)

  • Segfault in simple regular expression (#36999)

  • Segfault when doing this regex (#38031)

  • Segmentation fault for matching too long regexps (#38379)

  • Silent self-termination of script using regex (#38470)

  • Regexp Bus error (#38473)

  • Perl Segfault in Regex Match (#38717)

When Dave said he thought his patch would allow a whole pile of bugs to be closed out, he wasn't joking.

Steve attempted to close out Regular expression causes segfault (#36903) but was having access permission problems in retrieving the test code to be used for the bug. Milo Thurston provided another URL to get at the code.

  403 Forbidden
  http://xrl.us/kqqy 

Perl5 Bug Summary

  1563 bugs (but wait until next time)
  http://xrl.us/kqqz 
  Over here
  http://rt.perl.org/rt3/NoAuth/perl5/Overview.html 


New Core Modules


In Brief

Alan Burlison forwarded a message about a new project that had been formed to deal with programming language vulnerabilities.

  http://xrl.us/kqq5 

Hugo van der Sanden added a brief documentation patch to clarify the fact that you cannot use times() to obtain the elapsed time consumed by running child processes, only for finished processes that have been waited upon.

  http://xrl.us/kqq6 
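
The point of Hugo's clarification can be seen with a sketch like the one below. It assumes a platform with a real fork (so not the Windows emulation); the amount of busy work in the child is arbitrary, and the array names are mine. The child's CPU time only shows up in the parent's times() once the child has been waited for.

```perl
use strict;
use warnings;
use POSIX ();

# times returns (user, system, children's user, children's system).
my @before = times;

my $pid = fork;
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    my $x = 0;
    $x += $_ for 1 .. 2_000_000;    # burn some CPU in the child
    POSIX::_exit(0);                # leave without running parent cleanup
}

my @unwaited = times;   # child not reaped yet: its time is not counted
waitpid $pid, 0;
my @reaped = times;     # after the wait, the child's time has been added

printf "child user time: %.2f before wait, %.2f after\n",
    $unwaited[2], $reaped[2];
```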

David Nicol proposed a Tie::MaskedArray technique for avoiding the remotely tied global localized with a sigil exploit. I'm a little hazy on the details of this particular exploit. David's technique proposes to replace local, more slowly, but also more safely.

  http://xrl.us/kqq7 

John L. Allen forwarded the current, best patch for pow() on AIX.

  http://xrl.us/kqq8 

Jim Cromie updated the documentation to make it more clear what happens when one does something like Configure -des -DNoSuchConfigureFlag .

  http://xrl.us/kqq9 

Robin Barker found a bug in Readonly in 5.8.8 and so fixed the bug, and added a test to t/op/tie.t to make sure the problem doesn't return.

  (I think I'm beginning to see a pattern here)
  http://xrl.us/kqra 

Paul Marquess provided a small patch to the zip test harness for IO::Compress::Zip .

  http://xrl.us/kqrb 

Sadahiro Tomoyuki looked at some of the recent changes to the source and found some unmatching of parameters and types. Nicholas updated embed.fnc to take that into account.

  http://xrl.us/kqrc 

Andy had a go at linting the source with Sun Studio's lint, and found lots of things that need to be looked at, and wondered whether there were any other lint-like tools freely available that could be applied. Jarkko mentioned FlexeLint (a.k.a Gimpel Lint), which is very nice, but not free.

  http://xrl.us/kqrd 

Yves Orton found a tainting oddity that should possibly be documented. Yitzchak thought that some patches that remove the surprising behaviour would also be well received.

  http://xrl.us/kqre 

H.Merijn Brand backported all of the recent changes made by Nicholas in blead's Configure to that of maint. At the end of the week he was still busy filling in gaps in Porting/Glossary.

  http://xrl.us/kqrf 

Feedback from last week's summary

Craig Berry corrected my misreading of the Module::Build on VMS thread, which is that the VMS port currently doesn't offer the list form of piped open.

  http://xrl.us/kqrg 

About this summary

This summary was written by David Landgren. I will be getting a life^W^W^Wtaking a break next week so the next summary will be for the fortnight 3-16 April.

If you want a bookmarklet approach to viewing bugs and change reports, there are a couple of bookmarklets that you might find useful on my page of Perl stuff:

  http://www.landgren.net/perl/ 

Weekly summaries are published on http://use.perl.org/ and posted on a mailing list (subscription: perl5-summary-subscribe@perl.org ). The archive is at http://dev.perl.org/perl5/list-summaries/ . Corrections and comments are welcome.

If you found this summary useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl.