PerlIO::gzip intermittent failure...

ajt on 2009-04-16T08:02:56

Yesterday I spent a few hours banging my head against a brick wall. I'm using PerlIO::gzip on ActiveState Perl 5.8.8 to decompress a large XML file. The file decompresses perfectly with Cygwin's gzip or xmllint, but PerlIO::gzip sometimes mangles it: the output starts as XML and then degenerates into soup. At other times it decompresses the file perfectly.

I wasted quite a few hours thinking it was something else, but it's clearly a random problem with PerlIO::gzip, sometimes it works and sometimes it doesn't. I can't make any sense of it, how can it work some of the time and the rest of the time it generates gibberish?

I'm going to give IO::Uncompress::Gunzip a go now to see if it's consistent and reliable. It's a shame, as the PerlIO::gzip interface was very handy.


Precision

nicholas on 2009-04-16T09:08:07

"it's clearly a random problem with PerlIO::gzip, sometimes it works and sometimes it doesn't. I can't make any sense of it, how can it work some of the time and the rest of the time it generates gibberish?"

yet you say

"I'm going to give IO::Uncompress::Gunzip a go now to see if it's consistent and reliable."

So, I infer, the cause of the observed problem is not clearly anything, as you don't yet have data ruling out other possible causes, because you've not yet tried the experiment of keeping everything else the same and only changing the suspected component.

Have you tried the same data file with the same version of ActiveState perl on another OS, with the same versions of all modules? That's probably not that easy to set up, but even the results from same data file, different installation would be informative.

Also, what is the buffer size for buffered IO under ActiveState perl? That might be part of the cause, as I've seen an error with source filters that only shows up on some *BSDs, where the buffer size (IIRC) was 512 bytes, rather than the more common 4096 bytes. I haven't yet had time to stop and work out why that's going wrong, or where the bug is, but that one appears to be somewhere in some level of the code in the perl core. Hence I know I don't quite trust it, so don't want to rule anything out.
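
The experiment described above (same data file, same perl, only the decompressor changed) could be sketched roughly as follows. This is a minimal sketch, not anyone's actual test script: the helper names digest_via_gunzip, digest_via_layer and distinct_digests are made up for illustration, and the run count of 10 is arbitrary. It decompresses one file several times with each decoder and reports how many distinct MD5 digests turn up, which should separate "random from run to run" from "consistently wrong for this file".

```perl
use strict;
use warnings;
use Digest::MD5 ();
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Decompress the whole file with IO::Uncompress::Gunzip, return an MD5 hex digest.
sub digest_via_gunzip {
    my ($path) = @_;
    my $out;
    gunzip($path => \$out) or die "gunzip failed: $GunzipError";
    return Digest::MD5::md5_hex($out);
}

# Decompress through the :gzip layer, return an MD5 hex digest.
# PerlIO::gzip is loaded lazily so the rest still runs where it is not installed.
sub digest_via_layer {
    my ($path) = @_;
    require PerlIO::gzip;
    open my $fh, '<:gzip', $path or die "open failed: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    return Digest::MD5::md5_hex(defined $data ? $data : '');
}

# Run one decoder $runs times on the same file; return the distinct results seen.
# A decoder that dies is recorded as an "error: ..." string instead of a digest.
sub distinct_digests {
    my ($decoder, $path, $runs) = @_;
    my %seen;
    for (1 .. $runs) {
        my $digest = eval { $decoder->($path) };
        $seen{ defined $digest ? $digest : "error: $@" }++;
    }
    my @distinct = sort keys %seen;
    return @distinct;
}

# Only runs when a file name is given on the command line.
if (my $path = shift @ARGV) {
    my %decoders = (
        'IO::Uncompress::Gunzip' => \&digest_via_gunzip,
        'PerlIO::gzip'           => \&digest_via_layer,
    );
    for my $name (sort keys %decoders) {
        my @digests = distinct_digests($decoders{$name}, $path, 10);
        print "$name: ", scalar(@digests), " distinct result(s)\n";
        print "  $_\n" for @digests;
    }
}
```

If one decoder gives a single digest every time and the other gives several, the randomness is in that decoder or something underneath it. If both give one digest each but the digests differ, the problem is tied to the file rather than to the run.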

Re:Precision

ajt on 2009-04-16T11:46:20

Originally I thought that the same Perl script, run time after time on the same compressed file, would sometimes work and sometimes not. That really confused me, but it's a Windows box with an old version of ActiveState Perl on it, so I'm not expecting perfection.

I now think that some files can and some can't be processed. Once you have a "bad" compressed file PerlIO::gzip can't deal with it, but other methods can.

IO::Uncompress::Gunzip does work with the same file that PerlIO::gzip has problems with, so from my perspective that's the solution I'll use.
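
For anyone following along, a minimal sketch of what the switch might look like with the IO::Uncompress::Gunzip module. The function name gunzip_to_file and the 64 KB chunk size are illustrative assumptions, not the script from the post. It streams the decompressed data to another file in chunks, with the output handle in binmode so that Windows does not translate line endings on the way out.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Stream-decompress $src into $dst; returns the number of bytes written.
sub gunzip_to_file {
    my ($src, $dst) = @_;

    my $z = IO::Uncompress::Gunzip->new($src)
        or die "Cannot open $src: $GunzipError";

    open my $out, '>', $dst or die "Cannot write $dst: $!";
    binmode $out;    # no CRLF translation on the output side

    my $total = 0;
    my $buf   = '';
    while (1) {
        # read() returns bytes read, 0 at end of data, negative on error
        my $n = $z->read($buf, 64 * 1024);
        die "Read error on $src: $GunzipError" if $n < 0;
        last if $n == 0;
        print {$out} $buf or die "Write error on $dst: $!";
        $total += length $buf;
    }

    $z->close;
    close $out or die "Cannot close $dst: $!";
    return $total;
}

# Only runs when source and destination are given on the command line.
if (@ARGV == 2) {
    my $bytes = gunzip_to_file(@ARGV);
    print "wrote $bytes bytes\n";
}
```

Reading in chunks keeps memory use flat for a large XML file; the object also supports getline if the consumer wants one line at a time.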

I'm happy to help if I can. The problem is intermittent, though: it's currently okay with one example file and not happy with another. Cygwin gunzip is happy with both; I'm not sure what error correction, if any, gunzip does to keep going if there are problems.
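
On the error-correction question: as far as the gzip format (RFC 1952) goes, there is no error correction, only detection. Each gzip member ends with an 8-byte trailer holding a CRC-32 of the uncompressed data and its length modulo 2^32, both little-endian. A rough sketch of checking any decoder's output against that trailer is below. The function names read_gzip_trailer and matches_trailer are made up for illustration, and the sketch assumes a plain single-member .gz file with nothing appended after it.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);
use Compress::Raw::Zlib ();

# Return (CRC-32, size mod 2**32) from the last 8 bytes of a .gz file.
# Assumption: a single gzip member with no trailing garbage.
sub read_gzip_trailer {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # essential on Windows: the trailer is binary
    seek $fh, -8, SEEK_END or die "Cannot seek in $path: $!";
    my $got = read $fh, my $trailer, 8;
    close $fh;
    die "Short read on trailer of $path" unless defined $got && $got == 8;
    my ($crc, $isize) = unpack 'V V', $trailer;    # two little-endian 32-bit values
    return ($crc, $isize);
}

# Compare already-decompressed $data with what the trailer of $path promises.
sub matches_trailer {
    my ($path, $data) = @_;
    my ($want_crc, $want_size) = read_gzip_trailer($path);
    my $got_crc  = Compress::Raw::Zlib::crc32($data);
    my $got_size = length($data) & 0xFFFFFFFF;
    return {
        crc_ok  => ($got_crc  == $want_crc)  ? 1 : 0,
        size_ok => ($got_size == $want_size) ? 1 : 0,
    };
}
```

Feeding the "soup" output from the failing run into matches_trailer should show both checks failing if the data really was mangled during decompression, while output from a good run should pass both.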

C:\>perl -v

This is perl, v5.8.8 built for MSWin32-x86-multi-thread
(with 18 registered patches, see perl -V for more detail)

Copyright 1987-2007, Larry Wall

Binary build 822 [280952] provided by ActiveState http://www.activestate.com/
Built Jul 31 2007 19:34:48

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

C:\>perl -MPerlIO::gzip -e "print $PerlIO::gzip::VERSION"
0.17

I can email you an example gzip file if that would help. I'll also try the bad gzip file on another platform to see if it's a Windows problem or a more generic problem.

A bug

Ron Savage on 2009-04-18T01:51:34

Hi

This symptom normally means there's a bug in the code, in that some variable is uninitialized, and when used causes various results, depending on the value which happens to be in it.