putting perl on a diet

ethan on 2003-02-19T11:10:51

I got this idea from a thread on comp.lang.perl.misc (and notably some encouragement from Janek Schleicher, who considered the idea very cool): creating a module that allows data to be stored zlib-compressed.

I initially thought this could be done via tie() but Perl's tying interface is too limited to do that effectively. For scalars you only have FETCH() and STORE(). This defeats the purpose of compression for example in the following code:

$string = "string" x 1_000_000;
print substr $string, 1, 1;

Obviously, via tie() this would result in uncompressing the whole data in memory. It would also be very slow.
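To make the problem concrete, here is a minimal sketch of what a tie()-based version might look like. The package name CompressedScalar and the $fetches counter are made up for illustration, and it uses the compress/uncompress functions from Compress::Zlib rather than any XS code. The point is only that substr() never reaches the tied class: Perl calls FETCH, gets the whole decompressed string, and works on that.

```perl
package CompressedScalar;    # hypothetical name, for illustration only
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

our $fetches = 0;            # counts how often the whole value was decompressed

sub TIESCALAR {
    my ($class) = @_;
    my $z = compress('');    # start out as a compressed empty string
    return bless \$z, $class;
}

sub STORE {
    my ($self, $value) = @_;
    $$self = compress($value);
}

sub FETCH {
    my ($self) = @_;
    $fetches++;
    return uncompress($$self);    # always the complete string
}

package main;

tie my $string, 'CompressedScalar';
$string = "string" x 1_000_000;

# substr() cannot be intercepted: Perl FETCHes the full value first.
my $char = substr $string, 1, 1;
print "got '$char' after $CompressedScalar::fetches full decompression(s)\n";
```

So every read of a single character costs a decompression of the full six megabytes.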

The obvious solution therefore is (apart from adding SUBSTR and all the other string-operators to the tie-interface) a class of its own with a little bit of overloading of "", .= etc.
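A rough pure-Perl sketch of that idea, assuming a made-up class name CompressedString and whole-string compression via Compress::Zlib (so it is not the XS implementation described in this post): the object overloads "" and .= and provides a copy constructor so that .= behaves sanely when two variables share one object.

```perl
package CompressedString;    # hypothetical name, for illustration only
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

use overload
    '""' => sub { $_[0]->get },                 # stringification
    '.=' => sub {                               # in-place append
        my ($self, $more) = @_;
        $self->store($self->get . $more);
        return $self;
    },
    '='  => sub {                               # copy constructor for mutators
        my ($self) = @_;
        return bless { z => $self->{z} }, ref $self;
    },
    fallback => 1;

sub new {
    my ($class, $init) = @_;
    my $self = bless { z => undef }, $class;
    $self->store(defined $init ? $init : '');
    return $self;
}

sub store { my ($self, $data) = @_; $self->{z} = compress($data); return $self }
sub get   { my ($self) = @_; return uncompress($self->{z}) }

package main;

my $s = CompressedString->new("hallo");
$s .= " welt";
print "$s\n";
```

Unlike the tie() approach, any operation can get its own method here, which is where something smarter than "decompress everything" could go.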

It sounds much more trivial than it is, as I had to realize. I started hacking away at the XS part until I could at least store and get the data. The string becomes a linked list of buffers: the original large string is divided into pieces of CHUNK_SIZE bytes, each of which is compressed into one of those buffers. After that I was eager to do a little benchmark:
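For readers who want to play with the chunking idea without XS, here is a hedged pure-Perl approximation. It is an assumption-laden sketch, not the actual module: ChunkedString and substr_at are invented names, a plain array stands in for the linked list, it assumes byte strings, and negative offsets are not handled. It does show why chunking helps: a substring only needs the chunks it overlaps.

```perl
package ChunkedString;    # hypothetical name; stand-in for the XS code
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

use constant CHUNK_SIZE => 4096;

sub new {
    my ($class) = @_;
    return bless { chunks => [], length => 0 }, $class;
}

# Split the data into CHUNK_SIZE pieces and compress each one separately.
sub store {
    my ($self, $data) = @_;
    my @chunks;
    for (my $pos = 0; $pos < length($data); $pos += CHUNK_SIZE) {
        push @chunks, compress(substr($data, $pos, CHUNK_SIZE));
    }
    $self->{chunks} = \@chunks;
    $self->{length} = length($data);
    return $self;
}

# Decompress everything and glue it back together.
sub get {
    my ($self) = @_;
    return join '', map { uncompress($_) } @{ $self->{chunks} };
}

# Decompress only the chunks that overlap [offset, offset + len).
sub substr_at {
    my ($self, $offset, $len) = @_;
    return '' if $len <= 0 || $offset >= $self->{length};
    my $end = $offset + $len;
    $end = $self->{length} if $end > $self->{length};
    my $first = int($offset / CHUNK_SIZE);
    my $last  = int(($end - 1) / CHUNK_SIZE);
    my $piece = join '', map { uncompress($_) } @{ $self->{chunks} }[$first .. $last];
    return substr($piece, $offset - $first * CHUNK_SIZE, $end - $offset);
}

package main;

my $s = ChunkedString->new->store("hallo" x 1023);
print $s->substr_at(1, 1), "\n";    # touches only the first chunk
```

A larger CHUNK_SIZE should compress better but makes each substr_at call decompress more data, which is the trade-off mentioned further down.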

use Benchmark qw(cmpthese);

my $uncompressed;
my $compressed = String::Compress->new;

cmpthese(-2, {
    compressed => sub {
        $compressed->store("hallo" x 1023);
        my $d = $compressed->get;
    },
    uncompressed => sub {
        $uncompressed = "hallo" x 1023;
        my $d = $uncompressed;
    },
});


Urmmh, here's the embarrassing part now:

compressed:  5 wallclock secs ( 1.02 usr +  1.18 sys =  2.20 CPU) @ 509.09/s (n=1120)
uncompressed:  4 wallclock secs ( 2.05 usr +  0.00 sys =  2.05 CPU) @ 40707.32/s (n=83450)
                Rate   compressed uncompressed
compressed     509/s           --         -99%
uncompressed 40707/s        7896%           --

So it's slightly slower. On the other hand, "hallo" x 1_000_000 eats about half the memory an ordinary Perl scalar would need. When increasing CHUNK_SIZE to a really large value such as 500_000 (it's just 4096 right now), memory use could probably be dropped further to less than 10kb (for a repetitive string like the above only, of course).

But my actual concern is something else: I reimplement the string operators as methods, which is at least feasible for things like chomp, substr etc. But what about regular expressions? I'd need to reimplement Perl's RE-engine (working on segmented, compressed little strings which form one large string!). I think I'll leave that to someone else (Janek perhaps:-).


Magic

rafael on 2003-02-19T12:26:24

I wonder whether it's possible to implement this using magic. Look up PERL_MAGIC_uvar in the perlguts manpage to see what I mean.

Re:Magic

ethan on 2003-02-19T14:49:35

> I wonder whether it's possible to implement this using magic. Look up PERL_MAGIC_uvar in the perlguts manpage to see what I mean.

I am not sure whether the U magic is powerful enough. The ufuncs struct simply contains a pointer to a get and a set function. The third member, uf_index, is just an IV that doesn't seem to be used for anything other than as an identifier (that is what grepping through the 5.8.0 sources suggests).

There is a whole lot more magic available, but I am not sure whether I am supposed to diddle with it. For instance, there is vtbl_substr. So am I allowed to take an SV and decorate it with my own custom MAGIC structures? And then add more via MAGIC->mg_moremagic if I so wish?

I am rather reluctant to do that because Perl's magic is so thinly documented. perlguts seems to imply that only 'U' and '~' are available for extensions.

But if I am free to roll my own set of magics and attach it to an SV it would be much cooler since then a scalar could really be used like an ordinary variable without the limitations of tie().