Return

mir on 2008-05-07T16:15:34

Another reason why subroutines should always end with an explicit return.

XML::Twig has an XML::Parser handler for character data that ends (or rather used to end!) with $elt->{pcdata} .= $string;. This is the common case, but it's hidden in the middle of a few if/else's, and it's actually an inlined method call, so it's not obvious what's going on there. Here is what happened when the XML to be parsed included a 4MB, 60K-line, base64-encoded element: the handler was called 120K times (once for each line, once for each newline). Because the append was the last statement evaluated, each of those calls returned the current content of the element, which was promptly discarded in the bowels of XML::Parser. Except that the content averages half its final size over the run, so if you count 120,000 * 4MB/2, that makes roughly 240 GB of memory that needed to be allocated, copied and discarded, for absolutely no good reason at all.
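To make the mechanism concrete, here is a minimal sketch of the implicit return at work. It is not the real XML::Twig handler: append_implicit and append_explicit are hypothetical stand-ins, and the hash is a toy substitute for an element object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the handler: the append is the last statement
# evaluated, so its value (the whole accumulated string) is what the sub returns.
sub append_implicit {
    my ($elt, $string) = @_;
    $elt->{pcdata} .= $string;
}

# Same body followed by a bare return: nothing is handed back to the caller.
sub append_explicit {
    my ($elt, $string) = @_;
    $elt->{pcdata} .= $string;
    return;
}

my $elt = { pcdata => 'abc' };

my $implicit = append_implicit($elt, 'def');   # gets the full content so far
my $explicit = append_explicit($elt, 'ghi');   # gets undef in scalar context

print 'implicit: ', (defined $implicit ? $implicit : 'undef'), "\n";   # implicit: abcdef
print 'explicit: ', (defined $explicit ? $explicit : 'undef'), "\n";   # explicit: undef
```

With a multi-megabyte string in place of 'abc', the first version hands the whole content back on every call, which is the copy described above.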

In the end, adding a return at the end of the handler took processing time from 581s to... 2s. It probably improves speed in less specific cases.

And yes, it is one of the Perl Best Practices recommendations (although not for performance reasons). So where was PBP in 1997 when I wrote the first version of XML::Twig?

A failure of the Perl core

autarch on 2008-05-07T17:29:57

This might be a failure of the Perl core, though. Maybe it should detect a return value in void context and just skip it.
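For what it's worth, a sub can already find out for itself how it was called, using wantarray. The sketch below is plain Perl and says nothing about the context in which XML::Parser actually invokes its handlers; record_context is a made-up name for illustration.

```perl
use strict;
use warnings;

my @seen;

# Hypothetical helper: records the context it was called in.
# wantarray is undef in void context, false in scalar context, true in list context.
sub record_context {
    my $ctx = wantarray;
    push @seen, !defined($ctx) ? 'void' : $ctx ? 'list' : 'scalar';
    return;
}

record_context();             # void context
my $s = record_context();     # scalar context
my @l = record_context();     # list context

print "@seen\n";              # void scalar list
```

A handler could use the same check to skip building a return value when nobody wants it, though a bare return at the end is simpler.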

Re:A failure of the Perl core, probably not

mir on 2008-05-07T17:50:26
The return value of the handler is probably handled in C by XML::Parser, so it might be an XML::Parser problem. Or the interaction between the Perl core and XML::Parser... or maybe an expat problem. I don't really know at this point; I am just happy I found a solution.

Re:A failure of the Perl core, probably not

Alias on 2008-05-08T00:30:15
So does this mean that XML::Twig will be getting significantly faster now?

Re:A failure of the Perl core, probably not

mir on 2008-05-08T08:25:40

> So does this mean that XML::Twig will be getting significantly faster now?

I hoped so, but the tests I have run don't show any improvement. It looks like "big element split across many lines" is really a corner case, and that in general not returning anything from the handler doesn't help that much. :--(