Improving parsing speed

Matts on 2002-01-31T11:00:57

I spent yesterday hacking on XML::SAX::PurePerl (an XML parser written purely in Perl), trying to up its performance. It's a hand-built recursive descent parser, and it's reasonably fast, but it's orders of magnitude slower than any C parser (and probably always will be).

Most of the time is spent in the matcher and backtracking routines (for obvious reasons), which simply ask "Does the current token (which is just a character) match this character, or this regexp?" I guess most of the time goes there because they're called a zillion times, so I'm not entirely sure I can reasonably expect to make them that much faster.
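To give a feel for what those routines do, here's a minimal sketch. It is not the actual XML::SAX::PurePerl code: the TinyReader package and its match_char, match_re, mark and reset_to methods are names made up for illustration. The point is just that every character costs at least one method call, which is why the call count dominates.

```perl
use strict;
use warnings;

# Hypothetical reader: a string plus a cursor. Not the real module's API.
package TinyReader;

sub new {
    my ($class, $text) = @_;
    return bless { text => $text, pos => 0 }, $class;
}

# The current "token" is just the character under the cursor.
sub current {
    my ($self) = @_;
    return substr($self->{text}, $self->{pos}, 1);
}

# Does the current character equal $char? Consume it if so.
sub match_char {
    my ($self, $char) = @_;
    return 0 if $self->{pos} >= length $self->{text};
    return 0 unless $self->current eq $char;
    $self->{pos}++;
    return 1;
}

# Does the current character match the regexp $re? Consume it if so.
sub match_re {
    my ($self, $re) = @_;
    return 0 if $self->{pos} >= length $self->{text};
    return 0 unless $self->current =~ $re;
    $self->{pos}++;
    return 1;
}

# Backtracking is just saving the cursor and putting it back.
sub mark     { return $_[0]{pos} }
sub reset_to { $_[0]{pos} = $_[1]; return }

package main;

my $r    = TinyReader->new('<a>');
my $mark = $r->mark;

# Try to match "<", one name-ish character, then ">".
my $ok = $r->match_char('<')
      && $r->match_re(qr/^[A-Za-z_:]$/)
      && $r->match_char('>');

# On failure, rewind so another rule can try from the same spot.
$r->reset_to($mark) unless $ok;

print $ok ? "matched\n" : "no match\n";   # prints "matched"
```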

My current problem is that I'm working from home today (err, ok so I'm really just avoiding working) waiting for DynaRod to come and clean my drains. But here I've only got 5.00503, and for some reason Devel::DProf is segfaulting for me when trying to test the parser, so right now I'm installing bleadperl to run it under that instead. I could use one of my laptops, but I want to get bleadperl running here anyway. Once I've got DProf running I can be a bit more categoric about where the slow stuff is.

Oh, also, did you know that / [range1] | [range2] | [range3] .../ is much slower than / [range1range2range3...] /? Strange, that. The major downside, though, is that whitespace inside a character class is still treated as an actual space even under /x, so you can't space the ranges out to keep them readable. This makes the XML Char regexp awful, because it has to cram a massive number of different character ranges together. Ah well.
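If you want to try the comparison yourself, something like the following rough sketch should do. The ranges are made up for the example (they are not the real XML Char production), and the numbers will vary with your perl version, so treat it as a way to measure rather than as proof. Building the merged class from a list of pieces with join is also one way around the no-spaces-under-/x annoyance, since the pieces stay readable in the source.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative ranges only, not the real XML Char production.
my @ranges = ('a-f', 'k-p', 'u-z', '0-9');

# Form 1: one small class per range, joined by alternation.
my $alt_src = join '|', map { "[$_]" } @ranges;
my $alt     = qr/^(?:$alt_src)+$/;

# Form 2: a single class built by concatenating the same ranges.
my $merged_src = join '', @ranges;
my $merged     = qr/^[$merged_src]+$/;

# 10,000 characters, all of which fall inside the ranges above.
my $input = join '', map { ('a', 'm', 'x', '5') } 1 .. 2_500;

# Sanity check: both forms should accept the same input.
die "the two regexps disagree\n"
    unless ($input =~ $alt ? 1 : 0) == ($input =~ $merged ? 1 : 0);

# Run each form for at least one CPU second and print a comparison table.
cmpthese(-1, {
    alternation => sub { $input =~ $alt },
    merged      => sub { $input =~ $merged },
});
```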

I also emailed my dad about the electrics, but I've had no reply so far, so I guess I'm going to have to call an electrician. Damn :-(