Regex hacking

demerphq on 2006-09-14T18:06:29

I recently left my job and have been decompressing from a long period of continuous employment by enjoying some time off with some nice weather, bike riding, late mornings, breakfast at the cafe and lots of hacking.

I've managed to close off most of my regex engine todo list: ANYOF and jump tries, Aho-Corasick startclass matching, positive-look(ahead|behind) optimisation (there are some rough edges to deal with on this one yet), charname support in the parser, and about 75% of what needs to happen to the debug output to make it "non-regex engine hacker" friendly.

All this, combined with my earlier efforts (single-char-ANYOF to EXACT, simple TRIEs), means that perl now comfortably outperforms python in the so-called "rebench" tests (bleadperl average time per test: 28 usecs; python average time per test: 297 usecs). The use of regex preprocessors to do "trie"-like optimisation will no longer be necessary, and in fact will often slow things down, as it will duplicate work that is already happening internally.
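For anyone who hasn't met one, a "trie"-style regex preprocessor does roughly the following. This is only a minimal sketch, written in Python for illustration; the names build_trie and trie_to_pattern are made up here, and it says nothing about how the engine's internal TRIE nodes are implemented. It takes a list of literal words and emits one pattern in which shared prefixes are factored out, so the matcher does not have to retry each alternative from scratch.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker
    return root

def trie_to_pattern(node):
    """Turn a trie node into a regex fragment that shares common prefixes."""
    ends_here = "" in node
    branches = [re.escape(ch) + trie_to_pattern(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""  # leaf: nothing more to match
    if len(branches) == 1 and not ends_here:
        return branches[0]  # single path, no grouping needed
    group = "(?:" + "|".join(branches) + ")"
    # If a word also ends at this node, the rest is optional.
    return group + "?" if ends_here else group

words = ["foo", "foobar", "food", "bar"]
print(trie_to_pattern(build_trie(words)))  # (?:bar|foo(?:bar|d)?)
```

Note that the factored pattern is only guaranteed to agree with the plain alternation foo|foobar|food|bar on whole-string matches; on a partial match the two can pick different alternatives. With the optimisations described above, that plain alternation should be the form to write, leaving the factoring to the engine.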

I need to revisit perlreguts and update it with what I've learned, and in some cases what has changed.

I need to update my journal more often. :-(

Things still to do: Clean up/reorganize re Debug stuff to be easier to use. Look into using reghop4() for things. Look into the MAGIC hack for making things pluggable. Look into migrating code in sv.c re_dup over to regcomp.c. Maybe split regcomp into several pieces to make it smaller and easier to manage.


perlguts

speters on 2006-09-15T01:23:59

I've been meaning to add a review of perlguts to perltodo.pod. There have been quite a few refactorings, new bits of API, and new macros meant to make life easier for Perl 5.10. This is actually something that a lot of people need to look into and make suggestions on.

Re:perlguts

demerphq on 2006-09-15T10:32:41

If you do, could you mention that perlreguts needs to be reviewed as well, please?

There must be some award...

Matts on 2006-09-15T08:01:29

There really must be some award for the incredible work you've done on the regexp engine. For a long time it was feared that the only person left who could hack on it was Ilya Zakharevich (well, that was my fear) and you've really taken up the banner and pushed perl's regexp implementation into fantastic new territories. I can't wait for perl 5.10 now!

Re:There must be some award...

demerphq on 2006-09-18T12:48:37

You know, it's comments like this one that really get me motivated. Often in the p5p circles it's hard to tell how one's work is received. Everybody is so experienced (and in some ways jaded....) that they tend not to give feedback like this. So hearing you say this really warms my heart. Thank you, and I'm pleased that you like it.

Re:There must be some award...

jmason on 2006-11-14T16:43:52

I'm late to this journal comment -- but I agree with Matt. This is really cool code, and I hope we can use it effectively in SpamAssassin; I'm certainly trying ;)

Re:There must be some award...

chromatic on 2006-11-17T00:03:06

I wonder if they're jealous of the time and energy you have to do such work. I certainly am, but I look forward to using it.

PCRE

Tony Finch on 2006-11-17T20:28:47

That sounds pretty awesome. How does PCRE compare?