I've rewritten my recursive regex implementation, and I think it actually works properly at last.
I have started wondering about the feasibility of replacing Perl's regex engine with PCRE. The regex engine is supposedly pluggable already, but it looks as though plugging in a completely different regex engine would still be non-trivial. Any thoughts?
Re:Non default replacement
robin on 2002-04-21T14:28:40
"How hard would it be to make a module that caused all regular expressions lexically to use PCRE rather than the default one?"
Doing it lexically would be trickier, because you'd have to either add a new set of OPs for the PCRE versions, or at least a flag for the existing ones. I was thinking it would be more along the lines of use re 'debug'; (which is to say, not lexically scoped).
The first stage is to write a simple XS interface to PCRE, and then we can do some sensible benchmarks as well.
"The cure for this ill is to allow capturing groups to be named rather than numbered, and to decree that such a name remains in scope until another group is encountered using the same name."
.. while talking about compositionality. Has anyone looked at implementing this in either the Perl5 engine or PCRE? Is anyone planning to? I'd like to have a crack at it, but it'll probably take a long time and be horrifically messy; though I've studied Kleene Algebra/FSAs etc, I haven't hacked that deeply into the Perl source or regex engine before.
REGEXP*. For instance, many of the regexp magic variables ($+ and the like) dive into the REGEXP* structure directly. That may not be too much of a problem, since you can use optimize.pm to dike out those GVSVs and replace them with calls to a function in the re::pcre library.
So other than that, it ought to be pretty plain sailing. If you're going to do it, I'd suggest using the POSIX interface to pcre. Oh, the other thing to think about is what happens to regexp modifiers in toke.c. Your wrapper functions will have to take a Perl REGEXP structure, decode it, feed the arguments into pcre and then decant the results back.
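To make the "decode, feed, decant" shape concrete, here is a rough C sketch of what such a wrapper might look like. It is only an illustration under assumptions: posix_match, struct match_result and MAX_GROUPS are made-up names, the flat result record merely stands in for whatever would really be copied back into Perl's REGEXP structure, and only the /i modifier is mapped. It is written against the system <regex.h>; PCRE's POSIX wrapper header, pcreposix.h, declares the same regcomp/regexec/regfree names, so in principle the include could be swapped.

```c
#include <regex.h>   /* system POSIX regex; pcreposix.h declares the same names */
#include <string.h>

#define MAX_GROUPS 8   /* hypothetical cap on capture slots, group 0 included */

/* Hypothetical result record: a stand-in for the capture data a real
   wrapper would decant back into Perl's REGEXP structure. */
struct match_result {
    int    matched;             /* 1 if the pattern matched, 0 otherwise */
    size_t ngroups;             /* slots filled, including group 0 */
    int    start[MAX_GROUPS];   /* byte offset where each group starts, -1 if unset */
    int    end[MAX_GROUPS];     /* byte offset just past each group, -1 if unset */
};

/* Hypothetical wrapper: compile the pattern, run it, copy the offsets out.
   Returns 0 when the engine ran (match or no match), or the nonzero
   regcomp/regexec error code otherwise. */
int posix_match(const char *pattern, int ignore_case,
                const char *subject, struct match_result *out)
{
    regex_t    re;
    regmatch_t pm[MAX_GROUPS];
    size_t     i;
    int        rc;

    /* "Decode" step: translate a Perl-style /i modifier into a POSIX flag. */
    int cflags = REG_EXTENDED | (ignore_case ? REG_ICASE : 0);

    memset(out, 0, sizeof *out);

    rc = regcomp(&re, pattern, cflags);
    if (rc != 0)
        return rc;              /* pattern did not compile */

    /* "Feed" step: hand the subject string to the engine. */
    rc = regexec(&re, subject, MAX_GROUPS, pm, 0);

    if (rc == 0) {
        /* "Decant" step: copy capture offsets into our own record. */
        out->matched = 1;
        out->ngroups = re.re_nsub + 1;
        if (out->ngroups > MAX_GROUPS)
            out->ngroups = MAX_GROUPS;
        for (i = 0; i < out->ngroups; i++) {
            out->start[i] = (int)pm[i].rm_so;
            out->end[i]   = (int)pm[i].rm_eo;
        }
    } else if (rc != REG_NOMATCH) {
        regfree(&re);
        return rc;              /* engine error, e.g. out of memory */
    }

    regfree(&re);
    return 0;
}
```

Calling posix_match("(f)(o+)", 1, "xFOOy", &r) should leave r.matched set and three offset pairs filled in, which is roughly the information that $&, $1 and $2 would need. A real wrapper would also have to cache the compiled regex_t rather than recompiling on every match.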