Patch fixes buffer overflow in regexp compiler

brian_d_foy on 2007-11-29T14:45:00

nicholas writes "Perl 5 Porters have released a fix to the regexp engine, which Google researchers recently discovered had a buffer overflow when compiling very specific patterns. Note that the pattern is the risk, not the data it matches.



All Perl users should consider updating their Perl installation to address CVE-2007-5116:



Specifically, the bug is that if you have a pattern which in itself contains no Unicode characters but matches them (for example via \x{} escapes), then at compile time the regexp engine allocates memory on the first pass assuming an 8-bit representation. However, if the pattern also contains 8-bit characters, then when the Unicode escapes are compiled (on the second pass) any existing 8-bit characters are converted to their UTF-8 representation, which needs more space than was allocated and is likely to overflow the buffer. Matches will also fail.
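To make the size mismatch concrete, here is a minimal sketch, meant to be run on a patched perl. It is not the regcomp.c logic. It only shows that 8-bit characters with the high bit set need more bytes once they are represented as UTF-8, and what a pattern of the affected shape looks like. The variable names and the sample characters are my own choices for illustration.

```perl
use strict;
use warnings;

# Three 8-bit (Latin-1) characters with the high bit set.
my $literal  = "\xE9\xFC\xF1";
my $byte_len = length $literal;    # one byte per character here

# Encode a copy as UTF-8 to see how much room the same text needs
# once it has to be represented as Unicode.
my $encoded = $literal;
utf8::encode($encoded);
my $utf8_len = length $encoded;

# Hypothetical pattern of the affected shape: literal 8-bit characters
# followed by the text of a \x{} escape (single quotes keep the escape
# as plain text until the regexp compiler sees it).
my $pattern = $literal . '\x{100}';
my $matched = ("\xE9\xFC\xF1\x{100}" =~ /$pattern/) ? 1 : 0;

print "8-bit length: $byte_len, UTF-8 length: $utf8_len, matched: $matched\n";
```

The two lengths differ by a factor of two for these characters, which is the kind of gap a first pass sized for 8-bit data would not account for.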

Given that it's been present since 5.8.0 (July 2002) and the bug itself (but not the security implication) wasn't reported until early this year*, it is unlikely to crop up in the wild.

Red Hat's announcement is https://rhn.redhat.com/errata/RHSA-2007-0966.html, which I find unclear in its wording:

Specially crafted input to a regular expression can cause Perl to improperly allocate memory, possibly resulting in arbitrary code running with the permissions of the user running Perl.


The "input" is the pattern, not the matched string. So it's not going to be an issue at all, unless your programmers are foolish enough to allow untrusted user input to be interpolated into regular expressions. If so, you were already open to denial of service attacks from patterns that bust the C stack (fixed in 5.10) or take until the heat death of the universe to complete (inherently unfixable in a general purpose programming language).
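As a rough illustration of keeping untrusted text out of the pattern itself, the sketch below uses \Q...\E (equivalently quotemeta) so that user-supplied text is matched literally rather than compiled as regexp syntax. The input string and variable names are hypothetical. When only a literal substring test is needed, index() avoids the regexp engine altogether.

```perl
use strict;
use warnings;

# Hypothetical untrusted input that happens to look like a regexp.
my $user_input = '(a+)+$';
my $text       = 'price: (a+)+$ per unit';

# Risky: /$user_input/ would compile the input as a pattern.
# Safer: \Q...\E escapes every regexp metacharacter first.
my $found_literal = ($text =~ /\Q$user_input\E/) ? 1 : 0;

# quotemeta shows what \Q...\E actually hands to the compiler.
my $escaped = quotemeta $user_input;

# For a plain substring test, no regexp is needed at all.
my $found_index = (index($text, $user_input) >= 0) ? 1 : 0;

print "escaped: $escaped\n";
print "found via regexp: $found_literal, via index: $found_index\n";
```

This only covers the case where the user's text is meant to be matched literally. If users genuinely need to supply patterns, escaping is not an option and the denial of service concerns above still apply.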

The CVE announcement, http://nvd.nist.gov/nvd.cfm?cvename=CVE-2007-5116, is terse and has the same ambiguity:

Buffer overflow in the polymorphic opcode support in the Regular Expression Engine (regcomp.c) in Perl 5.8 allows context-dependent attackers to execute arbitrary code by switching from byte to Unicode (UTF) characters in a regular expression.


Yes, conceivably you can now inject arbitrary code. But if you did, your program (not perl) was badly written, and already had the ability to be crashed or hung.

The timing of this announcement is not ideal. Security researchers at Google discovered the buffer overflow in the regexp engine compiler. As best I can tell, they reported it to Linux vendors, who forwarded it to one Perl 5 committer, who forwarded it on to the rest. There was some discussion about the patch that needed backporting from 5.10 (where it was already fixed) to 5.8, and then that was the last we heard.

The next "contact" we had was discovering that the Linux vendors had made public security announcements, without even notifying us, let alone discussing a timescale. I consider this outcome neither professional nor courteous, but hope that it was caused by an unfortunate series of events that won't re-occur.

To help avoid such a repeat, we've set up a contact address, perl5-security-report@perl.org solely for reporting any similar core security bugs.

Currently it points to an unarchived mailing list with the same name (perl5-security-report@perl.org), using the regular perl.org list manager and sign-up address conventions. We'd welcome anyone competent to request to subscribe. It's likely not to be high traffic (right now we seem to average 1 security issue every 2 years), but we'd really like more people on it so that there's a good chance that at least one of the subscribers will have the time to respond to any initial report within 24 hours, at least to say roughly:

"thanks for the report. I can confirm that this is a bug. We're looking into how to resolve it, and we'll get back to you"


* The bug was first found against 5.10-to-be, not 5.8.x, by Jeurd, while at the German Perl Workshop, when trying to write a program to convert BNF grammars to 5.10 regexps. You have to push the engine pretty far to get to it."


Patch applied but still mem errors after reg ex

towersys on 2007-12-03T01:43:04

Hi mate,

thanks for the post, we've been getting memory errors consistently from our Perl-based blogging software (MT). Even with ulimit set to "unlimited" and your patch applied, it still seems to die with 'Out of memory during "large" request for 1052672 bytes, total sbrk() is 41959424 bytes at lib/MT/Builder.pm line 375.' just after performing this segment of code:

(builder.pm)

    foreach (@args) {
        $_ = [ $_->[0], $_->[1] ];
        my $arg = $_;
        if (ref $arg->[1] eq 'ARRAY') {
            $arg->[1] = [ @{$arg->[1]} ];
            foreach (@{$arg->[1]}) {
                if (m/^\$(\w+)$/) {
                    $_ = $ctx->var($1);
                }
            }
        } else {
            if ($arg->[1] =~ m/^\$(\w+)$/) {
                $arg->[1] = $ctx->var($1);
            }
        }
    }

    my $out = $h->($ctx, \%args, $cond);  # Line 375


Any ideas?

requires mod_perl re-compile?

markjugg on 2007-12-20T21:48:09

This may be a naive question, but if this patch is applied and Perl is re-compiled, would a mod_perl installation compiled against this Perl also need to be re-compiled?

My guess is "No".