Perl 5.8 Performance

grantm on 2006-03-06T01:42:10

We recently noticed degraded performance on a server after upgrading it from Debian woody to Debian sarge. As well as updating all the system libraries, the upgrade moved us to a 2.6 kernel and from Perl 5.6.1 to 5.8.4.

One job in particular has gone from taking 4.5 hours to taking 9. The process is mostly CPU bound and uses around 2GB of RAM.

There are obviously many things that could have caused the change. I'm not leaping to the conclusion that Perl is the cause; I'm merely interested to know if anyone else has had a similar experience.

Update: Forgot to mention ... another possible culprit we're looking at is hyperthreading, which we didn't have with the 2.4 kernel. Ideally we'd just turn it off and see if the job runs faster; in practice it's proving to be more difficult.


check regexen

TeeJay on 2006-03-06T11:48:13

We had a similar problem at foresite with some very complex regular expressions. Some operations in the template parsing system (built out of one hugely complex regex and hundreds of smaller regexes and functions) became about 10 to 20 times slower.

We fixed it by rewriting the regexes to avoid the constructs that had become slower as a result of changes in how Perl handles regexes.
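For anyone chasing a similar slowdown, one way to compare two variants of a pattern is the core Benchmark module. This is only a minimal sketch: the template syntax, the two patterns and the extract() helper are invented for illustration and are not the actual foresite regexes. Which variant wins may well differ between Perl versions, so it needs to be run on each one.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical template text; the [% ... %] tag syntax is an assumption.
my $template = join '', map { "text $_ [% var_$_ %] more text\n" } 1 .. 500;

# Two patterns intended to extract the same tag names from that text.
my $lazy_dot   = qr/\[%\s*(.+?)\s*%\]/;
my $char_class = qr/\[%\s*([^%\s]+)\s*%\]/;

# Return a reference to the list of all tag names captured by $re.
sub extract {
    my ($re, $text) = @_;
    my @names = $text =~ /$re/g;
    return \@names;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    lazy_dot   => sub { extract($lazy_dot,   $template) },
    char_class => sub { extract($char_class, $template) },
});
```

Running the same script under both the old and the new perl binary should show whether a particular construct has regressed.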

Hope that helps

Re:check regexen

grantm on 2006-03-06T22:37:32

Hmm, that is interesting. We did have another problem where some of our regexes failed under 5.8. Apparently it was due to a bug that also existed in 5.6 but seldom reared its head due to differences in memory allocation or something.

Thanks.

An update

grantm on 2006-03-20T08:31:01

Just to put this one to rest: we didn't find a general problem of Perl 5.8 being slower. The one case where we had objective timings to compare was the one in my original post, where the job runtime jumped from 4.5 hours to 9 hours.

When I profiled this job, I found it was spending the vast bulk of its time in a general-purpose data formatting routine. I recoded it to use a more specific formatting routine and the runtime dropped to just over an hour, with the bulk of the remaining time spent waiting for the database query to return 3 million rows.
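To illustrate the kind of swap involved, here is a rough sketch using the core Benchmark module. Both routines, format_general() and format_amount(), and the sample data are hypothetical stand-ins rather than the real code from this job. The assumption is simply that a general routine inspects each value to decide how to render it, while a specific one already knows the type.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical general-purpose formatter: inspects every value at run time
# to decide how to render it.
sub format_general {
    my ($value) = @_;
    return ''                      unless defined $value;
    return sprintf('%d', $value)   if $value =~ /^-?\d+$/;
    return sprintf('%.2f', $value) if $value =~ /^-?\d*\.\d+$/;
    return $value;
}

# Hypothetical specific formatter: the caller already knows the value is a
# decimal amount, so no inspection is needed.
sub format_amount {
    return sprintf('%.2f', $_[0]);
}

# Made-up sample data: 10,000 decimal amounts such as 1.75, 3.25, 4.75.
my @amounts = map { $_ * 1.5 + 0.25 } 1 .. 10_000;

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    general  => sub { my @out = map { format_general($_) } @amounts },
    specific => sub { my @out = map { format_amount($_) } @amounts },
});
```

With millions of rows, even a small per-call difference adds up, which is why it pays to let a profiler point at the hot routine before rewriting anything.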