how NOT to benchmark

perrin on 2004-09-09T18:38:12

So, Tim Bray wrote this hairy regex which he ran in Perl and in Java, and it ran faster in Java. He then posted this result to his blog. There is no sample data to try it yourself, and it's a single test involving unusual Unicode stuff. He says that "they don’t produce quite the same results, with occasional variation around international characters", but this has not stopped Java fans from declaring that Java's regex engine is faster than Perl's.

Then he goes on to say that perl 5.8.3 gives a different result from perl 5.6.1 and that this is somehow a strike against Perl's suitability for "enterprise" work. To me, this sounds like a result of the many Unicode and regex bugfixes that happened between those versions, but he appears to be saying that the inability to fix bugs in Java is a positive thing, since it means you can get the wrong answer consistently. He also doesn't say whether or not he was using the same version of Java in each test.

The problem with blog postings like this isn't so much that they exist as that many people who read them will not have any context for evaluating the validity of the benchmark, and will just start spouting nonsense about Java's regex speed. Very frustrating.


Also...

Dom2 on 2004-09-09T20:32:01

It doesn't help when you have people like Simon Cozens agreeing with him. :)

Actually, I reckon he would have been better off using Common Lisp and cl-ppcre.

-Dom