It's very interesting to see Tim Bray's Wide Finder Project results. Essentially, it's a log parsing benchmark. The twist is that it's running on a Sun SPARC Enterprise T5120 Server. This is a Niagara II-based server which has eight CPU cores, each of which is able to run eight threads - effectively 64 concurrent threads. Why is this important? Because it's getting harder to make chips go faster, so instead chips are going multicore.
This really started because Tim wanted to play with Erlang, but the initial performance was terrible. Currently top of the list is Perl, with a program by Sean O'Rourke. It's compact but beautiful code.
Note that this unfortunately breaks in blead, where the mmap()ed string is copied to preserve $1 and friends, but it should be easily fixable after the release.