I just caught a link to a programming language benchmark for bioinformatics. Unsurprisingly, the Perl is grotty and the C and C++ and Java implementations beat it handily.
Are there any PDLlers who'd like to bring some sanity to the results? (My eyes are going on strike from the C-style nested loops. Make sure you catch the use of bitwise and (&) as a control flow operator. That can only work by accident.)
Re:substr(A,B,1)
Alias on 2008-02-07T04:56:58
(it's also one of the main reasons that XML::SAX::PurePerl is so slow)
Ditto with PPI...
I managed to compensate by applying a regex if the character I see suggests I can read ahead a fair way.
You might be able to abuse the regex engine for this though...
s/(.)/something; $1/e
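The read-ahead idea might look roughly like the sketch below. This is not PPI's actual code: `tokenize` is a hypothetical name, and the rule "a word character means a whole run can be consumed" is an assumption chosen only to show the shape of the technique, using `\G` with `/gc` so each match resumes where the previous one stopped.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: a run of word characters is consumed by a single
# regex match; anything else falls back to one character at a time.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        if ($text =~ /\G(\w+)/gc) {      # read ahead over a whole run
            push @tokens, $1;
        }
        elsif ($text =~ /\G(.)/gcs) {    # single-character fallback
            push @tokens, $1;
        }
    }
    return @tokens;
}

print join('|', tokenize('foo(bar, 42)')), "\n";
```

The `/c` modifier keeps `pos` in place when the first pattern fails, so the fallback starts from the same offset.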
In the alignment.pl code, nearly all the time is spent in the 'compute f matrix' loop. Pre-splitting the strings into arrays saved a few seconds (and the split itself took almost no time). Using @_ directly instead of assigning to lexicals in the score and max subroutines, and using the ?: operator to write them as one-line functions, saved a few more seconds. (The Python didn't seem quite fair to compare against, since it has named parameters and so skips that assignment.)
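A minimal sketch of those two changes, for illustration only. The names `score` and `max2`, the match/mismatch values, and the sample sequences are assumptions, not the code from alignment.pl.

```perl
use strict;
use warnings;

# Hypothetical scoring values; the real alignment.pl may differ.
my ($MATCH, $MISMATCH) = (1, -1);

# Read arguments straight from @_ instead of copying them into lexicals.
sub score { $_[0] eq $_[1] ? $MATCH : $MISMATCH }
sub max2  { $_[0] > $_[1] ? $_[0] : $_[1] }

# Split each sequence once, up front, rather than calling substr in the loop.
my @a = split //, 'GATTACA';
my @b = split //, 'GATCACA';

my $total = 0;
$total += score($a[$_], $b[$_]) for 0 .. $#a;

print "positional score: $total\n";
print "max: ", max2(3, 7), "\n";
```

The trade-off is the one noted above: `$_[0]` and `$_[1]` say nothing about what the arguments mean, so the subs get faster and harder to read at the same time.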
There was also a lot of array indexing, so assigning the first level of the multi-dimensional array to temp variables between the i and j loops saved a bit more. All in all (I didn't keep careful track), it went from about 65 seconds to about 37. C and Java still have it beat, but it's a little better now (though a bit less readable).
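The row-caching trick could be sketched like this. The recurrence is a stand-in (it counts lattice paths) rather than the alignment recurrence, and `fill_matrix` is a hypothetical name; the point is only that `$f[$i]` and `$f[$i - 1]` are looked up once per outer iteration instead of once per cell.

```perl
use strict;
use warnings;

# Fill an (n+1) x (m+1) matrix where each cell depends on its upper and
# left neighbours. Stand-in recurrence, not the one from alignment.pl.
sub fill_matrix {
    my ($n, $m) = @_;
    my @f;
    $f[0] = [ (1) x ($m + 1) ];      # first row is all ones
    for my $i (1 .. $n) {
        my $prev = $f[$i - 1];       # row i-1, fetched once per outer pass
        my $row  = $f[$i] = [1];     # row i, first column is 1
        for my $j (1 .. $m) {
            $row->[$j] = $prev->[$j] + $row->[$j - 1];
        }
    }
    return \@f;
}

my $f = fill_matrix(3, 3);
print "f[3][3] = $f->[3][3]\n";
```

Inside the j loop every access is now a single dereference on `$row` or `$prev` rather than a two-level `$f[$i][$j]` lookup.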
Re:You can double the speed (more or less)
runrig on 2008-02-07T17:49:19
And I don't know if it saved any time, but replacing the C-style for loops with perly 1..$n style ones made at least that part more readable.
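For comparison, here are the two loop styles side by side on a toy computation (the sum of squares is just a placeholder body, not anything from the benchmark):

```perl
use strict;
use warnings;

my $n = 5;

# C-style loop: init, test, and increment spelled out by hand.
my $sum_c = 0;
for (my $i = 1; $i <= $n; $i++) {
    $sum_c += $i * $i;
}

# Range-style loop: same iterations, loop variable scoped to the block.
my $sum_range = 0;
for my $i (1 .. $n) {
    $sum_range += $i * $i;
}

print "$sum_c $sum_range\n";
```

perlop notes that a range used as the list of a foreach loop is iterated without building a temporary array, so the readable form should not cost memory for large $n.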