I took a clutch of permutation routines and used them to print out all 10! permutations of the numbers (1..10). I ran them on my iBook with time perl permute-xx.pl > /dev/null. The results are stunning.
Both of the permutation routines here run faster than any of the alternatives.
The older pure-Perl algorithms are the slowest. The fastest of them is List::Permutor, at user 7m35.270s, sys 0m1.670s. The rest are a lot slower than that. The cookbook code seems to be even slower than the routine from perlfaq4, if the list is long (> 8 elements).
Algorithm::Permute, with its C implementation, fares rather better. It clocks in at user 2m59.160s sys 0m0.660s.
However, it's pipped by the loopless algorithm immediately below, at user 2m42.170s sys 0m0.430s. And the winning place is awarded to the code I posted here last month, which cruises home at user 2m18.920s sys 0m0.760s.
It's surprising that a pure Perl routine can be significantly faster than handcrafted C - a convincing testament to the power of in-place updates. Just wait until I get the same algorithm implemented in C ;-)