Algorithm::PermuteInPlace 0.01 is making its way to the CPAN.
As an experiment, I used Inline rather than XS. It's just a proof of concept, but it performs okay: about twice as fast as the Perl version in my tests.
My iBook can now print all 3,628,800 permutations of ten elements in under 90 seconds.
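
For anyone wondering what permuting "in place" looks like, here is a rough sketch. It is written in Python for brevity, and it is not the module's code or its API: the names next_permutation and count_permutations are made up for illustration, and the lexicographic-successor technique shown is just one standard way of stepping through permutations without copying the list, not necessarily what the module does.

```python
def next_permutation(items):
    """Rearrange items in place into the next lexicographic permutation.

    Returns True if a next permutation exists. Otherwise the list is
    reset to ascending order and False is returned.
    (Hypothetical helper for illustration only.)
    """
    # Find the rightmost index i where items[i] < items[i + 1].
    i = len(items) - 2
    while i >= 0 and items[i] >= items[i + 1]:
        i -= 1
    if i < 0:
        # Already at the last permutation: wrap around to the first.
        items.reverse()
        return False

    # Find the rightmost element greater than items[i] and swap the two.
    j = len(items) - 1
    while items[j] <= items[i]:
        j -= 1
    items[i], items[j] = items[j], items[i]

    # The tail after i is in descending order; reverse it by swapping
    # from both ends so no copy of the list is made.
    lo, hi = i + 1, len(items) - 1
    while lo < hi:
        items[lo], items[hi] = items[hi], items[lo]
        lo += 1
        hi -= 1
    return True


def count_permutations(n):
    """Count the permutations of n distinct elements by visiting each one."""
    items = list(range(n))  # start from the first (ascending) permutation
    count = 1
    while next_permutation(items):
        count += 1
    return count
```

Calling count_permutations(10) walks through all 3,628,800 arrangements while only ever holding one ten-element list in memory, which is the point of doing it in place.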