AutoXS.pm: A learning experiment in XS and perlapi

brian_d_foy on 2008-04-03T06:56:00

I'm a relatively long-time reader of the perl5-porters mailing list. Somewhat recently Nicholas Clark posed a few small challenges intended to draw more people into Perl core development. I thought it was a great idea, but couldn't follow up on it at the time. I said on the #p5p IRC channel that I liked the concept, and so I thought I should learn a bit more about the Perl core and XS. The two are not the same, but I presume that strong knowledge of the XS/Perl API would be a jump start to understanding the core.

Skip ahead a few weeks. I have since submitted my thesis, gone on vacation, and started a new job. But still no progress on my plan to learn XS. Until yesterday. I was idly playing with the B and B::Utils modules when I had a pretty good idea for an interesting learning and experimentation project: AutoXS.

Essentially, the idea started out with using B to scan a running Perl program for subroutines or methods of a particular type. Typically, the simplest and most recurring methods are accessors for hash-based objects. (Just search CPAN for accessor-generators...) The next step is to replace the identified subroutines with precompiled XSUBs that accomplish the same task but, being written in C, do so faster.

For simple accessors, that seems like a simple enough task at first: Write the XS code to access a value in a hash reference which is stored on the stack. (It took me surprisingly long and a lot of patient help from the friendly people on #p5p to get the XS right. Thanks!) But where's the hash key coming from? You can't expect the user to pass it in as an argument because that would defeat the purpose. You can't know the key name at XS compile time because that's when the module is built. Currently, you cannot find out the package and name by which the current method/subroutine was called either. So what's the answer? Something like currying. I don't think I need to explain to anyone what that is. But maybe I should mention that this is C, not Haskell or Perl. C doesn't have currying.

The solution took some time in coming. The XS ALIAS keyword allows for compile time aliases to a single XSUB. The aliases can be distinguished from within the XSUB by means of an integer variable (ix) whose value tells you which alias was called. (Bad explanation, I guess; have a look at perlxs for a better one.) This doesn't get us all the way to currying, though. I had a look at the generated C code and realized that I could just write similar code on my own and assign a new value of that magical integer to each new alias of the accessor prototype at run time (CHECK time, really, but run time would work, too). Then, all that was left to do was to put the hash key for each new alias into an array indexed by said numbers. Voila - fake currying for that XSUB.
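
To make the index trick a bit more concrete, here is a rough sketch in plain C with no Perl headers at all. It is not the actual AutoXS code: the ToyHash type and the register_accessor and generic_accessor names are made up for illustration, and a real XSUB would take the hash reference off the Perl stack and use the perlapi hash functions rather than a linear search. The only part meant to mirror the description above is that one shared function body picks its hash key from a table indexed by a per-alias integer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hash-based Perl object:
 * parallel arrays of keys and values. */
typedef struct {
    const char *keys[4];
    const char *values[4];
    int count;
} ToyHash;

#define MAX_ACCESSORS 16

/* Table of hash keys, indexed by the integer attached to each alias. */
static const char *accessor_keys[MAX_ACCESSORS];
static int n_accessors = 0;

/* Register a new "alias" at run time: store its hash key and hand
 * back the index that identifies it. Returns -1 if the table is full. */
static int register_accessor(const char *key)
{
    assert(key != NULL);
    if (n_accessors >= MAX_ACCESSORS)
        return -1;
    accessor_keys[n_accessors] = key;
    return n_accessors++;
}

/* The single shared accessor body. The index plays the role of the
 * alias integer: it selects which key this call should look up.
 * Returns NULL for an unknown index or a missing key. */
static const char *generic_accessor(int ix, const ToyHash *self)
{
    if (ix < 0 || ix >= n_accessors)
        return NULL;
    const char *key = accessor_keys[ix];
    for (int i = 0; i < self->count; i++) {
        if (strcmp(self->keys[i], key) == 0)
            return self->values[i];
    }
    return NULL;
}
```

Calling register_accessor("name") hands back an index, and passing that index to generic_accessor together with an object returns whatever is stored under "name". The key is never an argument of the accessor itself, which is the whole point of the fake currying.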

By now, it all actually works. The scanner identifies quite a few typical read-only accessors. The XSUBs are, according to my crude measurements, between 1.6 and 2.5 times faster than the original accessors. If you're calling those accessor methods in a tight loop, that might actually make a bit of a difference. I wrapped it up in a module, AutoXS, and gave it the best interface ever. That is, none. You just say

use AutoXS::Accessor;

to get the accessor scan for all methods in the current package. More seriously, one could let the user flag eligible methods or even apply the scan globally. But that's not the point. It's just kind of fun that it works at all.

Cheers,
Steffen