I like optimizing code. Yesterday, Patrick asked me if I could improve the build time for Rakudo. I pulled out my trusty Callgrind and profiled the single largest component in the build process. 25 minutes later (Callgrind is accurate, but not speedy -- parallelization please!) I had my data.
This particular hippo is a build stage that turns a file of NQP parse actions into PIR code. The input file is almost 2000 lines long, and the output file is just over 10,000 lines. It takes a while to process.
40% of the runtime is garbage collection. This is one reason we need a better GC, and I'm hopeful that we'll get one as part of the Google Summer of Code.
Of course, one way to make garbage collection cheaper in the short term is to use fewer GCable entities. One of the most expensive C-level operations during the build (outside of the garbage collector) was the isa vtable entry for the Class PMC. I've long suspected that this was somewhat inefficient, and I wanted to revise it. I had the opportunity this afternoon.
This entry takes a string and checks whether the current class is, or inherits from, a class with that name. The previous (slower) incarnation created a new instance of a Class from the string and delegated to another vtable entry which performed the same check against a class.
I suspected that the overhead of creating a new GCable element (the Class PMC) and delegating was avoidable. I replaced that delegation with a handful of lines of code which perform string comparisons instead. After a couple of tweaks, the whole test suite passed again.
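To make the idea concrete, here is a minimal sketch in plain C of a name-based isa check that walks the inheritance graph and compares strings, without allocating anything along the way. It is not Parrot's actual code: the SketchClass struct, the sketch_isa function, and the three example classes are all hypothetical stand-ins for the Class PMC and its parent list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified class record. This is an assumption made for
 * illustration only; it is not the layout of Parrot's Class PMC. */
typedef struct SketchClass {
    const char *name;                         /* class name */
    const struct SketchClass *const *parents; /* direct parents, may be NULL */
    size_t n_parents;                         /* number of direct parents */
} SketchClass;

/* Returns 1 if cls is named `name` or inherits (directly or indirectly)
 * from a class with that name, and 0 otherwise. No temporary class object
 * is created from the string; only string comparisons are performed. */
static int sketch_isa(const SketchClass *cls, const char *name)
{
    size_t i;

    assert(cls != NULL && name != NULL);

    if (strcmp(cls->name, name) == 0)
        return 1;

    /* Depth-first walk over the parents. */
    for (i = 0; i < cls->n_parents; i++) {
        if (sketch_isa(cls->parents[i], name))
            return 1;
    }
    return 0;
}

/* Hypothetical three-level hierarchy: Integer -> Scalar -> Object. */
static const SketchClass object_class = { "Object", NULL, 0 };
static const SketchClass *const scalar_parents[] = { &object_class };
static const SketchClass scalar_class = { "Scalar", scalar_parents, 1 };
static const SketchClass *const integer_parents[] = { &scalar_class };
static const SketchClass integer_class = { "Integer", integer_parents, 1 };
```

With these definitions, sketch_isa(&integer_class, "Object") walks two levels up and returns 1, while sketch_isa(&scalar_class, "Integer") returns 0. The point of the sketch is only that the lookup needs no new garbage-collected object per call, which is where the GC pressure came from.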
I profiled the build again. 20 minutes later, Callgrind showed me a 20% performance improvement. The isa call was somewhat faster (it's not easy to compare, with the delegation in place), but the GC pressure was much less.
The Rakudo test suite is also about 25% faster, so many other things in Parrot benefit from this optimization.