Today's Rakudo-building speedup is 27.79%.
(Okay, the other slow part of Rakudo builds 17.59% faster. Still.)
The profile showed that string allocation was a hotspot in the benchmark. In particular, the part of Parrot which allocates memory out of arenas spent a lot of time performing garbage collection. Every time you can avoid either allocating unnecessary memory or running a full garbage collection, you can improve performance.
Parrot r27484 adds one line of code (and one line of comment).
Every time mem_allocate() successfully allocates a new block, it increments a counter. Whenever the garbage collector performs a full run, it resets that counter to zero.
This patch performs a garbage collection run from mem_allocate() only if that counter is non-zero. That is, if the counter is zero, nothing has been allocated since the last full run, so the garbage collector has already found as much free arena memory as possible. (This is not memory for PMCs or STRING headers; this is buffer memory.) Running the GC again won't find any more free buffer memory. In that case, skipping the GC run and allocating more memory from the OS gives the performance improvement.
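The idea is easier to see in a toy model. What follows is only a sketch under my own assumptions, not Parrot's code: the Pool struct, toy_full_gc(), toy_grow(), and toy_allocate() are hypothetical names, the "pool" tracks byte counts rather than real memory, and I'm assuming the counter ticks on every successful allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy pool: pure bookkeeping, no real memory. Not Parrot's API. */
#define TOY_BLOCK 100  /* minimum growth per simulated OS request */

typedef struct Pool {
    size_t   capacity;         /* bytes the pool currently owns */
    size_t   used;             /* bytes handed out */
    size_t   reclaimable;      /* bytes a full GC run could free right now */
    size_t   allocs_since_gc;  /* successful allocations since the last full GC */
    unsigned gc_runs;          /* number of full GC runs */
    unsigned os_requests;      /* number of times we asked the "OS" for more */
} Pool;

/* Stand-in for a full GC run: free what is dead, then reset the counter. */
static void toy_full_gc(Pool *pool) {
    assert(pool->reclaimable <= pool->used);
    pool->used -= pool->reclaimable;
    pool->reclaimable = 0;
    pool->allocs_since_gc = 0;
    pool->gc_runs++;
}

/* Stand-in for requesting more memory from the OS. */
static void toy_grow(Pool *pool, size_t size) {
    size_t grow = size > TOY_BLOCK ? size : TOY_BLOCK;
    pool->capacity += grow;
    pool->os_requests++;
}

static void toy_allocate(Pool *pool, size_t size) {
    assert(size > 0);
    if (pool->used + size > pool->capacity) {
        /* The one-line idea: collect only if something has been
         * allocated since the last full run. With a zero counter,
         * another run cannot find more free buffer memory. */
        if (pool->allocs_since_gc > 0)
            toy_full_gc(pool);

        /* Still no room (or GC skipped): get more from the "OS". */
        if (pool->used + size > pool->capacity)
            toy_grow(pool, size);
    }
    pool->used += size;
    pool->allocs_since_gc++;  /* successful allocation */
}
```

In this model, a request that doesn't fit right after a full run goes straight to toy_grow() without touching the collector, while a request that doesn't fit after other allocations still tries a collection first.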
I should note that my comments about avoiding memory allocation apply in the general case. Parrot's current GC has some limitations. The biggest is that it stops the world to mark and sweep everything. The new GC Andrew Whitworth will implement as part of the Google Summer of Code will fix that. As well, we have some ideas to improve the current implementation so that the GC becomes less expensive even before then. Then we'll see algorithmic improvements that make even this 27.79% optimization seem small.