Full circle

nicholas on 2009-07-01T09:19:48

A subroutine-threaded core stores each opcode as a separate C-level function. Each op in sequence is called and then the op returns back to the runcore. This is two branch instructions to dispatch each op, compared to only one for a direct-threaded core. However, recent benchmarks I have seen in Parrot show that the subroutine core actually performs faster than the direct-threaded core does. This is because modern microprocessors have lots of hardware dedicated to predicting and optimizing control flow in call/return situations, because that is one of the most common idioms in modern software. This is a nonintuitive situation where more machine code instructions actually execute faster than fewer instructions. Parrot's default "slow" core ("-R slow") and the so-called "fast" core ("-R fast") use this technique (actually, these cores aren't exactly "subroutine-threaded", but it's close). From the numbers I have seen, the fast core is the fastest in Parrot. Here's how it works, basically:


for (pc = program_start; pc < program_end; pc++) {
    functable[*pc](interp, args);
}

Reminds me a lot of

while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
    PERL_ASYNC_CHECK();
}

What goes around comes around, although Whiteknight's blog gives a lot of useful detail on why it has come around again, and why things were different in the middle.
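To make the family resemblance concrete, here is a minimal C sketch of both loop shapes. Every name in it (Interp, Instr, functable, run_table, run_chain, the pp_* ops) is invented for illustration; none of this is Parrot's or Perl's real code, and the two-op "instruction set" is an assumption chosen only to keep the example small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interpreter state; not Parrot's or Perl's real struct. */
typedef struct { long acc; } Interp;

/* Shape 1: the bytecode indexes a table of op functions. */
typedef void (*op_fn)(Interp *, long arg);

static void op_add(Interp *in, long arg) { in->acc += arg; }
static void op_mul(Interp *in, long arg) { in->acc *= arg; }

enum { OP_ADD, OP_MUL, OP_COUNT };
static const op_fn functable[OP_COUNT] = { op_add, op_mul };

typedef struct { int opcode; long arg; } Instr;

long run_table(const Instr *program, size_t n) {
    Interp in = { 0 };
    for (const Instr *pc = program; pc < program + n; pc++) {
        assert(pc->opcode >= 0 && pc->opcode < OP_COUNT);
        functable[pc->opcode](&in, pc->arg);  /* call the op, return to the loop */
    }
    return in.acc;
}

/* Shape 2: each op node holds its own function pointer and returns the next op. */
typedef struct Op Op;
struct Op {
    Op *(*ppaddr)(Interp *, Op *self);
    Op *next;
    long arg;
};

static Op *pp_add(Interp *in, Op *self) { in->acc += self->arg; return self->next; }
static Op *pp_mul(Interp *in, Op *self) { in->acc *= self->arg; return self->next; }

long run_chain(Op *op) {
    Interp in = { 0 };
    while (op)                       /* a NULL "next op" ends the run */
        op = op->ppaddr(&in, op);
    return in.acc;
}

/* The same tiny program, (0 + 2) * 5 + 1, expressed both ways. */
long demo_table(void) {
    const Instr prog[] = { { OP_ADD, 2 }, { OP_MUL, 5 }, { OP_ADD, 1 } };
    return run_table(prog, sizeof prog / sizeof prog[0]);
}

long demo_chain(void) {
    Op c = { pp_add, NULL, 1 };
    Op b = { pp_mul, &c, 5 };
    Op a = { pp_add, &b, 2 };
    return run_chain(&a);
}
```

Both demo functions compute the same value. The only real difference is who decides what runs next: the loop (by advancing pc) in the first shape, or the op itself (by returning a pointer) in the second.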


Function-based Dispatch

chromatic on 2009-07-01T17:36:25

It's also the easiest to write, especially if you want to perform control flow trivially. Of course, it can make JITting more difficult.
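A rough sketch of what "trivial control flow" can look like when every op is a function that returns its successor: a branch is just an op that returns a different pointer. The structs and op names here are hypothetical, made up for this example, and are not taken from Perl or Parrot.

```c
#include <stddef.h>

/* Hypothetical interpreter state for the sketch. */
typedef struct Interp { long counter; long iterations; } Interp;

typedef struct Op Op;
struct Op {
    Op *(*fn)(Interp *, Op *self);
    Op *next;   /* fall-through successor */
    Op *other;  /* branch target, used only by branching ops */
};

/* Decrement the counter and record that the loop body ran once. */
static Op *op_dec(Interp *in, Op *self) {
    in->counter--;
    in->iterations++;
    return self->next;
}

/* Conditional branch: the whole "jump" is choosing which pointer to return. */
static Op *op_branch_if_positive(Interp *in, Op *self) {
    return in->counter > 0 ? self->other : self->next;
}

/* Runs "do { counter--; } while (counter > 0);" and reports the trip count. */
long count_down(long start) {
    Op dec, branch;
    dec.fn = op_dec;
    dec.next = &branch;
    dec.other = NULL;
    branch.fn = op_branch_if_positive;
    branch.next = NULL;      /* falling through ends the program */
    branch.other = &dec;     /* taken branch loops back */

    Interp in = { start, 0 };
    for (Op *op = &dec; op != NULL; )
        op = op->fn(&in, op);
    return in.iterations;
}
```

The run loop never learns that a loop exists in the program; it just keeps calling whatever it is handed. That same indirection is plausibly part of why JITting gets harder, since the next op is not known until the current one returns.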