I've now compared the old calling conventions with the new one by checking bytecode size, the number of executed opcodes, and the timing of one of the benchmarks.
I used the sudoku solver and the oo5 benchmark. The latter does 1 million attribute accesses via a method call, which makes it a typical example of OO code.
The numbers were obtained by running
$ ./parrot -o sudoku.pbc examples/assembly/sudoku.pir
$ ./pdump sudoku.pbc | grep size
$ ./parrot -p sudoku.pbc
and by
$ ./parrot -p examples/benchmarks/oo5.imc
(and the same command with -C instead of -p for the timing).
Here are the results:
                        old        new
Sudoku codesize        5764       3442
Sudoku ops -p        911973     793967
oo5 ops -p         22501832   10503606
oo5 time -C            7.5s       4.9s
The new calling code reduces code size and the number of executed opcodes significantly, by up to a factor of two for the oo5 benchmark. There is also a non-trivial speedup, although no attempt has yet been made to optimize the new code for speed.
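
To make the "factor of two" concrete, here is a small Python sketch that computes the old/new ratio for each row of the results above. The names RESULTS, FACTORS and improvement_factor are made up for this illustration and are not part of Parrot; the figures are copied from the table.

```python
# Measurements copied from the results above, as (old, new) pairs.
# The dictionary keys are just labels chosen for this sketch.
RESULTS = {
    "Sudoku codesize": (5764, 3442),
    "Sudoku ops -p": (911973, 793967),
    "oo5 ops -p": (22501832, 10503606),
    "oo5 time -C (s)": (7.5, 4.9),
}


def improvement_factor(old, new):
    """Return old/new, i.e. how many times smaller or faster the new value is."""
    return old / new


# Ratio for every measured quantity.
FACTORS = {
    name: improvement_factor(old, new)
    for name, (old, new) in RESULTS.items()
}

if __name__ == "__main__":
    for name, factor in FACTORS.items():
        print(f"{name:<18} {factor:.2f}x")
```

With these numbers the oo5 opcode count drops by a factor of roughly 2.1, the sudoku bytecode shrinks by roughly 1.7, and the oo5 run time improves by roughly 1.5.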