Late last night, PerlJam posted in #parrot a small Perl 6 program which gave the wrong answer in Rakudo:
my $foo = 'fred';
say $foo;
$foo--;
my $bar = 'fred';
say $bar;
The correct output is obviously:
fred
fred
Rakudo gave:
fred
frec
PerlJam and Infinoid both correctly diagnosed this as a COW (copy-on-write) problem. What's that, and why does it matter?
The Rakudo compiler turns this code into PIR code. PIR is the native high-level language of Parrot. Inside Parrot, the PIR compiler (IMCC) turns PIR into Parrot bytecode (PBC). As part of that process, IMCC identifies constant string literals and treats them specially.
Like the Perl 6 code, the PIR code produced by Rakudo contains the string literal 'fred' twice. The PBC produced by IMCC doesn't; it refers to a single internal data structure twice.
This is usually the right approach. In this case, where the literal string appears twice and is only four characters long, there's little benefit, but in a complex program, you can save a lot of memory and time with judicious caching.
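To make the caching idea concrete, here's a rough sketch of a constant table that stores each distinct literal once and hands back the same slot for repeats. The names ConstTable and intern are made up for illustration; IMCC's real data structures are different and more involved.

```c
#include <assert.h>
#include <string.h>

#define MAX_CONSTANTS 64

/* Hypothetical constant table; IMCC's real data structures differ. */
typedef struct {
    const char *entries[MAX_CONSTANTS];
    int         count;
} ConstTable;

/* Return the slot for a literal, adding it only if it isn't there yet. */
static int intern(ConstTable *t, const char *literal) {
    int i;

    for (i = 0; i < t->count; i++)
        if (strcmp(t->entries[i], literal) == 0)
            return i;               /* seen before: reuse the slot */

    assert(t->count < MAX_CONSTANTS);
    t->entries[t->count] = literal; /* first sighting: store it */
    return t->count++;
}
```

Interning 'fred' twice with this sketch yields the same slot both times, which is the sense in which the bytecode refers to one structure twice.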
Now of course sometimes people want to mutate these strings. They're mutable; you can change them. That's where the COW comes in. It's like memory handling on a decent operating system. You only make a copy of the memory at the last possible point, where you know you're going to modify your copy. Parrot strings support this, so if you use Parrot operations directly, you don't even have to know that COW exists. It just works.
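If you haven't seen copy-on-write before, a minimal sketch in C might look like the following. Everything here (Buffer, Str, str_share, str_unshare, str_set_char) is a hypothetical name chosen for the example, not Parrot's actual string API; the point is only that every write goes through one function which copies the buffer first if anyone else shares it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not Parrot's real structures. */
typedef struct {
    char  *data;   /* NUL-terminated bytes */
    size_t refs;   /* how many string headers share this buffer */
} Buffer;

typedef struct {
    Buffer *buf;
} Str;

/* Create a string that owns a fresh buffer. */
static Str str_new(const char *text) {
    Str     s;
    Buffer *b = malloc(sizeof *b);

    assert(b != NULL);
    b->data = malloc(strlen(text) + 1);
    assert(b->data != NULL);
    strcpy(b->data, text);
    b->refs = 1;
    s.buf   = b;
    return s;
}

/* "Copy" a string cheaply: share the buffer and bump the count. */
static Str str_share(Str s) {
    s.buf->refs++;
    return s;
}

/* The copy-on-write step: give s a private buffer before any write. */
static void str_unshare(Str *s) {
    if (s->buf->refs > 1) {
        Buffer *shared = s->buf;
        shared->refs--;
        *s = str_new(shared->data);
    }
}

/* Every mutation goes through here, so callers never see the sharing. */
static void str_set_char(Str *s, size_t i, char c) {
    str_unshare(s);
    s->buf->data[i] = c;
}

/* Drop one reference; free the buffer when the last one goes away. */
static void str_release(Str *s) {
    if (--s->buf->refs == 0) {
        free(s->buf->data);
        free(s->buf);
    }
    s->buf = NULL;
}
```

With this sketch, sharing a "fred" string and then calling str_set_char on the copy leaves the original untouched, because the copy gets its own buffer just before the write.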
The problem was that the string modification took place outside of Parrot, in a custom Perl6Str PMC. Think of a PMC like a class which represents internal data structures, and you're most of the way to understanding them. The Perl6Str PMC has two operations, increment and decrement, which do exactly what you'd expect to strings on the C level. This means that they modify the C string directly.
Because this occurs at the C level (working directly on C pointers), Parrot doesn't have a chance to perform the copy-on-write operation to the string, and the modification of one string produces the modification of all other strings which refer to the same string literal.
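The failure mode is easy to reproduce in a few lines of plain C. This is a simplified stand-in, not the Perl6Str code: Str and naive_decrement are hypothetical names, and the "decrement" here just lowers the last character, which is enough to show the aliasing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a string header; not Parrot's real STRING. */
typedef struct {
    char *start;   /* may point at bytes shared with other headers */
} Str;

/* Naive decrement: lowers the last character in place.
   It writes through the raw pointer, so no copy-on-write check ever runs. */
static void naive_decrement(Str *s) {
    size_t len;

    assert(s != NULL && s->start != NULL);
    len = strlen(s->start);
    if (len > 0)
        s->start[len - 1]--;
}
```

If two headers point at the same "fred" bytes, decrementing through one of them turns both into "frec", which matches the wrong output Rakudo gave.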
My first solution was to call the Parrot string function to perform the copy (because there's a write coming up) directly, but that made too much code move around (C89 and declarations before code, grr). Instead, I made a two-line macro which does an in-place copy and assign, and only two lines of code had to change to do the right thing. Now the code prints, as it should:
fred
fred
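For the curious, a macro in that spirit could look something like this sketch. It is not the actual patch: Str, its shared flag, str_copy, and UNSHARE_IN_PLACE are all hypothetical names, and the real fix used Parrot's own string functions. The idea is just that a single statement at the top of the mutating code replaces the header with a private copy before the raw pointer write happens.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string header; not Parrot's real STRING. */
typedef struct {
    char *start;
    int   shared;  /* nonzero while other headers may use the same bytes */
} Str;

/* Hypothetical helper: build a header with its own copy of the bytes. */
static Str str_copy(const Str *s) {
    Str    copy;
    size_t len = strlen(s->start);

    copy.start = malloc(len + 1);
    assert(copy.start != NULL);
    memcpy(copy.start, s->start, len + 1);
    copy.shared = 0;
    return copy;
}

/* Copy in place and assign, but only when the bytes are shared. */
#define UNSHARE_IN_PLACE(s) \
    do { if ((s).shared) (s) = str_copy(&(s)); } while (0)

/* Same raw-pointer decrement as before, now preceded by the unshare. */
static void decrement(Str *s) {
    size_t len;

    UNSHARE_IN_PLACE(*s);
    len = strlen(s->start);
    if (len > 0)
        s->start[len - 1]--;
}
```

Because the macro expands to a statement, it can sit after the existing declarations without reordering them, which keeps a C89 compiler happy.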
(I spent more time writing this entry than I did fixing the problem.)
Why C89? Seems silly now.
"My first solution was to call the Parrot string function to perform the copy (because there's a write coming up) directly, but that made too much code move around (C89 and declarations before code, grr)."
Re:C99 FTW!
chromatic on 2008-04-21T20:21:04
I'd love to be able to use C99, but some of our target platforms lack C99-conforming compilers. As I understand it, we're sometimes fortunate that Visual Studio supports C89.