C is Not Perl

chromatic on 2008-06-01T05:15:15

Sometimes I run into bugs that no one else has ever heard of. (I fix some of them, which makes other people give me their weird bugs to fix. There's a reputation you probably don't want to get: weird bug fixer.)

Parrot has a nice little trick to share constant strings within the core; a macro called CONST_STRING() produces a Parrot STRING but doesn't consume an extra string header at runtime. A compile-time tool extracts all such constant strings and writes a private header file which contains a C array of the information needed to build a table of these shared strings. It also rewrites that macro such that the strings get looked up from the table.

In a hot path, using a constant string (where the string's contents are constant at compile time -- that is, constant in C source code) versus allocating a new Parrot STRING can give you a nice boost, mostly by avoiding the garbage collector. You probably notice a motif in my writings as of late.

This private header file contains a single struct declaration followed by an array with one initializer line per string:

static const struct _cstrings {
    UINTVAL len;
    Parrot_UInt4 hash_val;
    const char *string;
} parrot_cstrings[] = {
    { 0, 0, "" },
    {13, 0x9e59fbe, "__parrot_core"},
    {4, 0x2cfdd1, "PASM"},
    {3, 0x154ba, "NCI"},
    {9, 0xf25a77c0, "_filename"},
    {9, 0x64254bb6, "_lib_name"},
    {5, 0x6f8c561, "_type"},
    ....
};

There was one subtle bug in the program which builds this table, until a few moments ago. Here's the problem:

my $len               = length $str;
my $hashval           = hash_val($str);
push @all_strings, [ $len, $hashval, $str ];

It's awfully subtle. Parrot r27987 might enlighten you.

When the processing tool reads the contents of CONST_STRING, it reads literal characters in the C source code. This is fine if your constant is ParrotString, which is obviously 12 characters long, but it's very wrong if your constant is \n, which is two characters long to Perl when read from a file but one character long to C.
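
To make the mismatch concrete, here is a minimal sketch in C. It is not the change made in r27987 (the extraction tool itself is written in Perl), and c_literal_length() is a hypothetical helper invented for this illustration. It takes the text between the quotes of a string literal, exactly as it appears in the source file, and counts how many characters the C compiler will actually store, treating each escape sequence as one character. It assumes the input is a single well-formed literal and ignores line continuations and adjacent-literal concatenation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from Parrot): given the text between the
 * quotes of a C string literal, exactly as written in the source file,
 * count how many chars the compiler will store for it. */
static size_t c_literal_length(const char *src)
{
    size_t count = 0;

    while (*src) {
        if (*src != '\\') {                 /* ordinary character */
            src++;
        }
        else if (src[1] == 'x') {           /* hex escape: \x plus hex digits */
            src += 2;
            while (isxdigit((unsigned char)*src))
                src++;
        }
        else if (src[1] >= '0' && src[1] <= '7') {  /* octal: 1 to 3 digits */
            int digits = 0;
            src++;
            while (digits < 3 && *src >= '0' && *src <= '7') {
                src++;
                digits++;
            }
        }
        else if (src[1] != '\0') {          /* simple escape such as \n or \\ */
            src += 2;
        }
        else {                              /* stray trailing backslash */
            src++;
        }
        count++;
    }
    return count;
}

/* The two constants discussed above. Note that "\\n" in this file is the
 * two-character text (backslash, n) that a tool sees when it reads \n
 * out of someone else's source code. */
static void c_literal_length_examples(void)
{
    assert(c_literal_length("ParrotString") == 12);
    assert(strlen("\\n") == 2);             /* what a raw character count sees */
    assert(c_literal_length("\\n") == 1);   /* what the C compiler stores */
}
```

A raw character count and the escape-aware count agree for ParrotString and disagree for \n, which is exactly the case the table got wrong.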

This went unnoticed because nothing used the len entry of the table members until I started to refactor the table initialization code. Problems ensued, until I realized that the previous code path, which called C's strlen, did something different from the new code path, which assumed that the extraction tool did the right thing.
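
One way to catch this kind of disagreement is to compare each recorded length against what strlen reports. The following is only a sketch under assumptions: struct demo_cstring is a hypothetical miniature of the generated table (the real one uses Parrot's own integer types and also stores a hash value), and first_bad_length() is an invented name, not a Parrot function. The check is only meaningful for strings without embedded NUL characters, because strlen stops at the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the generated table. */
struct demo_cstring {
    size_t      len;
    const char *string;
};

/* Return the index of the first entry whose recorded length disagrees
 * with strlen(), or -1 if every entry is consistent. */
static int first_bad_length(const struct demo_cstring *table, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (table[i].len != strlen(table[i].string))
            return (int)i;
    }
    return -1;
}

static void first_bad_length_examples(void)
{
    /* The last entry records 2, the length of the source text \n,
     * although the compiled string holds a single newline. */
    static const struct demo_cstring table[] = {
        { 0, ""     },
        { 4, "PASM" },
        { 2, "\n"   },
    };

    assert(first_bad_length(table, 3) == 2);   /* the \n entry is wrong */
    assert(first_bad_length(table, 2) == -1);  /* the first two are fine */
}
```

A check like this could run once in a debug build or a test; it would have flagged the \n entry as soon as the table was generated rather than when the refactored code first trusted len.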

Now the extraction tool does the right thing, and Parrot starts up almost imperceptibly faster. More importantly, the refactoring can continue.


I wonder...

rooneg on 2008-06-01T13:50:21

I haven't looked at this in any great detail, but couldn't there be a similar problem in the code that calculates the hash value? I mean, isn't there some analogue to hash_val() inside Parrot itself (presumably written in C) that needs to calculate hash values for other strings? If the one in that file hashes \n as a backslash and an n, and the one in Parrot hashes it as an actual newline character, wouldn't they get different values?

Re:I wonder...

chromatic on 2008-06-01T17:18:47

You're right. I hadn't thought of that, but fortunately nothing uses the hash value from the constant table yet. I had considered earlier today that it might be useless, removable data.