What C's Memory Management Gets Rightish

chromatic on 2008-05-24T18:28:07

In Double-Speed, I gave numbers for two Parrot optimizations that added up to a big improvement. Several people asked for explanations.

The most dramatic change is a refinement on stack chunk recycling, which I first explained in Refcounting Isn't All Bad. I've been thinking a lot lately about the difference between things which happen often versus things which happen rarely. Even though continuations at the VM level mean that control flow is not always a perfect singly-linked list of call frames, control flow is usually a singly-linked list of call frames. Though your call state can be a tree, in practice taking a continuation is rare.

In Parrot, taking a continuation is the only thing that will increase the reference count of a stack chunk. (I assume by now that you either understand continuations enough to know what I mean, or you can look it up for yourself. The important point is that a continuation captures a point in the call graph and keeps it alive as long as the continuation is alive. This allows the continuation to return to that point in the call graph whenever you invoke it.)

When I realized this, I removed all of the other refcount manipulations from the system. The only remaining refcount manipulation is for creating and destroying continuations. That's why Parrot revision 27567 gave a 30% improvement in the Rakudo-building benchmark.

Parrot revision 27568 is more interesting. The final diff chunks, for the isa vtable entry, tell the story. Given a PMC and the name of a class, is the PMC an instance of the class or one of its ancestors?

To answer this question, the function must get the name of the PMC's current class. That's the call to the get_string vtable entry.

The second chunk of the diff shows this function. It performs some calculations and caches the name, so there's a lazy-evaluation semantic here. Then it returns a copy of the name. (Don't worry -- it's a copy-on-write copy. I disabled COW once in Parrot for fun and, well, if there's one optimization we're keeping, it's that one. Yikes.)

Why?

Strings are mutable in Parrot. If you wanted to subclass a class at runtime, you might grab the name of the parent class, append a counter or other mangling to its name, and use the result as the name of the subclass. If you modified the same string that the parent class used in its name cache, you'd effectively change the class's name in place. (I should mention that strings are in two pieces. There's a string header and a string body. Headers are all of uniform size. Bodies are appropriate to the length of the string. This makes GC much easier in many ways and also makes COW possible.)

However, if you don't actually modify the contents of the string, you've just used up a free string header that you can't reclaim until the next GC run.

isa doesn't modify the contents of the string. It uses it only in a string comparison. There are no problems there. (Okay, that's a fib; the string comparison might change the encoding of the class name in place, but that's a safe operation, and actually a benefit -- lazy evaluation, again.)

By extracting the name-calculation operation out of get_string into make_class_name, that vtable entry became a call to the extracted function and then a copy on its results. The semantics stayed the same.

The isa vtable entry now calls make_class_name directly, avoiding the copy. Because isa is a hotspot, this reduces the amount of immediately recyclable garbage dramatically.

There is one twist, however. The PMCProxy PMC inherits its isa vtable entry from the Class PMC, but stores its name in a different manner. Thus the code needs to fall back on the get_string call if the SELF PMC is a PMCProxy. This is slightly ugly, and a better solution is to give PMCProxy its own isa entry (or fill it out more, as it exists in part).

Though I'm no big fan of C's memory model, it does have one useful semantic in the distinction between stack memory and heap memory. The implementation of that semantic leads to plenty of bugs, but it is useful to be able to mark the expected lifetime of a piece of memory. If we could do this more reliably in Parrot, we could ease even more GC pressure. There are correctness, conciseness, and ease-of-use penalties in that idea however.


Stupid Question

btilly on 2008-05-25T01:46:40

You say that upgrading the encoding of the string in place is a safe operation. I'm just wondering how well that works with multi-threading. After all one of the big problems that Perl 5 has with multi-threading is that if you're accessing a data structure while another thread is modifying it, you're hosed. And it proved impossible to find all of the places where this might happen in Perl 5. Hence our migration to a very heavy threading model.

I'm just checking that Parrot is not repeating this mistake.