Garbage Collectors are nasty

Whiteknight on 2008-06-09T22:52:04

I've heard it said before that garbage collectors are nasty pieces of code. I'm not sure I believed that until today, when chromatic told me that some answers to my questions about stack tracing were located in src/cpu_dep.c, if I was brave enough to go looking for them.

Here's a piece of code from that file that made me do a double-take:

```c
#if defined(__sparc)
    /* Flush register windows */
    static union {
        unsigned int insns[4];
        double align_hack[2];
    } u = { {
#  ifdef __sparcv9
                0x81580000, /* flushw */
#  else
                0x91d02003, /* ta ST_FLUSH_WINDOWS */
#  endif
                0x81c3e008, /* retl */
                0x01000000  /* nop */
    } };

    static void (*fn_ptr)(void) = (void (*)(void))&u.align_hack[0];
    fn_ptr();
#endif
```

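Any idea what this does? It took me a minute to figure out all of its nuances myself. The snippet builds a tiny function out of hand-assembled SPARC machine instructions: a register-window flush (flushw on SPARC V9, or the ST_FLUSH_WINDOWS software trap on older processors), followed by retl and a nop to fill the branch delay slot. Calling it spills any register windows still held inside the CPU onto the stack, where the stack tracer can see whatever pointers they contain. The instruction words are stored in an array, which is itself part of a union. Why a union? That part is trickier to understand, and it adds to that nagging "you think you understand it except for that one little detail" feeling you get in the pit of your stomach.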

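The union makes the array of instruction words share storage with an array of doubles, so the compiler has to align the whole union at least as strictly as a double. On SPARC that means the code words start at a multiple of 8 instead of a multiple of 4, which is all a plain unsigned int array would guarantee. SPARC instructions themselves only require 4-byte alignment, so I suspect the stricter alignment is a safety margin for data that is about to be called through a function pointer rather than a hard requirement.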

The code above is, of course, just the SPARC implementation of one particular function. So what does it look like on everybody's favorite x86? Here it is; let's see who can figure out what it's doing:

```c
Parrot_jump_buff env;
memset(&env, 0, sizeof (env));
setjmp(env);
trace_system_stack(interp);
```

I have removed the comments from this code so that people can read through it and try to work out what is going on. The code calls setjmp, one of the more esoteric parts of the C standard library. Right after calling setjmp, it calls a function to trace the stack without passing it the env structure! What's the purpose of all this?

setjmp creates a continuation in about the most primitive way possible. To do this, it must store enough of the current processor state (the stack pointer, the program counter, and the callee-saved registers) into a data structure (env, in this example) to resume execution later. Any live values in caller-saved registers have already been spilled by the compiler before the call, so after setjmp returns, every pointer the program is holding sits either in env or somewhere else on the stack. Now, the function declares env as a local variable, and in C local variables are typically stored on the system stack. The net effect? The register values, some of which might be pointers to PMCs, end up on the system stack, and since trace_system_stack walks the stack itself, it finds them without needing env passed to it. Pretty handy code, although I have some doubts that this function will survive aggressive optimization, since modern optimizers can detect (and remove) variables that are written to but never explicitly read.

Much of the work I've done on the GC today has been refactoring and simplification.