I've been running my GC in stop-the-world batch mode. What this means is that when a GC run is initiated, we always run through the entire algorithm. We find the root objects, trace all objects at once, and then sweep all pools. I modified the behavior (which is as simple as a simple macro change) to break in 4 places: After we trace the roots, after we trace the rest of the objects, after we sweep the PMCs, and after we sweep the sized pools. In more aggressive configurations, we can break up the tracing into multiple steps as well. However, I'm not playing with all sorts of possible configurations until I get the basics working to a much higher standard of reliability.
Adding in these 4 breaks does allow parrot to build without errors or warnings. This is probably because a program needs to run the GC 4 times before the final sweep is completed. Assuming the GC is only run when Parrot attempts to allocate a new PMC when there are none left, then Parrot can allocate 1315 PMCs before a GC run completes. All objects start black, so the first sweep shouldn't actually do anything except reset them all to black (this is an inefficiency I need to fix eventually) so the GC actually needs to run at least 8 times to complete the first real sweep. Most programs in the build process and in the test suite do not exercise Parrot enough to reach this point, and so most programs won't have any GC-related problems.
The test suite does reveal several problems, which I am slowly grouping into three distinct categories:
1) Tests that segfault when run in "make test", but run correctly when run directly from the commandline. For instance, "perl t/stm/runtime.t" segfaults on the fourth test, but "parrot t/stm/runtime_4.pir" completes correctly. There are a handful of other tests that behave like this too, so I think they might have a common problem.
2) I'm still having problems where PMCs appear to be prematurely swept and recycled. This seems to happen most often with IO-related PMCs. That doesn't mean the problem is restricted to ParrotIO pmcs, just this problem seems to manifest itself with them.
3) I don't think my GC properly implements the interface that some of the opcodes are expecting. Some tests are using things like the interpinfo opcode, which reads info directly from the arena_base structure. This is something I'm going to work on tonight to see if I can bring it up to better compliance.
So this is where I am now, and I'm slowly but surely tackling problems and getting them resolved (usually with chromatic's help). I just hope that I can keep up the forward momentum.