Found the looping bug in Mail::Thread, turned out to be a one-line fix.
Fed all of london.pm-2001 and void-2001 to mariachi, and found that the subject threading support I'd added is potentially suspicious - adding subject threading to the void mbox loses 35 of the 22296 messages from the thread tree (well actually it's less than 22296, since we discarded a few as duplicates). Again this only shows up in the big-ass datasets. Gah.
Found the Mail::Thread bug. It turned out to be quite simple - when messages are threaded out of order, entire sibling branches were getting lost. Of course this only happens with my big datastores, because they're partly out of order. Fix turned out to be a fairly simple link walk.
Apart from it didn't. I think the diagnosis remains correct, since shuffling the initial dataset changes how many messages get rendered unreachable. Still need to identify the exact cause though.
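For the record, the "how many messages are unreachable" check is roughly the following. This is only a minimal sketch over toy hash-based containers, not Mail::Thread's real classes: the 'id', 'child' and 'next' slots and the reachable_ids and lost_ids names are all made up for illustration.

```perl
use strict;
use warnings;

# Toy containers: plain hashes with hypothetical 'id', 'child' (first child)
# and 'next' (next sibling) slots. Not Mail::Thread's actual API.
sub reachable_ids {
    my @roots = @_;
    my %seen;
    my @todo = @roots;
    while (defined(my $node = pop @todo)) {
        next if $seen{ $node->{id} }++;    # also guards against link loops
        push @todo, grep { defined } @{$node}{qw(child next)};
    }
    return \%seen;
}

# Return the ids that went in but cannot be reached from the root set.
sub lost_ids {
    my ($all_ids, @roots) = @_;
    my $seen = reachable_ids(@roots);
    return grep { !$seen->{$_} } @$all_ids;
}

# Tiny example: root has children kid1 and kid2 (siblings); d never got linked.
my $kid2 = { id => 'c' };
my $kid1 = { id => 'b', next  => $kid2 };
my $root = { id => 'a', child => $kid1 };

my @lost = lost_ids([qw(a b c d)], $root);
print "lost: @lost\n";    # prints "lost: d"
```

Running the same count over a few shuffled copies of the input is what shows the number of lost messages moving around with the ordering.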
I do know both subject threading and tree pruning are innocent, which is good, because I did a lot of hacking on those in ripping them off from Grendel, and I wasn't keen on poking through it again.
In the meantime I overhauled the Mail::Thread tests, and made a few teeny changes which uncovered a couple of bugs in the tree iterator and the ordering code. Glad to see those little niggles gone, though a bit puzzled as to why I didn't notice them before. Wood for the trees I guess.