Came back from the cinema and started to look at the memory problem again. Rewrote Email::Folder to support an iterator interface, which got us to the point of running out of memory faster.
Had a mild lightbulb moment and realised that Email::Simple keeps far more metadata than we really need, so that it can later regenerate an RFC2822 message. The overhead is about 4x on a typical message. By switching Mariachi::Message to ape the interface expected by Email::Thread and populating the bits we need from an Email::Simple object, which we then throw away, memory overhead per message drops way, way down to just the Mail::Thread::Containers and some extra navigation metadata we use.
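To make the "copy what you need, throw the rest away" idea concrete, here is a rough sketch in Python rather than the Perl that Mariachi actually uses. SlimMessage, slim and the choice of headers are all made up for illustration; the real Mariachi::Message may well keep a different set of fields.

```python
from email import message_from_string


class SlimMessage:
    """Hypothetical stand-in for a slimmed-down message object."""

    # __slots__ avoids a per-instance dict, keeping each object small.
    __slots__ = ("message_id", "references", "subject", "sender", "date")

    def __init__(self, parsed):
        # Copy out only the headers that threading and display need.
        self.message_id = (parsed.get("Message-ID") or "").strip()
        refs = (parsed.get("References") or "").split()
        in_reply_to = (parsed.get("In-Reply-To") or "").split()
        # Treat In-Reply-To as a reference if it is not already listed.
        for ref in in_reply_to[:1]:
            if ref not in refs:
                refs.append(ref)
        self.references = refs
        self.subject = parsed.get("Subject") or ""
        self.sender = parsed.get("From") or ""
        self.date = parsed.get("Date") or ""


def slim(raw_text):
    """Parse a raw message, keep the slim summary, drop the parsed object."""
    parsed = message_from_string(raw_text)
    # Only the SlimMessage is returned, so the full parse can be freed.
    return SlimMessage(parsed)
```

The point is only that the heavyweight parsed object goes out of scope as soon as the few fields have been copied, so nothing holds the full message in memory.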
Now we just have the problem that it's possible to create a thread tree so deep that you blow the stack and SEGV perl when you try and recursively walk it. Will have to go to an iterative walk I guess, but first I think sleep is in order.
Woke up. Dismissed Tom's iterative tree walker as probably broken, and then about an hour later wrote almost exactly the same thing. Oh well. Rolled a tweaked version of mine into Mail::Thread and rewrote Container->order_children and chunks of mariachi to use it.
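For anyone wondering what an iterative walk looks like, here is a minimal Python sketch. It is not the code that went into Mail::Thread; the Container class below is a hypothetical node with first-child and next-sibling links, and walk is an invented name.

```python
class Container:
    """Hypothetical thread node: first child plus next sibling links."""

    def __init__(self, name):
        self.name = name
        self.child = None  # first reply
        self.next = None   # next sibling at the same level


def walk(root):
    """Yield (container, depth) in depth-first order without recursion."""
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        yield node, depth
        # Push the sibling first so the child is popped (visited) before it.
        if node.next is not None:
            stack.append((node.next, depth))
        if node.child is not None:
            stack.append((node.child, depth + 1))
```

Because the pending nodes live in an ordinary list instead of the call stack, the depth of the thread is limited only by available memory.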
Wrote a new block of code for stranding. Stranding does two things:
1) it walks the threaded tree, setting prev and next properties of the messages for the navigation.
2) it promotes really deep subthreads up to the root level, to avoid blowing the stack when recursively walking, and to avoid a couple of display issues.
In retrospect they're both stranding, since the latter strands poor little messages far from home, but the name was given for the first meaning.
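Both halves of stranding can be sketched in a few lines of Python. This is a guess at the shape of it, not the Mariachi code: Node, strand and max_depth are invented names, and putting promoted subthreads at the end of the root list is an assumption.

```python
class Node:
    """Hypothetical message node; names here are illustrative only."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.prev = None
        self.next = None


def strand(roots, max_depth):
    """Link nodes in reading order and promote over-deep subthreads.

    Returns the new list of root nodes.
    """
    new_roots = list(roots)
    order = []
    i = 0
    # new_roots grows while we iterate, as deep subthreads get promoted.
    while i < len(new_roots):
        stack = [(new_roots[i], 0)]
        while stack:
            node, depth = stack.pop()
            order.append(node)
            if depth >= max_depth and node.children:
                # Meaning two: cut the replies loose and make them roots.
                new_roots.extend(node.children)
                node.children = []
                continue
            # Reversed so the first child is popped first.
            for child in reversed(node.children):
                stack.append((child, depth + 1))
        i += 1
    # Meaning one: chain the messages together for prev/next navigation.
    for before, after in zip(order, order[1:]):
        before.next = after
        after.prev = before
    return new_roots
```

After this runs, no subthread is more than max_depth levels below a root, so even a recursive walk stays shallow.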
Thrashed for several hours debugging the tree walking and the stranding, unsure as to which was causing breakage.
Discovered a bug in Mail::Thread creating loops of children, which only seems to manifest in a stupidly large london.pm mbox (all of 2001, 15733 messages, 41M). Distilling that to a small test case will be such fun.
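A cheap way to hunt for this sort of loop is to walk the tree and complain when a container turns up twice. The Python sketch below is illustrative only: Container and find_repeat are hypothetical names, not part of Mail::Thread.

```python
class Container:
    """Hypothetical thread node with a plain list of children."""

    def __init__(self, name):
        self.name = name
        self.children = []


def find_repeat(root):
    """Return the first container reached a second time, or None.

    A well-formed tree never reaches the same container twice, so any
    hit means a loop or a child shared between two parents.
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            return node
        seen.add(id(node))
        stack.extend(node.children)
    return None
```

Running a check like this after threading progressively smaller slices of the mbox is one possible way to narrow down a test case.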