Returning to the trees

TorgoX on 2001-11-08T07:43:42

Dear Log,

I get lots of mail about all the modules and articles that I have written. The most recent ones were: 1) someone thanking me for Sort::Naturally and adding that of course people want things sorted like Sort::Naturally sorts! [my muttering: sometimes they want it sorted really fast tho, so you're screwed, because Sort::Naturally is always going to be much slower than even Sort::ArbBiLex -- but thanks anyway]; 2) someone asking if I could write a RTTTL-to-MIDI converter and maybe put it in the MIDI-Perl dist (unlikely, but you never know), and 3) a request for advice from someone a bit new to recursive stuff and maybe to markup language processing, but who has to do some fairly complex stuff that I can only imagine doing with a traverser (a tree-recursor) -- but he's not at home with traversers.

Explaining traversers isn't so easy. I tried valiantly in a long response, of which the following is the last bit.

...The trick is to imagine _whammo_traverser calling itself on root and that INSTANCE of _whammo_traverser calling itself on a child of root, and that INSTANCE calling itself on a grandchild of root. Each instance of the routine being called on a node has its own copy of all sorts of variables and things -- notably its own @_, its own $this (copied from $_[0], with a "my"), and its own placeholder in its "foreach my $child ($this->content_list)" expression. So, where "you" are is $this (or $child) -- for appropriate values of "you". And when you call another instance of yourself, all the "my $this" variables are yours and yours alone -- the other instances have their own copies. (Otherwise you couldn't ever do a recursive routine, if attempts at recursion left different instances stomping on eachother's variables.)

The way to make this fit in your head is to think about only one instance at a time: put yourself in the shoes of a given instance of the sub and just limit yourself to figuring out what to do with the one node in front of you, and whether/how to call yourself on the child nodes, and maybe whether anything needs to be done with the current node once the calls on the children return. The traverser is just a matter of saying "what do I do when I have an 'a' element/node in $this, what do I do when I have a 'td' element/node in $this, etc, and what do I do otherwise?".

Getting the hang of recursion and traversal is one of the hardest things I've done (or had happen to me). Like op-art and base-N number systems, some people pick it up easy, some people never really get it all the way, and some people take a while to get it (for many and vague meanings of "get it").
I wrangled with recursion/traversal all while writing Tree::DAG_Node, to which HTML::Element is sort of an evil twin. (And then a year or two after the idea was solid in my head, I went and learned about efficiency tricks, and how to implement a recursive algorithm without a recursive routine. My general verdict on that is: don't try.)