XML::SAX::Base and Why You Should Care...

kingubu on 2001-11-22T08:46:18

Having pushed XML::SAX::Base out to CPAN yesterday morning, I feel compelled to fill in the detail promised in the last journal entry, and to mention why I think having a standalone version of XML::SAX::Base is important.

First, though, let me offer my sincerest thanks to both Robin Berjon and Matt Sergeant for their invaluable contributions to this module specifically, and to the XML::SAX dist in general. Gnat is right: XML::SAX is a major leap forward, and we have the combined talent, experience and commitment of these two exemplary uber-Camels to thank for it. You rock, guys!

If you don't know what SAX (Simple API for XML) is, you'll probably want to have a look at this column, first. Basically, though, SAX works on an event/handler-based model, which means that, as an XML parser makes its way through a given document, the contents of the various parts of the doc are passed by event callbacks to one or more handler classes that provide access to the data. Examples of those callbacks are start_document() to signal the beginning of the XML document, start_element() to signal the start of an XML element, characters() to pass text contents, and so on.

Now, here's the cool part: you don't need an XML parser to generate SAX events! Using XML::SAX::Base as a base for your "driver" class, you just call the SAX event methods directly and voilà-- a SAX event stream. You can think of XML::SAX::Base as the adapter layer between the XML World on one side and, on the other, Perl's built-in data facilities plus the myriad data manipulation modules, parsers and all the other fun stuff on CPAN. All you do is figure out how the data you have in hand would be marked up as XML, call the SAX events in that order (passing your data through, of course), and, as if by magick, you have translated your data to XML. (Okay, to be strict, you have to have a Handler class like XML::Handler::XMLWriter at the end of the processing chain to translate the event stream into an actual XML document on the filesystem, but you get the idea...)
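
To make the driver idea concrete, here is a minimal sketch rather than anything definitive. The class names HashDriver and EventLogger, the generate() method and the _simple_element() helper are all made up for illustration. EventLogger is a toy stand-in for a real writer such as XML::Handler::XMLWriter: it only records which events arrive, so the sketch needs nothing beyond XML::SAX::Base itself. Namespaces and attributes are ignored.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real writer handler: it just records
# each event it receives so we can inspect the stream afterwards.
package EventLogger;

sub new            { return bless { log => [] }, shift }
sub start_document { push @{ $_[0]{log} }, 'start_document' }
sub end_document   { push @{ $_[0]{log} }, 'end_document' }
sub start_element  { push @{ $_[0]{log} }, "start_element($_[1]{Name})" }
sub end_element    { push @{ $_[0]{log} }, "end_element($_[1]{Name})" }
sub characters     { push @{ $_[0]{log} }, "characters($_[1]{Data})" }

# Hypothetical driver: turns a flat hash into a SAX event stream.
# No XML parser is involved; we call the event methods ourselves and
# XML::SAX::Base passes each one along to the configured Handler.
package HashDriver;
use base 'XML::SAX::Base';

sub generate {
    my ($self, $root, $record) = @_;

    $self->start_document({});
    $self->start_element(_simple_element($root));
    for my $key (sort keys %$record) {
        $self->start_element(_simple_element($key));
        $self->characters({ Data => $record->{$key} });
        $self->end_element(_simple_element($key));
    }
    $self->end_element(_simple_element($root));
    $self->end_document({});
    return;
}

# Minimal element structure for this sketch (assumption: no namespaces).
sub _simple_element {
    my ($name) = @_;
    return { Name => $name, LocalName => $name, Attributes => {} };
}

package main;

my $logger = EventLogger->new;
my $driver = HashDriver->new(Handler => $logger);
$driver->generate('person', { name => 'Larry', language => 'Perl' });

print "$_\n" for @{ $logger->{log} };
```

Swap the logger for a real writer handler at the end of the chain and the same calls should produce an actual XML document instead of a list of event names.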

Nice, huh? Wait, there's more...

XML::SAX::Base is also a generic base class for SAX filters!

In a nutshell, a SAX filter is just an event handler that accepts the SAX events, does some cool thing, then forwards all or some of those events on to the next Handler in the chain. You can add new events to the stream (create new elements), block certain events (remove elements), mangle the data contained by the characters() events, or anything else that you can imagine... What XML::SAX::Base provides in this case is that it lets you implement only the events that are relevant to the task at hand, without having to worry that a parser or other driver is going to forward an event to your filter that you didn't think of (or care about). All events not explicitly overridden in your filter class will be silently forwarded to the next handler-- no muss, no fuss, no embarrassing odor.
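
Here is a small sketch of such a filter, offered as an illustration under a few assumptions rather than as the canonical way to do it. ShoutFilter and EventLogger are hypothetical names; the logger is a toy handler that only records what reaches the end of the chain. The filter overrides characters() alone and hands the modified event on via SUPER::characters(); every other event falls through to XML::SAX::Base and is forwarded untouched. For brevity the events are fired by hand instead of by a parser.

```perl
use strict;
use warnings;

# Hypothetical end-of-chain handler that records the events it sees.
package EventLogger;

sub new            { return bless { log => [] }, shift }
sub start_document { push @{ $_[0]{log} }, 'start_document' }
sub end_document   { push @{ $_[0]{log} }, 'end_document' }
sub start_element  { push @{ $_[0]{log} }, "start_element($_[1]{Name})" }
sub end_element    { push @{ $_[0]{log} }, "end_element($_[1]{Name})" }
sub characters     { push @{ $_[0]{log} }, "characters($_[1]{Data})" }

# Hypothetical filter: upper-cases all text content.
# Only characters() is implemented; everything else is inherited
# from XML::SAX::Base, which forwards it to the next handler.
package ShoutFilter;
use base 'XML::SAX::Base';

sub characters {
    my ($self, $data) = @_;

    # Work on a copy so the caller's hash is left alone.
    my %copy = (%$data, Data => uc $data->{Data});

    # Pass the modified event on to the next handler in the chain.
    return $self->SUPER::characters(\%copy);
}

package main;

my $logger = EventLogger->new;
my $filter = ShoutFilter->new(Handler => $logger);

# Fire a few events by hand; a parser or driver would normally do this.
my $el = { Name => 'greeting', LocalName => 'greeting', Attributes => {} };
$filter->start_document({});
$filter->start_element($el);
$filter->characters({ Data => 'hello world' });
$filter->end_element($el);
$filter->end_document({});

print "$_\n" for @{ $logger->{log} };
```

Blocking an event works the same way in reverse: override the method and simply don't call the SUPER:: version.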

Hardcore XML Geeks will be delighted to find that XML::SAX::Base invisibly copes with the class->method mapping for drivers and filters using both SAX1 and SAX2 class names. Again, it "just works" so you don't have to. :-)

Performance wonks will be thrilled by the fact that all the filtery goodness provided by XML::SAX::Base comes at the cost of very little extra processing time. My benchmarks reveal that filters implemented on Base add only 8-12% to the total processing time compared with running the given XML parser alone. When you consider that every single event is being forwarded through the filter class, that's pretty damn impressive (even if I do say so myself :-)

Anyway, I'll pop back in and scribble some more when I have more time. If you are interested in the concepts mentioned here, have a peek here and here for more ideas.

One final note: the standalone distribution of XML::SAX::Base on CPAN is only really appropriate for those cases where you are using it as a driver for non-XML data. If you are plugging (or planning to plug) an XML parser into it, just install XML::SAX instead (a Base.pm identical to the one in the standalone dist also ships with the larger XML::SAX package, and you get all the other Good Things that go along with it).

-ubu