Simplicity Check

darobin on 2002-02-26T12:55:07

This must be the sixth of seventh time that I hear smart people call SAX complex. In the case of some techniques, I can understand why some would find them complex. But in the case of SAX it totally evades me.

SAX requires the following knowledge to be used effectively:

  • a minimal working understanding of XML. This is required because when using SAX, one is after all processing XML, or XML-like data. By minimal I mean elements, attributes, and character data. No rocket science here.
  • sufficient understanding of Perl to create about five methods.
  • sufficient understanding of Perl to use hashes.

And that's all. Yes there is more available from the spec. There are several Handler types that one can use when it can be more convenient to dispatch various events to different classes. But you can simply use the default ('Handler') and forget about the others. Yes there are events that can be used to express the more obscure parts of XML but either you know what they are and thus how to use them, or you don't and you can freely ignore them. I ignore them in 98% of the cases. That's why the spec has been split into Basic and Advanced chapters. Most people only read the Basic part, and are happy with that.

And for the very few rough edges (choosing a parser, building long pipelines...) there are helper modules. There also are quite a good bunch of articles on the subject at http://www.xml.com/pub/q/perlxml that really explain things in simple terms (thanks Kip!).

So is there something that triggers dummy mode in otherwise brilliant people? Is there some magic potion that makes it simple for some of us?

I'm not ranting. I simply feel at loss.


SAX is too ornate

pdcawley on 2002-02-26T13:53:45

It's not that I don't understand SAX. It's more that I don't like it. For the vast majority of cases, SAX provides far more behaviour than is necessary and imposes far more constraints on the programmer than is tasteful.

You've said the same thing yourself in your post; you describe the subset of SAX that one needs in order to use it. Well, if that's all we need, what on earth is the rest of it doing there? The pipeline initiative is, at one level, about encapsulating the fundamental things about the Pipeline pattern and nothing more.

I have applications where thinking in terms of XML documents and SAX events simply gets in the way of doing what I want, with no possible benefit accruing to me for doing so. I have applications where I expect large chunks of the functionality will move onto the 'thing' passing through the pipeline.

SAX provides a really powerful toolkit for messing with XML and XML like things I don't deny. And you can, of course, retool it to do whatever you like. But I can retool sendmail to act like a Universal Turing Machine and I've seen folks who use Excel as a wordprocessor. Just because something is possible doesn't mean that it's a good idea (though the Hack Value of a UTM in sendmail.cf is very high).

I want to take a pipelining module off the shelf, modify a few of the data structures to suit the class of problems I'm dealing with, maybe tinker with the mechanisms of transaction dispatch and just use it. What I don't want to have to do is wrap my head around a vocabulary from a domain that is alien to the problem domain I'm working in. Reworking the problem to fit the tools is the Wrong Thing, reworking the tools to fit the problem is the way forward. Pipeline encourages this, SAX makes too many assumptions and defines too many behaviours to be useful for me.

Re:SAX is too ornate

darobin on 2002-02-26T14:25:03

For the vast majority of cases, SAX provides far more behaviour than is necessary and imposes far more constraints on the programmer than is tasteful.

Can you list examples? I use SAX daily and have yet to feel constrained, tasteless, and over-behavioured.

you describe the subset of SAX that one needs in order to use it. Well, if that's all we need, what on earth is the rest of it doing there?

That's all most need. It's the usual 80/20 thing, except in this case it's closer to 95/5. Using 5% of the spec provides for 95% of one's needs. The remaining stuff is there so that if one day you wake up and realise that you just *have* to process NotationDecls, it's there for you. That's why it's conveniently packaged away in another part of the spec, and labelled in large friendly letters "ADVANCED: DON'T PANIC"

I have applications where thinking in terms of XML documents and SAX events simply gets in the way

Well.... don't use it in those cases then! I fail to see how a given technology not being applicable to *everything* would qualify it as a "horrible, truly gruesome, overcomplicated monstrosity". My bottle opener can't bake coffee. It's nevertheless rather simple, and I happen to like it.

I still fail to see how and why you jump from SAX not being the solution to the generic pipeline system (I don't recall anyone saying that) to SAX being horrible. SAX is a pipelining system. It's also a clean and extremely simple model for the pipelining of structured and labelled data. I think people have been at most suggesting that pipelining efforts learn from one another.

How SAX not being adapted to that specific problem makes it complex and gruesome, again, evades me.

horrible, truly gruesome, overcomplicated monsters

pdcawley on 2002-02-26T14:55:15

Ahem. I did go a little over the top there didn't I?

I'm guessing that those of you who worked on implementing SAX might take it a little personally...

I'm not sure if it is deliberate or not, but some of the evangelism I've seen about SAX seems to imply that it is the answer to every prayer when, on closer inspection, it patently is not. This does tend to make folks a little wary.

Re:horrible, truly gruesome, overcomplicated monst

ziggy on 2002-02-26T15:17:40

I'm not sure if it is deliberate or not, but some of the evangelism I've seen about SAX seems to imply that it is the answer to every prayer when, on closer inspection, it patently is not. This does tend to make folks a little wary.
I suggest you spend a little time to crack the nut that is XML. Ignore the hype and try and see why bracketheads like us see value in focusing on processing the data and not on brute-force programming to solve a problem. It's not so much that SAX is the solution to world hunger, but that thinking in XML makes certain solutions simpler, and SAX makes coding those solutions simpler (once you force yourself to wrap your brain around a gold brick or two). Sadly, the only way to see why XML might possibly be a universal technology is to drink the kool aid. This is our fault on many counts.

I also will quite readily concede that XML is not the universal problem solver, the mythical Sonic Screwdriver (platium plated and laser-etched Dean Kamen, of course), nor is it the babelfish, the ideal replacement for MP3/QuickTime, Perl or Windows. But that appears to be a minority brackethead POV. Most sensible bracketheads agree with this, but somehow this message isn't getting the required amounts of volume to overcome the hype.

Re:SAX is too ornate

ziggy on 2002-02-26T14:41:56

You've said the same thing yourself in your post; you describe the subset of SAX that one needs in order to use it. Well, if that's all we need, what on earth is the rest of it doing there?
We have a saying in the XML world: «One program's data is another program's metadata»

The most common use of XML is to encode data, and in those circumstances, the only thing that's needed are access to start/end elements and data. But there are programs out there that create XML, or do more involved processing of XML (processing instructions, insignificant whitespace removal, comments, etc.). So, rather than have two different APIs -- one for the common case, and one for the heavy-duty case -- we have one where 80% of the API is ignored for 80% of the problems out there.

Perhaps this is a bug in the way SAX is described to 80% of the audience, since people writing SAX implementations tend to be concerned with the 20% of the problems that need the other 80% of the features. :-S

What I don't want to have to do is wrap my head around a vocabulary from a domain that is alien to the problem domain I'm working in.
This, in my estimation, is the worst problem XML has created for itself.

Re:SAX is too ornate

darobin on 2002-02-26T15:01:39

This, in my estimation, is the worst problem XML has created for itself.

Indeed. I remember the days when we would have to try to convince people to use XML for problem foo and they weren't seeing the benefits. Nowadays it seems that I more and more recommend that people do not use XML for given problem bar, because they're only using a buzzword and in fact still not seeing the benefits.

We have now reached a point where the marketing hype and alphabet soup is obscuring the truly good bits, and that's probably XML's biggest problem :-(

I guess the problem that we're facing now is to switch from propaganda to education, ie become a mature community to follow the mature technology.

As Matt pointed out recently, XML is going to become more and more boring as people learn how to use it properly and it gets to sit in its rightful place (ie structured ASCII). I can't wait for that time to come :)

SAX == XML

ziggy on 2002-02-26T14:24:29

I have to agree with what barrie said elsewhere. SAX is complicated because XML is complicated, and lots of programmers dislike SAX because it is both XML based and event based, and picks up a good deal of the animus against XML by association. Tree-based APIs are less prejudiced because they're not event based and take the point of view that "XML is basically a data structure." But they still get tainted with some of the hatred against XML.

Even hard-working XMLheads dislike SAX. The Cocoon project took the typical path of starting out with a DOM framework and then rewriting the whole thing in a SAX framework (1) because it was slow and (2) because it wasn't that great. :-) To that I'd say that hard-working get-the-job-done kinds of programmers don't deal with event handlers very well, or only in certain kinds of environments (GUI programming, especially with VB, where it's clear that there's a user driving things, not a stupid little data file).

Also, it's much easier to visualize a brute force solution using a tree-based API, where you have a set of lexicals nearby. Compare that to SAX, where a certain amount of design and elegance are required to write filters (regardless of the fact that the elegance and design work is already done for you with XML::SAX::Base...). Not that great when you just want to get stuff done without petitioning the gods to get off the ground...

SAX interface question

TorgoX on 2002-02-26T19:59:30

Can one use SAX without writing a class and methods? I.e., can one just give a set of callbacks, as one can with HTML::Parser v>=3 ?

Re:SAX interface question

Matts on 2002-02-27T07:21:39

Sure. So long as you call those callbacks MyPackage::start_element, MyPackage::characters, MyPackage::end_element, and then just pass in bless({}, 'MyPackage') as your handler.

Re:SAX interface question

TorgoX on 2002-02-27T07:37:15

Ugh.

Re:SAX interface question

pdcawley on 2002-02-27T08:30:34

Um. Whilst I'm still not entirely convinced about SAX, what's your problem with doing it OO? Allowing both functional and OO styles starts to lead us gently down the road towards the madness that is CGI.pm

Of course, you could always subclass the basic XML::SAX::Filter so that you pass it callbacks on instantiation...

Re:SAX interface question

TorgoX on 2002-02-27T18:05:19

Your analogy to CGI.pm is specious.

Re:SAX interface question

darobin on 2002-02-27T18:29:01

I don't think it is, at least not completely. If we are to support what can be done with methods using callbacks (notably transparent filtering and the such) then down that path quite clearly lies madness.

If you want a SAX handler that uses callbacks, then you most definitely can. Why one would want to do that, I really don't know as:

  1. handlers don't need to subclass anything (you only subclassing when you must create or propagate events, ie in a driver or a filter)
  2. you don't even need to declare your subs as callbacks so long as they have the right names (easy).

So it's in fact easier than HTML::Parser with callbacks... Just code a constructor (doesn't need to do anything more than return an object) and code the methods you want in the list of those you can use, and only those. If some don't exist (as is almost always the case) the driver won't call them.

Re:SAX interface question

TorgoX on 2002-03-07T23:46:33

If we are to support what can be done with methods using callbacks, (notably transparent filtering and the such) then down that path quite clearly lies madness.

What part of it is madness?

Re:SAX interface question

darobin on 2002-03-08T14:32:08

In SAX you can chain filters to no end, without needing to care about which of them define which event handling methods. If you have Parser -> FilterA -> FilterB -> Handler, and FilterA doesn't wish to deal with, says, the start_element events, FilterB will nevertheless receive them (and so will the Handler, unless of course one of the steps catches them and refuses to propagate them). This scales to any number of Filters set up in a Pipeline, which makes such pipelines much easier to setup and more manageable than any system implemented with callbacks than I can think of (but I could be missing something, if you're thinking of something in particular I'd love to know).

In any case, as a formerly intensive user of HTML::Parser, I do find SAX to be simpler.

Re:SAX interface question

barries on 2002-02-27T18:20:27

Yes, that's what XML::Filter::Dispatcher provides. It takes a pattern matching language (an XPath subset) and allows you to map matching events to subroutines or other SAX handlers.