A new XML book and more XML modules

toma on 2003-01-04T18:41:12

XML and Perl
Received and read part of "XML and Perl" (New Riders). This isn't a book review; I'll leave that to people who understand the subject better than I do. These are just my notes!

The book is useful but does not provide much new info for me. It spells out a few things clearly that are otherwise hard to figure out, with line-by-line code walkthroughs.

Here are a few of the gaps:

  1. It says that XSLT can be used to transform XML into CSV and other non-tagged formats, but the example code just shows the usual XML to HTML translation.


  2. The SAX code has a bunch of case statements in the handlers for which kind of tag is being processed. I would prefer to see this coded another way with SAX. I try to avoid big case statements.

I liked the section on XML Schemas, since I didn't know anything about them.

I can't say that the book helped with today's particular problem of interest.



Trying SAX Modules
Installed Pod::SAX with cpan. This has many dependencies, including XML::SAX::Writer. Installed okay. Pod documentation for the functions is missing. This style of code doesn't look like the kind of code that I like to write.



Installed XML::Generator::DBI with cpan. It failed tests looking for DBD/Pg.pm; there is some manual configuration that needs to be done. I didn't pursue this further because I have no database on this machine. The code is interesting to read though, both as an example of using XML::Handler::YAWriter and as a nifty flexible DBI query.



Other activity
I installed psh and fooled around with it. It looks like fun, but possibly dangerous since I don't know what I'm doing. My shell needs to be very reliable, e.g. for rm commands.

I installed File::List, and wrote an example program using File::Flat with it. I posted this snippet as an answer to a question at perlmonks.


On SAX

Matts on 2003-01-04T21:55:22

The SAX code has a bunch of case statements in the handlers for which kind of tag is being processed. I would prefer to see this coded another way with SAX. I try to avoid big case statements.

That's pretty much just how SAX works I'm afraid. It has a tendency to contain a large case statement with things that say: "If tag is this do this. If tag is that do that."

I'd welcome alternate ways of designing SAX handlers though. Unfortunately that tends to be the way data driven code works.

Trying SAX Modules

Installed Pod::SAX with cpan. This has many dependencies, including XML::SAX::Writer. Installed okay. Pod documentation for the functions is missing. This style of code doesn't look like the kind of code that I like to write.


It only requires three modules - XML::SAX, XML::SAX::Writer (for the tests - though potentially I could write the tests without this) and Pod::Parser - which should be shipped with your perl.

As for the style, mostly I copied other Pod::Parser based code. It tends to enforce a certain way of writing code unfortunately, and I find it a bit hacky. Though I would appreciate it if you could be more specific, as it might help me clean up the module.

The docs suck, I know. I'll work on that. It's tricky because partly I want to say "Uses SAX, generates XML that looks like this", but obviously that's not enough for most people.

Installed XML::Generator::DBI with cpan. It failed tests looking for DBD/Pg.pm

Strange. I'll look into it.

there is some manual configuration that needs to be done. I didn't pursue this further because I have no database on this machine. The code is interesting to read though, both as an example of using XML::Handler::YAWriter and a nifty flexible DBI query.

Theoretically the tests shouldn't require any DBD driver but I'll look into that tomorrow. Thanks.

Re:On SAX

toma on 2003-01-08T05:46:18

I'd welcome alternate ways of designing SAX handlers though. Unfortunately that tends to be the way data driven code works.

I'm not really knowledgeable enough about SAX to understand this design problem well enough to make a good suggestion, but I won't let that stop me.

I have been using XML::Twig and XML::Writer with some success, and have enjoyed the XPath features in XML::Twig.

Instead of writing a large case statement, I usually use a dispatch table. Typically I use a hash of anonymous subs, where the key to the hash is the name of the tag or XPath specification. I like AUTOLOAD capability, although I only use it when I have no other way to know the name of a function in advance.

A dispatch table can translate XPath specifications into subs that load XML subtrees into compact data structures. This retains most of the small memory usage of parsers, while providing the ease-of-use of a DOM approach.
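To make the hash-of-anonymous-subs idea concrete, here is a minimal sketch rather than anything from a real module. The class name TagDispatch and the counters are made up for illustration, and the keys are plain tag names rather than XPath specifications. The event shape (a hashref with a Name key) mirrors what XML::SAX passes to start_element, but the events are fed in by hand so the snippet needs no XML modules.

```perl
use strict;
use warnings;

# Hypothetical handler class, for illustration only.
package TagDispatch;

# Dispatch table: tag name => anonymous sub that handles it.
my %on_start = (
    title => sub { my ($self, $el) = @_; push @{ $self->{seen} }, $el->{Name} },
    para  => sub { my ($self, $el) = @_; $self->{paras}++ },
);

sub new { return bless { seen => [], paras => 0 }, shift }

# One table lookup replaces the big "if tag is this / if tag is that" chain.
sub start_element {
    my ($self, $el) = @_;
    my $code = $on_start{ $el->{Name} } or return;   # unknown tags are ignored
    return $code->($self, $el);
}

package main;

# Simulated SAX events; a real parser would call start_element itself.
my $h = TagDispatch->new;
$h->start_element({ Name => $_ }) for qw(book title para para);
print "titles: ", scalar @{ $h->{seen} }, ", paras: $h->{paras}\n";
```

Adding support for a new tag then means adding one entry to the table, not another branch in the handler.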

But really I'm just learning. I have worked with simple XML configuration files and 70MB XML CAD archives, and yet I feel like I am just getting started and I have little confidence.

One thing I am convinced of is that I need hierarchical data structures, and I am really benefitting from using XML and the associated paradigms. Thanks for making XML better, and helping me with my projects!

Re:On SAX

Matts on 2003-01-08T07:59:03

Ah, if it's XPath based dispatch tables you want then check out XML::Filter::Dispatcher. Its XPath capabilities are even more powerful than XML::Twig's.

Re:On SAX

toma on 2003-01-10T09:04:55

I tried XML::Filter::Dispatcher and had some success, but then I ran into a problem with it.

I posted a question about this, titled "XML::Filter::Dispatcher string rules problem".

Huge case statements

pdcawley on 2003-01-05T19:40:31

The more I read about SAX programming the less I like it. I'd look at turning the element data structures into first class objects:

sub start_element {
        my $self = shift;
        my $e = ElementFactory->make_from_struct(shift);
        $e->start_with_parser($self);
}

And have start_with_parser do the double dispatch trick:

sub MyElement::docbook::para::start_with_parser {
        my $self = shift;
        my $parser = shift;
        $parser->start_docbook_para($self);
}

Of course, this can lead to a proliferation of classes, but careful protocol design can fix that somewhat, for instance, you might be able to have all your docbook elements share a class and do:

sub MyElement::DocBook::start_with_parser {
        my $self = shift;
        my $parser = shift;
        my $method = "start_docbook_" . $self->element_name;
        $parser->$method($self);
}

Combine all this with the State pattern and you have a powerful and flexible system that doesn't suffer from the 'huge case statement' problem. Instead it suffers from the 'where does stuff happen?' problem, but I happen to think that's an easier problem to deal with, because at least the things that happen have sensible names.

XML Filters might require a little more work, but a simple pass through AUTOLOAD should fix most of your problems (you'd need to map back from your protocol's method names to the appropriate SAX method calls, but unless you're handling some of the more arcane corners of SAX you can probably do that with a simple hash look up based on the first part of the method name -- start_* becomes start_element, end_* becomes end_element etc.)
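A rough sketch of that pass-through AUTOLOAD, under some assumptions: the class names PassThrough and Recorder are invented here, the downstream handler is a stand-in that just logs calls, and only the start_* and end_* prefixes are mapped. A real filter would more likely build on XML::SAX::Base and cover the other SAX events too.

```perl
use strict;
use warnings;

# Hypothetical filter: turns protocol methods back into SAX calls.
package PassThrough;

our $AUTOLOAD;

# First part of the protocol method name => SAX method to forward to.
my %sax_for = (start => 'start_element', end => 'end_element');

sub new { my ($class, $next) = @_; return bless { next => $next }, $class }

sub AUTOLOAD {
    my ($self, @args) = @_;
    (my $name = $AUTOLOAD) =~ s/.*:://;      # strip the package prefix
    return if $name eq 'DESTROY';            # never forward object destruction
    my ($prefix) = $name =~ /^([a-z]+)_/;    # start_docbook_para -> start
    my $sax = defined $prefix ? $sax_for{$prefix} : undef;
    die "no SAX mapping for $name\n" unless $sax;
    return $self->{next}->$sax(@args);
}

# Stand-in downstream handler that records what it receives.
package Recorder;

sub new { return bless { log => [] }, shift }
sub start_element { my ($self, $el) = @_; push @{ $self->{log} }, "start:$el->{Name}" }
sub end_element   { my ($self, $el) = @_; push @{ $self->{log} }, "end:$el->{Name}" }

package main;

my $rec    = Recorder->new;
my $filter = PassThrough->new($rec);
$filter->start_docbook_para({ Name => 'para' });
$filter->end_docbook_para({ Name => 'para' });
print join(", ", @{ $rec->{log} }), "\n";
```

The DESTROY check matters: without it, object cleanup would also land in AUTOLOAD and die for lack of a mapping.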