Yet *another* XML Module?

Ovid on 2005-09-30T20:08:47

See also the Perlmonks posting.

I can't find the link right now, but Joel Spolsky once wrote about core business needs. If you need something to be done perfectly and it's a core function of your business, do it yourself. Don't outsource it or use someone else's code. While I am not that strident (I wouldn't dream of writing an alternative to DBI), I have to confess that the slew of XML modules out there is often painful to use. Just painful.

I know many would rather not see another module in the XML namespace, but I've started putting together a proof of concept called XML::Composer (it was originally called XML::JFDI). It will let you write really, really bad XML. It will also allow you to write XML "variants" like the awful Yahoo! IDIF format. Namespaces? It doesn't know about 'em or care about 'em. If you want to use a colon in an attribute name, go ahead. Want to inject an XML snippet from somewhere else? Go ahead. It will simply do whatever you tell it to do.

The primary idea is to be able to write XML in just about any format you want. Frequently this means that it will cheerfully produce bad XML, but rather than spend hours scouring the docs only to find yourself reporting another bug, you can quickly and easily tweak what you want.

use XML::Composer;

my $xml = XML::Composer->new({
    # tag         method
    'ns:foo'  => 'foo',
    'bar'     => 'bar',
    'ns2:baz' => 'baz',
});
$xml->Decl; # add declaration (optional)
$xml->PI('xml-stylesheet', {type => 'text/xsl', href => $xslt_url});
$xml->foo(
    { id => 3, 'xmlns:ns2' => $url},
    $xml->bar('silly'),
    $xml->Comment('this is a > comment'),
    $xml->Raw(''),
    $xml->baz({'asdf:some_attr' => 'value'}, 'whee!'),
);
print $xml->Out;
if ($xml->Validate) { # false for above example
}
__END__



  silly
  
  
  whee!

This module (not yet on the CPAN) assumes that you, the programmer, know what you're doing. It's also designed for agile development: produce something quickly but have a test suite to verify that your output is correct. In short, there is a very specific design philosophy here and I wouldn't recommend it for those who don't have test suites, but it makes coding very fast (and no, it doesn't use AUTOLOAD to build those methods).


Aaaargh!

Dom2 on 2005-09-30T21:46:20

If it's not well formed, it's not XML. Period. Please don't encourage such vileness into the world.

At least, if you do, don't put it into the XML namespace because that's not what it's producing.

Sorry to come across mad about this, but dealing with invalid XML (and worse, SGML) over the last 5 years has made me bitter and twisted.

For your well formed XML generation needs, I'm open to suggestions as to how I can improve XML::Genx (within the constraints of the underlying library).

-Dom

Re:Aaaargh!

Ovid on 2005-09-30T22:01:26

You really want to see the Perlmonks thread on this (I linked to it in the parent story). XML::Composer can easily produce valid XML and, to be honest, it produces XML much more easily than most of the XML modules out there. Rather than slapping my hand when I use a namespace which is illegal or upper-case tags or improperly escaped data (shudder), it trusts me to really mean what I say. However, the fact is that we often have to deal with bad XML and there's no way around it. I hate it. You hate it. We have to live with it (don't even get me started about IDIF crap). At the very least we should stop forcing people to "roll their own" because they're forced to work with invalid XML.

I am amenable to changing the name, though. I was thinking XML::Variant, XML::Loose or XML::IAmSickAndTiredOfPseudoXMLVariants.

Re:Aaaargh!

Dom2 on 2005-09-30T22:12:51

sigh. I certainly see the need. It just makes me cry. :-)

And I apologise; I should have read the perlmonks thread first.

As to the name, how about XML::NotWellFormed? It's the most accurate description even if it is an oxymoron.

-Dom

Re:Aaaargh!

Ovid on 2005-10-01T00:07:17

I'm leaning towards the name XML::Variant. While I sort of like your name, it really has a strong negative connotation and I don't want to dissuade the poor programmers having to work with bad XML.

Re:

Aristotle on 2005-10-01T09:42:46

How about XML::Almost or XML::Mostly?

or...

phillup on 2005-10-03T18:01:33

XML::Supposedly

or

XML::Supposed2B

Re:

Aristotle on 2005-10-01T09:57:58

I still don’t like the idea, even as I understand the predicament. I would suggest you use a templating system instead of writing a module for this. Text::Template and the Template Toolkit can easily produce arbitrarily complex and arbitrarily broken XML output.

(Oh, and please get in touch with the people who’re asking for broken XML from you and call them bozos. Not offensively, of course.)

Re:

Ovid on 2005-10-01T20:26:41

Believe me, I already had a phone call with a Yahoo! rep. He was very apologetic but there's not much I can do as a lone developer to shove Yahoo!

XML::Generator?

LTjake on 2005-10-01T01:28:04

A friend and I had a similar discussion about generating XML in a simple way today -- we trawled CPAN and found XML::Generator. Does it do (most of) what you're looking for?

Re:XML::Generator?

Ovid on 2005-10-01T01:53:59

I had looked at XML::Generator and I liked it, but it had some problems. First, because of autoload, it's easy to do this:

print $xml->feild('foobared');

I probably meant "field". My version forces you to map methods to tags and will die if you try to print a tag that doesn't exist (though you can add methods/tags on the fly).
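To make the mapping idea concrete, here is a minimal sketch of how explicit tag-to-method mapping can work without AUTOLOAD. It is not XML::Composer's actual code (which is not shown in this thread); the package name Sketch::Tags is hypothetical, and the sketch installs methods per class rather than per object to stay short.

```perl
use strict;
use warnings;

package Sketch::Tags;    # hypothetical package, for illustration only

sub new {
    my ( $class, $method_for ) = @_;
    for my $tag ( keys %$method_for ) {
        my $method = $method_for->{$tag};
        no strict 'refs';
        no warnings 'redefine';
        # Install a real method for each mapped tag; nothing is
        # resolved lazily, so unmapped names simply do not exist.
        *{"${class}::$method"} = sub {
            my ( $self, @content ) = @_;
            return "<$tag>" . join( '', @content ) . "</$tag>";
        };
    }
    return bless {}, $class;
}

package main;

my $xml = Sketch::Tags->new( { 'ns:foo' => 'foo', 'bar' => 'bar' } );
print $xml->foo( $xml->bar('silly') ), "\n";

# A misspelled method was never installed, so the call dies
# instead of quietly emitting a <feild> element.
my $survived = eval { $xml->feild('foobared'); 1 };
print "typo rejected\n" unless $survived;
```

Because only mapped names become methods, the typo is caught by Perl's ordinary method lookup and no extra checking code is needed.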

Also, as far as I can tell, I would not be able to conveniently dump out data in the Yahoo! IDIF format. Here's a snippet:

<?xml version="1.0"?>
<IDIF>
  <DOC url="http://www.example.com/radio/2005/09/28/fall05_voll11_2">
    <CONTENT type="text/html">
      <html>
        <head>
          <title>Volume 11 Fall 2005: Show 2</title>
        </head>
        <body>
<h1>Volume 11 Fall 2005</h1>
<br>
Hi there!  This is bad XML &-/
<br>
        </body>
      </html>
    </CONTENT>
  </DOC>
</IDIF>

That is not even remotely well-formed XML (note the unclosed "br" tags, for example), but it's perfectly valid for Yahoo!'s IDIF format. Trying to produce a bunch of stuff like this with most XML modules is what finally led me to start writing my own. From what I can see, XML::Generator will not allow me to write that.

Re:XML::Generator?

Aristotle on 2005-10-01T09:52:39

I probably meant “field”. My version forces you to map methods to tags and will die if you try to print a tag that doesn’t exist (though you can add methods/tags on the fly).

What if you wrote the correct tag name, but it gets inserted in the wrong place? What if the content is bad or a required attribute is missing?

Of course, a schema can only be used to validate well-formed documents…

Re:XML::Generator?

mir on 2005-10-01T15:28:41

This is not XML, so really it should not be labelled as such. AFAICT it is pretty close to being SGML though. It might even be SGML: the XML declaration would be seen by an SGML parser as a regular PI, you can add a DTD when you parse the file, and unclosed tags and &\W are valid in SGML. You just need the characters to be in latin1.

So why don't you go and pollute the SGML namespace ;--)

Seriously, I don't think you should release your module in the XML namespace. An IDIF module would be OK, in its own namespace I guess, maybe with a helper module that could be used on its own.

I think my main problem with your module is the way you seem to advertise it, which sounds a bit like "let's generate more quasi XML to p.o. real XML guys" to me. Presenting it as a module that generates XML, documenting prominently the features that let it generate proper XML, and then adding a "There be Dragons" section that adds a few non-kosher methods (Raw seems like a prime candidate), maybe only available by adding an option to the constructor, might be a better way to go about it.

Oh, and as far as tests go, if you (not you, the user in general) don't really know what is and is not XML, I am not sure about the quality of the tests that you can generate.

Does it make sense?

Re:XML::Generator?

Ovid on 2005-10-01T20:24:19

I have decided not to put this in the XML namespace. That much I agree on.

I think my main problem with your module is the way you seem to advertise it, which sounds a bit like "let's generate more quasi XML to p.o. real XML guys" to me.

I can see how that might appear to be what I was doing. I'll be sure to clarify that. Unfortunately, this is a real problem space that developers constantly face: legacy XML "variants" or third-party resources which require malformed XML. Since it's not always possible to force others (Yahoo!, for example) to conform to the spec, we have to have some flexibility here.

One thing I think I will need to do is include a section on "testing tips". Testing poorly formed XML is difficult. I've a few pointers I can offer on that score. Fortunately, I've been writing it so programmers can have complete control over the output, down to the ordering of attributes (passing them as an array reference in lieu of a hash reference, if desired).
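The attribute-ordering point can be sketched roughly as follows. This is an illustration only, not the module's code: render_attributes is a hypothetical helper, and it assumes an array reference holds name/value pairs in the order they should appear.

```perl
use strict;
use warnings;

# Hypothetical helper: render attributes from either an array reference
# of name/value pairs (order preserved) or a hash reference (sorted by
# name, because hash order is not stable between runs).
sub render_attributes {
    my ($attrs) = @_;
    my @pairs =
        ref $attrs eq 'ARRAY'
      ? @$attrs
      : map { ( $_ => $attrs->{$_} ) } sort keys %$attrs;

    my $out = '';
    while ( my ( $name, $value ) = splice @pairs, 0, 2 ) {
        # Values are inserted verbatim; this sketch does no escaping.
        $out .= qq{ $name="$value"};
    }
    return $out;
}

# Array reference: attributes come out exactly in the order given.
print '<DOC', render_attributes( [ url => 'http://www.example.com/', id => 3 ] ), ">\n";

# Hash reference: attributes come out sorted by name.
print '<DOC', render_attributes( { url => 'http://www.example.com/', id => 3 } ), ">\n";
```

Fixing the order this way also makes the output deterministic, which helps when a test compares generated markup against an expected string.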

Re:XML::Generator?

mir on 2005-10-02T00:21:48

down to the ordering of attributes

I feel your pain, having had to deal with exactly that request for XML::Twig (not a model of XML purity itself): apparently some Microsoft tool needs attributes in a specific order. The easiest solution I found was to use Tie::IxHash objects to store the attributes.

Re:XML::Generator?

foobah on 2007-06-24T17:37:22

Also, as far as I can tell, I would not be able to conveniently dump out data in the Yahoo! IDIF format. Here's a snippet:

That is not even remotely well-formed XML (note the unclosed "br" tags, for example), but it's perfectly valid for Yahoo!'s IDIF format. Trying to produce a bunch of stuff like this with most XML modules is what finally led me to start writing my own. From what I can see, XML::Generator will not allow me to write that.

I'm sure the usefulness of this information is long gone for you, but XML::Generator was designed to allow generation of invalid XML when necessary. You just didn't look hard enough... ;-)

use XML::Generator ();

my $idif = XML::Generator->new(pretty => 2, conformance => 'strict');
my $ugly = XML::Generator->new(escape => 'unescaped', empty => 'ignore');

print
$idif->xml(
  $idif->IDIF(
    $idif->DOC({url=>"http://www.example.com/radio/2005/09/28/fall05_voll11_2"},
      $idif->CONTENT({type=>"text/html"},
        $idif->html(
          $idif->head(
            $idif->title("Volume 11 Fall 2005: Show 2"),
          ),
          $idif->body(
            $ugly->h1("Volume 11 Fall 2005"),
            $ugly->br(),
            "\nHi there!  This is bad XML \&-/\n",
            $ugly->br(),
          ),
        ),
      ),
    ),
  ),
);

Re:

Aristotle on 2005-10-01T10:01:09

In Defense of Not-Invented-Here Syndrome

Of course, Joel is fun to read because he writes a lot more confrontationally than the subject really warrants. I wrote something related recently.

Not more bad XML

iburrell on 2005-10-02T22:40:54

What we need is less bad not-quite-XML and more good XML. For generating good XML, the libraries that guarantee producing correct XML are the way to go. My impression is that they are easier to use than the not-so-nice ones because you don't have to worry about screwing up. They do things like always encoding strings, always using UTF-8, always worrying about namespaces, and complaining about improperly nested tags.

If you have to generate not-quite-XML or bad XML, then you have a bigger problem. I am not sure we need another library that makes generating that easier and makes generating good XML harder.

I have been seeing links to HOWTO Avoid Being Called a Bozo When Producing XML recently.

Re:Not more bad XML

Ovid on 2005-10-03T01:06:48

But there's plenty of bad XML out there already and there are programmers who have no choice but to implement it, particularly if it's a third party requirement. Usually this bad XML tends to cause plenty of problems. Why have even more problems by creating yet another hand-rolled module which may or may not do what you want? Programmers in this unfortunate situation should at least be able to get the job done and not waste time having to reimplement something.

The good thing about Data::XML::Variant is that it can produce perfectly valid XML. Thus, if you are lucky enough to be able to migrate from malformed XML to good XML, you already have code in place which can do that. Yes, there are those who would argue that you should then switch to modules which validate the XML, but ignoring issues of cost and time constraints is a habit that many programmers tend to get into.