If you say you're going to take Vienna, use XML

Ovid on 2004-10-18T19:01:12

There's a relatively famous quote, attributed to Napoleon, that reads "If you say you're going to take Vienna, take Vienna." There's a lesser-known quote, attributed to Ovid (me, not the poet), that reads "If you say you're going to use XML, use XML." Not this bastard hybrid of broken CSV and XML:


<20020110>
5, 44, Some Movie, 48
5, 69, Another Movie, 63
5, 91, Not Another Movie!, 55

Now guess what happens if we receive data in this format with a comma in the title? If the software building this file quotes the title, it breaks. If it doesn't quote it, it runs fine. Why? Because we parse each line with:

$line =~ /^(.*?),(.*?),(.*),(.*)$/;

Dear lord.
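
To see what that pattern actually captures, here is a minimal sketch. The `split_row` helper and the sample rows are made up for illustration; only the regex comes from the real code.

```perl
use strict;
use warnings;

# The pattern from the post: two non-greedy fields, a greedy third field
# (the title), and a last field taken from after the final comma.
sub split_row {
    my ($line) = @_;
    return $line =~ /^(.*?),(.*?),(.*),(.*)$/;
}

# Hypothetical rows; they differ only in whether the title is quoted.
my @unquoted = split_row('5, 91, Not Another, Movie!, 55');
my @quoted   = split_row('5, 91, "Not Another, Movie!", 55');

print "unquoted title: [$unquoted[2]]\n";
print "quoted title:   [$quoted[2]]\n";
```

The greedy third group swallows any extra commas, which is why an unquoted title survives. The pattern knows nothing about quoting, so in the quoted row the quote characters come back as part of the title (along with the leading space after each comma).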


CDATA?

kungfuftr on 2004-10-18T21:06:42

Can you not use CDATA sections to encapsulate raw data within your XML?

Re:CDATA?

rjbs on 2004-10-18T23:23:00

You could, but why? It makes sense if you want to embed data that XML is not good at encoding: audio, video, and other blobs. If you want to store simple, serialized data, and you're using XML for some of it, why not use it for the rest? XML parsers have been written already, and the good ones have been extensively refined and tested. The code here was using XML to store a bit of data that could've been XML. Instead, it used a different format that required a different parser. Then, that parser was crap. If it had just been XML, it could've been parsed like everything else.
And, failing that, it could've used a well-tested CSV parser -- but now you've got two big parsers solving the same problem.

Vienna & Napoleon

domm on 2004-10-18T21:14:15

There's a relatively famous quote, attributed to Napoleon, that reads "If you say you're going to take Vienna, take Vienna."

When I cycle to my father's place, my route takes me through the Lobau and past a small memorial called "Napoleonstein" (or something). It marks the place where Napoleon stayed with his army during the Battle of Wagram (when he took Vienna). It is basically in the middle of nowhere, with lots of alluvial forest and mosquitoes (in summer) around.

All of which has nothing to do with Perl or Ovid's post, but is just a curious coincidence I'd like to report before going to bed...

Embedding CSV in XML

drhyde on 2004-10-19T08:36:05

I found myself putting CSV in XML recently. I'm not sure whether to blame the MS/.net programmer at the other end of the notwork connection, or the piece of shit that is SOAP::Lite, but it was pretty much the only way that we could pass an array of data as a reply to a SOAP query and have it work going both ways. But at least we used proper CSV, not just join(',',@data).
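
As a rough illustration of the difference between a bare join and "proper" CSV, here is a small sketch. The `csv_field` and `csv_row` helpers and the sample data are hypothetical and only cover the usual quoting convention (wrap a field in double quotes when it contains a comma, quote, or line break, and double any embedded quotes). In real code a tested module such as Text::CSV is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a single field only when it needs it.
sub csv_field {
    my ($field) = @_;
    return $field unless $field =~ /[",\r\n]/;
    (my $escaped = $field) =~ s/"/""/g;    # double embedded quotes
    return qq{"$escaped"};
}

# Hypothetical helper: build one CSV row from a list of fields.
sub csv_row {
    my (@fields) = @_;
    return join ',', map { csv_field($_) } @fields;
}

# Made-up row with a comma and quotes inside the title.
my @data = (5, 91, 'Not Another, "Movie"!', 55);

my $naive  = join ',', @data;    # field boundaries are now ambiguous
my $proper = csv_row(@data);     # title is quoted and escaped

print "naive:  $naive\n";
print "proper: $proper\n";
```

The naive row reads as five fields to any consumer, while the quoted one can be split back into the original four.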

Sub-Atomic Parsing

ziggy on 2004-10-19T16:29:15

On the one hand, if you're using XML, you want to parse one syntax and be done. The reality is that's more fantasy than fact.

The problem you're trying to solve here is colloquially known as "parsing the atom". That is, the XML tagging suffices to mark up the structure of some blob of data, but doesn't scale down to parse the data fragments inside that blob. For example, consider a nice XML document that has an attribute or a text block that contains a date. Is 'Tue Oct 19 12:19:16 EDT 2004' a chunk of text like 'Fred Flintstone', or a parsable date? And if it is a parsable date, what rules do you use to parse it?
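
To make the date case concrete, here is a minimal sketch of what "parsing the atom" means once the XML parser has handed over a string. The `parse_atom` helper is hypothetical, and it assumes the only date layout of interest is the one in the example above; every other string is treated as plain text. The zone abbreviation is kept as an opaque label, not resolved to an offset.

```perl
use strict;
use warnings;

my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 1 .. 12;

# Hypothetical helper: classify one text node. Only the single layout
# 'Tue Oct 19 12:19:16 EDT 2004' is recognised as a date.
sub parse_atom {
    my ($text) = @_;
    if ($text =~ /^(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s+
                   ([A-Z][a-z]{2})\s+(\d{1,2})\s+
                   (\d{2}):(\d{2}):(\d{2})\s+
                   ([A-Z]{2,5})\s+(\d{4})$/x
        && exists $MONTH{$1})
    {
        return {
            type  => 'date',
            year  => $7 + 0,
            month => $MONTH{$1},
            day   => $2 + 0,
            time  => "$3:$4:$5",
            zone  => $6,    # opaque label, not converted to an offset
        };
    }
    return { type => 'text', value => $text };
}

my $date = parse_atom('Tue Oct 19 12:19:16 EDT 2004');
my $name = parse_atom('Fred Flintstone');

print "first atom is a $date->{type}, second is $name->{type}\n";
```

Even this tiny second parser has to make decisions the XML never recorded, such as which layouts count as dates and what to do with the time zone.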

What about ISBNs? URLs? LaTeX fragments? Quoted HTML? Perl code? English sentences? PNGs?

On the one hand, it's a hard problem. You could solve it and use XML as your one and only syntax, but that's paying an extra 800% for an additional 2% of value to the programmer (who generally needs to solve this problem once and move on).

But on the other hand, throwing a CSV island in the middle of a sea of XML is like bringing a bottle of Bud Light when touring the Guinness brewery just so you have something to drink. It might be a logical and defensible choice in some universe, but not this one.