There's a relatively famous quote, attributed to Napoleon, that reads "If you say you're going to take Vienna, take Vienna." There's a lesser-known quote, attributed to Ovid (me, not the poet), that reads "If you say you're going to use XML, use XML." Not this bastard hybrid of broken CSV and XML:
<20020110>
5, 44, Some Movie, 48
5, 69, Another Movie, 63
5, 91, Not Another Movie!, 55
</20020110>
Now guess what happens if we receive data in this format with a comma in the title? If the software building this file quotes the title, it breaks. If it doesn't quote it, it runs fine. Why? Because we parse each line with:
$line =~ /^(.*?),(.*?),(.*),(.*)$/;
Dear lord.
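To make the failure concrete, here is a minimal sketch that runs the same pattern over two made-up lines, one with a bare comma in the title and one with the title quoted. The `split_line` wrapper, the whitespace trim, and the sample titles are assumptions for illustration, not part of the real code.

```perl
use strict;
use warnings;

# The pattern is the one from the post; split_line() is a hypothetical
# wrapper, and the whitespace trim is an addition for readable output.
sub split_line {
    my ($line) = @_;
    my @fields = $line =~ /^(.*?),(.*?),(.*),(.*)$/
        or return;                  # no match: return an empty list
    s/^\s+|\s+$//g for @fields;     # trim padding around each field
    return @fields;
}

# Unquoted comma: the greedy third group absorbs it, so the title survives.
print join('|', split_line('5, 91, Movie, The, 55')), "\n";
# 5|91|Movie, The|55

# Quoted comma: the fields still split, but the quotes are now part of the title.
print join('|', split_line('5, 91, "Movie, The", 55')), "\n";
# 5|91|"Movie, The"|55
```

So the unquoted case works only by accident: the first two groups are non-greedy and the third is greedy, which pins the stray commas inside the title. Anything that looks like real CSV quoting leaks straight into the data.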
Re:CDATA?
rjbs on 2004-10-18T23:23:00
You could, but why? It makes sense if you want to embed data that XML is not good at encoding: audio, video, and other blobs. If you want to store simple, serialized data, and you're using XML for some of it, why not use it for the rest? XML parsers have been written already, and the good ones have been extensively refined and tested. The code here was using XML to store a bit of data that could've been XML. Instead, it used a different format that required a different parser. Then, that parser was crap. If it had just been XML, it could've been parsed like everything else.
And, failing that, it could've used a well-tested CSV parser -- but now you've got two big parsers solving the same problem.
When I cycle to my father's place, my route takes me through the Lobau and past a small memorial called "Napoleonstein" (or something). It marks the place where Napoleon stayed with his army during the Battle of Wagram (when he took Vienna). It is basically in the middle of nowhere, with lots of alluvial forest and mosquitos (in summer) around.
All of which has nothing to do with Perl or Ovid's post, but is just a curious coincidence I'd like to report before going to bed...
join(',',@data)
.
The problem you're trying to solve here is colloquially known as "parsing the atom". That is, the XML tagging suffices to mark up the structure of some blob of data, but doesn't scale down to parse the data fragments inside that blob. For example, consider a nice XML document that has an attribute or a text block that contains a date. Is 'Tue Oct 19 12:19:16 EDT 2004' a chunk of text like 'Fred Flintstone', or a parsable date? And if it is a parsable date, what rules do you use to parse it?
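As a rough sketch of what that second parsing step looks like in practice: the XML layer hands back a string, and something else has to decide whether it is a date. The example below uses the core Time::Piece module; the `atom_as_date` helper, the single accepted layout, and the decision to discard the zone abbreviation are all assumptions made for illustration, and each one is exactly the kind of rule the XML itself never states.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: returns a Time::Piece object if the atom matches the
# one layout assumed here, or undef if it is just text.
sub atom_as_date {
    my ($atom) = @_;

    # Assumption: drop a zone abbreviation such as "EDT" before parsing,
    # since abbreviations are ambiguous. The wall-clock time is kept as-is.
    (my $no_zone = $atom) =~ s/\s[A-Z]{2,5}(?=\s\d{4}$)//;

    # strptime dies on input it cannot parse, so trap that and return undef.
    return eval { Time::Piece->strptime($no_zone, '%a %b %d %H:%M:%S %Y') };
}

for my $atom ('Tue Oct 19 12:19:16 EDT 2004', 'Fred Flintstone') {
    my $t = atom_as_date($atom);
    printf "%-32s %s\n", "'$atom'",
        defined $t ? 'date ' . $t->ymd : 'plain text';
}
```

Run as written, this should report the first atom as a date and the second as plain text, but only because the format string was chosen to fit the first one. A different date layout would need a different rule.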
What about ISBNs? URLs? LaTeX fragments? Quoted HTML? Perl code? English sentences? PNGs?
On the one hand, it's a hard problem. You could solve it and use XML as your one and only syntax, but that's paying an extra 800% for an additional 2% of value to the programmer (who generally needs to solve this problem once and move on).
But on the other hand, throwing a CSV island in the middle of a sea of XML is like bringing a bottle of Bud Light when touring the Guinness brewery just so you have something to drink. It might be a logical and defensible choice in some universe, but not this one.