There are a few topics that circulate throughout Perldom with some regularity. One of them is the lack of good support for XML.
I'm now convinced that this old saw is a complete myth. Sure, there are issues with Unicode, and until recently no one thought it worthwhile to write an XML parser in Perl itself (now there are two: XML::Parser::Lite and XML::SAX::PurePerl). Realistically, though, the lack of in-depth Unicode support in Perl (at least pre-5.8.0) isn't stopping real work from getting done with ASCII-ified XML.
So, while we're all bemoaning the lack of innovation in Perl, Simon St. Laurent took a look at what we've been doing for the past few years and found some seriously innovative modules. (Sorry, the link is dead at the moment.) He was particularly impressed with Michel Rodriguez's XML::Twig and Matt Sergeant's XML::XPath.
So, last week, I finally got around to looking into Simon's Regular Fragmentations for XML. The basic idea is that element (or attribute) content may be parsable in some native, non-XML syntax. For example, XML Schema describes ISO dates and times as fundamental datatypes, but how do you get the year out of that atomic datatype? Or, if a <style> tag contains a block of CSS declarations, how are the individual declarations parsed? Simon's answer is to parse the atom, as it were -- in other words, apply regular expressions to element content.
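To make the idea concrete, here is a minimal sketch in plain Perl. It is not Simon's implementation (his is in Java), and the sample strings and the two helper names, fragment_iso_datetime and fragment_css_declarations, are made up for illustration. It assumes some XML parser has already handed us the element content as an ordinary string, and it handles only the simplest date-time form (no time zones, no fractional seconds) and simple CSS values that contain no semicolons.

```perl
use strict;
use warnings;

# Hypothetical element content, as it might arrive from any XML parser.
my $date_text  = '2002-08-15T14:30:00';
my $style_text = 'color: red; font-size: 12pt; margin: 0';

# Break a basic ISO 8601 date-time into named parts.
# Returns a hash reference, or nothing if the text does not match.
sub fragment_iso_datetime {
    my ($text) = @_;
    my ($year, $month, $day, $hour, $min, $sec) =
        $text =~ /\A(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})\z/
        or return;
    return {
        year => $year, month  => $month, day    => $day,
        hour => $hour, minute => $min,   second => $sec,
    };
}

# Break a block of CSS declarations into property => value pairs.
# Assumes values never contain a semicolon.
sub fragment_css_declarations {
    my ($text) = @_;
    my %decl;
    while ($text =~ /([\w-]+)\s*:\s*([^;]+?)\s*(?:;|\z)/g) {
        $decl{$1} = $2;
    }
    return \%decl;
}

my $d = fragment_iso_datetime($date_text);
print "year: $d->{year}\n";

my $css = fragment_css_declarations($style_text);
print "$_ => $css->{$_}\n" for sort keys %$css;
```

The point is only that, once the content is a string, the "atom" splits with the same tools as any other string; a real fragmentation layer would also need to decide how the pieces get back into the document.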
In his slides, Simon describes the problem in greater detail, interactions with XML Schema, and possible approaches (such as creating an XML vocabulary to describe a regular fragmentation). He even points out that this approach, currently implemented in Java, needs more Perl hackers.
I'm glad Simon started down this path. He's been working on it for a while now. This type of work is both ludicrously obvious to Perl/XML hackers (running element content through a regex is at least as easy as running any other string through a regex) and heretical to some of the XML elite (all parsing shall be XML parsing).
Hopefully, someday soon, the myth that Perl and XML don't mix will be forgotten. After that, who knows? Maybe the myth that Perl is a write-only language will also be disproven once and for all. :-)