Perl and XML sucks?

Matts on 2001-10-29T14:24:47

A couple of weeks ago Elliotte Rusty Harold, the author of several Java and XML books (including XML in a Nutshell, which I tech reviewed for him) put online Chapter 6 of his "Java and XML" book. On his site Cafe Con Leche, an XML geek site like use.perl, he asked for feedback, particularly with respect to the SAX support in other languages. What he said about Perl was, in my opinion, pure flamebait:

    Although supporting the "Desperate Perl Hacker" was a goal of the original XML working group, Perl has always lagged other languages quite a bit when it comes to XML. The initial problem was the lack of support for Unicode, a sine qua non for XML. Today modern Perls have decent Unicode support, though you will need version 5.005_52 or later to really handle XML; and I recommend version 5.6 or later. There are several XML parsers available for Perl, though far and away the most popular is Larry Wall and Clark Cooper's XML::Parser. This is a wrapper around James Clark's Expat parser for C. At the time of this writing, support for the SAX API from Perl is still in its infancy and limited to SAX1, though this may be upgraded to SAX2 by the time you read this, and Perl 6 will probably include it as a standard library. Regardless, in my opinion Perl is not as ideal a language for processing XML as you might expect. Perl is very good at pulling out implicit structure in text documents such as tab delimited text files and comma separated values. However, XML documents tend to have very explicit structure that is easily addressed by a language like Java. Consequently the inevitable obfuscation of Perl code seems to me too high a price to pay.

This coverage bothered me a lot, and since I know Elliotte, I took it upon myself to get this put right before it goes to print. I pointed out where he was wrong, and accused him of using FUD where it was appropriate.

So, today I got a reply. He's corrected some stuff, but stands by his FUD:

    > Every language I can think of has easy ways to access complex data structures. What's written there is pure FUD. Perl is no different to Java in this respect.

    No, it is. Perl's built-in regular expression support is significantly different than what most other languages provide. It makes Perl a much more powerful tool for programs that need to parse text documents, certainly more powerful and convenient than Java for this use case. That's strong enough to outweigh the disadvantages of Perl for these sorts of programs. But if this is not what you're doing (and when processing XML it isn't) then you get all the disadvantages of Perl with none of the counterweighting advantages.

    > "Consequently the inevitable obfuscation of Perl code seems to me too high a price to pay."
    >
    > It's possible to write crap in any language. I imagine you see Vietnamese as obfuscated too (or maybe not, but the point being that Perl is simply foreign to you).

    I can think of no imperative language that lends itself to "writing crap" as easily or that's as hard to read as Perl. I keep hearing claims that Perl can be written cleanly, but every time I look at Perl code that's allegedly clean it's initially incomprehensible. I can't even follow my own Perl code if I've put it aside for a week or so, something that is definitely not true of the code I write in other languages.

I'm not even really sure how to defend this beyond what I've already said. I'd really like some help in penning a reply. Comments open.


Incomprehensible

pudge on 2001-10-29T14:48:34

Either he cannot read Perl, which is his problem, or he does not read enough Perl. There is plenty of comprehensible Perl code out there. The only significantly obfuscated things about my code, for example, are the occasional idiom and the poor organization. The idioms are just about knowing the language; the poor organization is a problem in any language.

As to his other problem, he comes from the perspective that Perl has significant inherent disadvantages, and that it is a necessary evil for programs that need its advantages. He is biased from the beginning against Perl, he treats his biases as fact, and uses that to say that Perl is not the right language to use. If you could get him to see that his biases are NOT fact, but merely his opinion, that might help matters. For example, ask him what "disadvantages" Perl has. Also, ask him why he thinks other language features that you use for XML processing are NOT advantages. Try to show that those are his preferences, and perfectly reasonable ones, but that it is flatly false and inappropriate to present those as facts.

Good luck ...

Well...in all fairness

hfb on 2001-10-29T15:20:05

Until the last few opinionated sentences, I don't think it qualifies as flamebait or FUD at all. The Perl XML modules are a mess and a PITA to install and, if Jarkko's grumblings over Unicode and just how far Perl5 is from having real 'standard' Unicode support are anything to go by, then there may be some truth in what he says. I still really haven't figured out what XML is good for though either :)

Re:Well...in all fairness

TeeJay on 2001-10-29T16:18:13

I think he is missing that Perl programmers tend to approach XML in a different manner to Java programmers.

For example, there are far more options in Perl for parsing, creating, transforming and handling XML than in Java.

Not to mention Perl's ability to munge the data itself outside of validation. Why invoke all the overhead of a tree or parser object simply to output or parse simple XML?

For example, I create Dia XML using Template Toolkit: I parse the info into an object and make the object's methods available to the template processor, which populates the XML. The XML is complex, but it is structured and ideal for templating. That is far, far quicker and easier than mucking about with XSLT or parsing in and creating out.
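
A minimal sketch of the templating idea described above, written in Python purely for illustration (the comment itself refers to Perl's Template Toolkit). The element names, the `render_diagram` helper and the sample data are hypothetical, and the markup is a simplified stand-in rather than Dia's real schema. The data is escaped and dropped straight into a template, with no tree or parser object involved.

```python
from string import Template
from xml.sax.saxutils import quoteattr

# Hypothetical, simplified markup; this is NOT Dia's real schema.
OBJECT_TEMPLATE = Template("  <object id=$id name=$name/>")
DIAGRAM_TEMPLATE = Template("<diagram>\n$objects\n</diagram>")


def render_diagram(objects):
    """Render (id, name) pairs as an XML string by filling in templates."""
    rows = [
        # quoteattr() escapes the value and wraps it in quotes.
        OBJECT_TEMPLATE.substitute(id=quoteattr(str(obj_id)), name=quoteattr(name))
        for obj_id, name in objects
    ]
    return DIAGRAM_TEMPLATE.substitute(objects="\n".join(rows))


if __name__ == "__main__":
    print(render_diagram([(1, "Orders & Items"), (2, "Customers")]))
```

The trade-off is that nothing checks the result against a schema; the template author is responsible for keeping the output well-formed.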

Perl also allows dynamic creation of object attributes and methods, which is perfect for processing and handling XML structures: you can create an object from the XML along with the methods to manipulate it.

I think he is approaching XML from a Java programmer's perspective and writing off the flexibility and innovation in Perl. It would be good to give him some examples of using Perl's unique abilities to process and munge XML, and he will almost certainly need help if his Perl is as bad as it sounds.

A.

What disadvantages?

jdavidb on 2001-10-29T19:09:29

  1. What disadvantages?
  2. What inevitable obfuscation?

    That's strong enough to outweigh the disadvantages of Perl for these sorts of programs.

Stated as an axiom, but what evidence is there? The only thing people ever say is, "I couldn't even read my own code a week later." Try to assemble a bibliography to support the statement that Perl has disadvantages. Eliminating misunderstandings about JAPHs being similar to standard programs and Perl being less efficient than C (see my post under the code profiling story), everything boils down to, "I couldn't even read my own code a week later," including everything I've read from Eric Raymond.

    I keep hearing claims that Perl can be written cleanly, but every time I look at Perl code that's allegedly clean it's initially incomprehensible.

I regularly read code that I've written at all time periods in my career with Perl, which began in 1998. Sometimes I have some problems, but that's all I can say. Usually it doesn't take long to figure it out, if it takes any time at all. Legions of people can say, "It's unreadable," and legions of people can say, "It's fine for me," but it's all still subjective. Not "inevitable disadvantages." Note also that people who work with Perl regularly don't seem to comment that they can't read it. (You decide which is cause and effect, though.)

There's a nice article somewhere on perl.com called "Top Ten Perl Myths." You might have to dig around, but I thought it was pretty good. There was also recently an article somewhere (?) talking about why C is not necessarily much more efficient than Perl. Java and other interpreted languages definitely aren't much more efficient than Perl. (Swing. Cough, cough.) It's all about algorithms, not your language.

FUD

ziggy on 2001-10-30T11:19:13

Elliotte is an interesting guy, but I've found him to be a little too opinionated at times.

Yes, Perl has lagged a bit in terms of standardization of XML Processing modules when compared to Java. But that can be attributed to the nature of Java development vs. Perl development. Java developers tend to spend a lot of time creating one interface (e.g. SAX, DOM, JAXP) and then reimplement it a bunch of times. Technically, this is supposed to be better because the implementations are interchangeable and can focus on improving specific aspects of the implementation, such as runtime speed, memory usage, etc.

Perl developers on the other hand tend not to focus as highly on interchangeable interfaces, but develop interesting new modules that can be fixed/extended by anyone sufficiently interested.

So, while there aren't a bazillion SAX parsers for Perl, it's not necessarily an indication that Perl sucks at SAX. I tend to believe that Perl makes it easier to process XML without needing something like SAX, so that the need for SAX isn't as great with Perl (or among Perl programmers; take your pick).

Also, anyone who has done a significant amount of XML processing over the last few years knows that complete and comprehensive explicit XML structures are not the last word in markup and structure. I've come across many circumstances where a block of PCDATA has some implicit structure that needs to be parsed out using a regex or some other type of parsing. IIRC, this was the driver for Simon St. Laurent's Regular Fragmentations for XML.
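
The point about implicit structure inside PCDATA can be illustrated with a small sketch. It is written in Python purely for illustration; the sample document, the element names and the `parse_when` helper are hypothetical and not taken from any real application. The XML parser finds the element, and a regex then recovers the fields that the markup itself does not expose.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document: <when> holds PCDATA with an implicit
# date/time structure that the markup does not describe.
DOC = "<post><author>ziggy</author><when>2001-10-30T11:19:13</when></post>"

# Regex that recovers the structure hidden inside the text node.
TIMESTAMP = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})"
)


def parse_when(xml_text):
    """Return the timestamp fields as a dict of ints, or None if no match."""
    root = ET.fromstring(xml_text)
    text = root.findtext("when", default="")
    match = TIMESTAMP.fullmatch(text.strip())
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    print(parse_when(DOC))
```

Explicit markup gets you as far as the text node; from there it is ordinary text parsing, in whatever language you happen to be using.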

Now that I've mentioned Simon, wasn't he the one who recently looked at the state of Perl and XML and asked why Java techniques aren't as innovative? He was particularly interested in XML::Twig and possibly your XML::XPath (which was the first XPath parser and only XPath parser at the time IIRC).

So, taking that together along with an assertion that "explicit structures are more easily addressed by a language like Java", I'd throw the whole piece into the FUD bucket along with the overused slur about Perl being unreadable after one week.

Another way to look at it is through Gnat's glasses: Perl is used to develop real-world solutions by real people. If some technology isn't being used extensively with Perl, it is probably an indication that it is not all that useful, not universal or not critical for solving problems in Perl's areas of strength. That XML is used everywhere but not for everything done with Perl is an indication that XML is useful but not universal. A brief look around demonstrates that XML is in fact not universal, and even within XML technologies, XML is not the complete (family of) solution(s) it purports to be.