XML Pipeline

ziggy on 2002-03-04T19:49:24

I finished reading the XML Pipeline note on the train this morning. It's quite well thought out for an early draft, but after reading Sean McGrath's XML Pipe and playing with XML::SAX::Machines, I found it wasn't quite what I expected.

To be perfectly blunt, XML Pipeline is nothing more than an XMLified Makefile syntax designed for building XMLish streams. That doesn't invalidate the good design and good thinking that went into this note. However, it's not exactly what I thought I was going to be reading. What I had hoped for was something a little more low-level -- an XML description of a single SAX machine, for example, that could scale to connecting processes that don't necessarily communicate through SAX events. Or perhaps just a generic pipeline definition for something completely askew to XML processing (Hi, Piers!).

One thing that I did find disconcerting was the "implementation defined" punt that defers the definition of a processing step. The example described in the abstract was a Makefile replacement, in that files were created and programs were called on the command line to build them. Yet the note focuses on abstract connections that can be made between processes (e.g. org.xml.XSLT, org.xml.XmlSchema, etc.) and doesn't say how those connections are to be made.
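To make the punt concrete, a pipeline document in the note looks roughly like this -- this is a sketch from memory, so element and attribute names may not match the note exactly, and the file names are invented:

```xml
<pipeline xmlns="http://www.w3.org/2002/02/xml-pipeline">
  <!-- bind an abstract process type to a step definition -->
  <processdef name="transform.p" definition="org.xml.XSLT"/>

  <!-- a step: inputs and outputs are just labeled documents -->
  <process id="p1" type="transform.p">
    <input name="stylesheet" label="style.xsl"/>
    <input name="document" label="doc.xml"/>
    <output name="result" label="doc.html"/>
  </process>
</pipeline>
```

Everything here is declared in terms of abstract inputs and outputs; nothing says whether "org.xml.XSLT" means exec'ing xsltproc, calling a TrAX Transformer, or firing SAX events at a filter. That's the implementation-defined hole.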

That got me thinking that what we need here is another optional attribute to describe the API convention used by a process. For instance, Java processes could adhere to the JXfoobar API (specified through the appropriate namespace name), while a Perl or C API might be conceptually similar but fundamentally incompatible (thanks to impedance mismatches). A suitably challenged Perl hacker (or a suitably crazy one like Ingy) might even be able to write the bits of an XML Pipeline processor that understood multiple language linkage conventions -- passing opaque pointers between XML::LibXML and a C program using libxml, or even adding a transparent impedance-matching component through a Java native interface to plug into a Java transformation process...
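Concretely, the attribute might hang off the process definition. Everything in this sketch is invented -- the attribute name, the namespace-ish URI values, the definitions -- it's just the shape of the idea:

```xml
<!-- hypothetical: an "api" attribute naming the linkage convention -->
<processdef name="transform.p" definition="org.xml.XSLT"
            api="http://example.org/conventions/perl-sax"/>
<processdef name="validate.p" definition="org.xml.XmlSchema"
            api="http://example.org/conventions/java-trax"/>
```

A Perl processor would recognize the first convention and instantiate a SAX filter directly; anything else it could hand off or ignore.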

That one additional attribute would buy a lot. A pipeline processor could summarily ignore all processes conforming to foreign API conventions (e.g. a Java processor ignoring Perl/Python processes). For simplicity, though, it might be necessary to define a default behavior -- drop files on the file system and call CLI apps to do the processing...
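The dispatch logic a processor would need is small. Here's a minimal sketch in Python (not Perl, for brevity); the process definitions, the "api" field, and the URN-style convention names are all invented for illustration:

```python
# Hypothetical process definitions, as a pipeline processor might parse
# them out of an XML Pipeline document. "api" is the proposed extension;
# None means the step declared no convention at all.
PROCESSES = [
    {"name": "validate", "type": "org.xml.XmlSchema", "api": "urn:example:java-trax"},
    {"name": "transform", "type": "org.xml.XSLT", "api": "urn:example:perl-sax"},
    {"name": "indent", "type": "org.example.Indent", "api": None},
]

def plan(processes, native_api):
    """Decide how to handle each step: run it in-process if it matches the
    convention this processor speaks, fall back to the file-system/CLI
    default if no convention is declared, and skip foreign conventions."""
    decisions = {}
    for p in processes:
        if p["api"] == native_api:
            decisions[p["name"]] = "in-process"
        elif p["api"] is None:
            decisions[p["name"]] = "cli"   # drop files on disk, exec a command
        else:
            decisions[p["name"]] = "skip"  # some other processor's problem
    return decisions
```

So a Perl-flavored processor calling `plan(PROCESSES, "urn:example:perl-sax")` would run the transform itself, shell out for the indenter, and leave validation to a Java peer.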

Of course, this is still in the early days of development. I don't see anything that makes it simple to declare "this process is a synchronous/asynchronous SOAP event", "this is a process defined by that WSDL description over there" or the like. If an XML Pipeline processor is going to do as much as it sounds like it will, it shouldn't be too difficult to plug an optional SOAP/XML-RPC/whatever peer-to-peer linkage into the whole mess.

One thing is for sure, though. If this is nothing more than an abstraction over a Makefile, then I'm not particularly interested. If it becomes a declarative language for defining processing behaviors, then that's a different story altogether.