Standardized XML Vocabularies considered harmful

ziggy on 2002-05-31T14:50:21

Walter Perry has penned a cautionary tale about standard XML tagsets for XML.com. He's certainly not dumb, and I think he should be forgiven if his article sounds like a whine about how only 10 people showed up at his talk at XML Europe 2002.

As far as I can tell, the gist of his argument is this: large entities (the auto industry, various patent offices, the SEC, etc.) have heretofore focused on creating "standard vocabularies". The idea is that one standardized vocabulary for a particular problem domain (auto parts, patent submissions) levels the playing field and streamlines processing by the receiving entity. After all, the point of a standardized vocabulary is to distill the knowledge of the problem domain into a document format that uses XML tags, where each tag has a specific semantic meaning and a reasonable behavior expectation. For example:


 isbn:059600193
 1
 visa:1234123412341238
 ...
 ...

This snippet describes an ecommerce transaction of some kind (in this case, a book purchase). A vendor (or vendor consortium, like both Amazon and Barnes & Noble :-) would come to an agreement about how to model these kinds of transactions, and create a standardized XML vocabulary. They would then agree to accept transactions (somehow) that properly use this format, regardless of who generates this XML document or how it is generated. The idea here is that purchases made through XML documents have the same weight as purchases made through the standard web interface, or even made with paper documents.
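
To make the "accept it regardless of who generated it" idea concrete, here is a minimal sketch of what the receiving end might look like. The element names (order, item, quantity, payment) and the parse_order function are my own assumptions for illustration, not part of any real vocabulary; only the three values come from the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the element names below are illustrative
# assumptions. Only the three values are taken from the snippet in the post.
ORDER_XML = """
<order>
  <item>isbn:059600193</item>
  <quantity>1</quantity>
  <payment>visa:1234123412341238</payment>
</order>
"""

REQUIRED_ELEMENTS = ("item", "quantity", "payment")


def parse_order(xml_text):
    """Parse an order document and return its fields as a dict.

    Raises ValueError if the root element or any required child is missing.
    """
    root = ET.fromstring(xml_text.strip())
    if root.tag != "order":
        raise ValueError("expected <order> root, got <%s>" % root.tag)

    fields = {}
    for name in REQUIRED_ELEMENTS:
        text = root.findtext(name)
        if text is None:
            raise ValueError("missing required element <%s>" % name)
        fields[name] = text.strip()

    # Values use a "scheme:identifier" convention, e.g. "isbn:059600193".
    item_scheme, _, item_id = fields["item"].partition(":")
    payment_scheme, _, payment_id = fields["payment"].partition(":")

    return {
        "item_scheme": item_scheme,
        "item_id": item_id,
        "quantity": int(fields["quantity"]),
        "payment_scheme": payment_scheme,
        "payment_id": payment_id,
    }


if __name__ == "__main__":
    print(parse_order(ORDER_XML))
```

Note that the receiver only checks that the document fits the agreed shape. It has no idea who produced it or how, which is exactly the property the vendors are signing up for.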

That logic seems to make sense in the small -- when there are lots of transactions that need to be automated. It might even scale up to something as big and important as creating a standard XML representation for income tax filings. But Perry is pointing out that this logic doesn't make any sense in the large.

There's a theme here: get the experts to define a standard vocabulary, hire some programmers to process that vocabulary properly, then get rid of the experts (and the programmers, but that never really happens). Perry's expected result? Over time, standard vocabularies lead to abuse; read between the lines of a vocabulary definition (with all of the processing expectations defined) and you've got yourself what amounts to the definition of a programming environment. Hack around the edges long enough and you'll find out where you can play with how you use the vocabulary to get your patent application preferential treatment, or hide a lack of revenue in your income statement.