XML Schema, yet again

ziggy on 2002-01-14T15:10:29

I was teaching the XML Schema course in New York last week. It's a very condensed presentation of the XML Schema language and really deserves about 2 days of coverage. Unfortunately, I had to cram that and a 3-day XPath/XSLT course into three days. The presentation was a little rushed: about 450 slides in 3 days, when 90 slides/day is a good pace.

Like most XML programmers, I'm not too fond of XML Schema. Ideally, I'd be teaching RELAX NG instead, but realistically, the big companies want XML Schema because it's blessed, while the bleeding-edge programmers want RELAX NG. This mess will probably sort itself out in about 18 months' time. Regardless of the RELAX NG / XML Schema split, XML Schema will probably be around for a long time because (1) it's blessed by the W3C and (2) it's the only schema language at the moment that provides any validation of element/attribute content.

As usual, when I present the material on structures (<xsd:complexType/>, <xsd:simpleType/>, <xsd:complexContent/> and <xsd:simpleContent/>), I find myself trying to hold back the bile and not become an apologist for the spec. Then I get to the point where enough material has been covered to produce schemas that validate meaningful content and structure, such as a block of address markup that validates US states (either abbreviations or spelled out), ZIP codes, and an optional ZIP+4 extension. The real beauty here is that, with a few tweaks to the structure, that block can be composed into a larger container that validates either US or Canadian addresses through the schema definition alone. (Extending this to the entire Commonwealth is left as an exercise for the reader...)
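For readers who haven't seen one, here is a minimal sketch of that kind of address block. It is not the schema from the course slides. The type and element names (USStateType, USZipType, CAAddressType, MailingAddressType and so on) are hypothetical, the state and province enumerations are deliberately truncated, and the US/Canadian composition uses a plain <xsd:choice/>, which is only one way to do it (an abstract base type with derived address types is another).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch: all names and the truncated enumerations are illustrative only -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- US state: two-letter abbreviation or spelled-out name (list truncated) -->
  <xsd:simpleType name="USStateType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="NY"/>
      <xsd:enumeration value="New York"/>
      <xsd:enumeration value="CA"/>
      <xsd:enumeration value="California"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Five-digit ZIP code with an optional hyphen plus four digits (ZIP+4) -->
  <xsd:simpleType name="USZipType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{5}(-\d{4})?"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Canadian province: abbreviation or full name (list truncated) -->
  <xsd:simpleType name="CAProvinceType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="ON"/>
      <xsd:enumeration value="Ontario"/>
      <xsd:enumeration value="QC"/>
      <xsd:enumeration value="Quebec"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Canadian postal code such as "K1A 0B1" (the space is optional) -->
  <xsd:simpleType name="CAPostalCodeType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Z]\d[A-Z] ?\d[A-Z]\d"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:complexType name="USAddressType">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="USStateType"/>
      <xsd:element name="zip" type="USZipType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="CAAddressType">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="province" type="CAProvinceType"/>
      <xsd:element name="postalCode" type="CAPostalCodeType"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- The larger container: exactly one US address or one Canadian address -->
  <xsd:complexType name="MailingAddressType">
    <xsd:choice>
      <xsd:element name="usAddress" type="USAddressType"/>
      <xsd:element name="caAddress" type="CAAddressType"/>
    </xsd:choice>
  </xsd:complexType>

  <xsd:element name="mailingAddress" type="MailingAddressType"/>
</xsd:schema>
```

Under this sketch, a <zip> of 10001 or 10001-1234 satisfies the pattern facet while 1000 or 10001-12 does not (XML Schema patterns are implicitly anchored to the whole value), and a <mailingAddress> must contain either a <usAddress> or a <caAddress>, never both.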

So, as much as I dislike XML Schema, each time I get to this slide, I find that I really like what it can/should do (even if the syntax is overcomplicated and could benefit from a couple of refactorings). The best part is that this validation can be done declaratively and automatically -- regardless of the programming language used.