Insanity

darobin on 2002-08-16T10:59:06

As was made clear before, it has long been my opinion that XML Schema sucks.

I am now almost done putting together a complete schema for SVG Tiny (I already had all the types last week and then moved on to actually mapping them to something) plus large bits for SVG Basic and SVG, as well as other minor schemata for things such as XLink.

So now I hate them even more because I have a much sharper understanding of why they suck than I did before I had to make them go places no one thus far had dared to push them ;) On the other hand, I seem to be strangely enjoying having to fight with this painstackingly undocumented spec. So yeah, I guess those people that were mocking me for the fact that I enjoyed using the DOM may freely snicker now as I clearly am insane (but it's strongly typed insanity, ha!).

* darobin tries to remember who it was that once commented on a strong BDSM trend in the Perl community...


XML Schema alternatives

robin on 2002-08-16T14:58:07

As was made clear before, it has long been my opinion that XML Schema sucks.


What do you think of the alternatives to schema? I'm particularly thinking of RELAX-NG here.

Re:XML Schema alternatives

darobin on 2002-08-16T15:21:37

RelaxNG is imho much better for several reasons: 1) it relies on a model (hedge automata) that while it may appear as complex to some is imho sane; 2) it is documented properly (ie you can understand the spec by reading it even if you are not interested in implementing it); 3) it splits typing out of the picture; 4) it's readable to humans and you usually know what you're doing and which rules you're breaking when writing RNG schemata (that's hardly the case with WXS); 5) RNG doesn't do weird things with namespaces. I'm probably forgetting a few things there but that should cover a large part of my opinion. RNG also has a non-XML syntax which is a lot less verbose and much easier to read.

On the downside it is iirc currently possible to produce non-deterministic automata (or at least non-deterministic content models) with RNG, which is a problem for a number of applications (including, as it is, our own) but last I heard they were working on that.

We -- and I'm sure others -- would be happy to move to RNG but it needs further adoptance for that (we're already pushing a number of newish things, there's only so much innovation we can bundle into something and not freak customers out). It moving inside ISO spheres might help as a good deal of what we do is MPEG-related (and MPEG is ISO) but unfortunately MPEG-7 already uses WXS (under the name of DDL, which is a slightly extended version). It's certainly something I'll be looking at on a regular basis.

As for other options -- DTDs and Schematron -- I will not mention DTDs because they can't deal meaningfully with namespaces at this time, and namespaceless XML is imho not XML (at best a weird artifact of the past). Schematron is imho an excellent solution in that it is probably the simplest and most expressive. Its downside however is that you need to have the full document in memory at any given time in order for it to work, and for many applications that's a non-starter.

Don't hesitate to drop me a mail if you want to discuss this in a friendlier environment than textareas ;)

Re:XML Schema alternatives

ziggy on 2002-08-16T19:44:43

3) it splits typing out of the picture
This is both a benefit and a drawback to RNG. I like how RNG solves one problem well, leaving a best-of-breed solution to develop for the nasty problem of datatyping.

Unfortunately, the only reasonable solution for datatyping content is WXS Part 2, which is highly arbitrary in its selection of basic types. Is that sufficient for your needs (are you even doing datatyping of content?), or do you find a need for something better than WXS Part 2? (Assuming you could also get rid of the rest of WXS?)

Re:XML Schema alternatives

darobin on 2002-08-17T00:04:20

I'm not sure that WXSp2 is the only reasonable solution for datatyping :) I think you mean that it is the only one that is presently sufficiently spec'd/adpoted/understood/something-like-that (for various meanings of "sufficiently"), no?

We do use datatyping, in fact we use it quite extensively. MPEG-7 DDL extends WXSp2 with a few types that it needs (such as multidimensional arrays) which shows quite clearly imho that WXSp2 would have been better written as a framework for type creation than as an assorted list of mostly unrelated types, with gaping holes. But it is *far* from being sufficient for efficient compression (or even validation -- try to express SVG path data properly in WXS... it's useless) in many cases.

The reason we use it extensively is two-fold: either it works (eg in the case of enumerations, those allow one to compress very easily), or it doesn't and we still rely on "stupid" types because they allow us to plug in specific codecs. Typical examples of that include lists of tokens separated by commas instead of whitespace or things as simple as a number followed by a unit (eg: 12px). Patterns are so useless in this context that they were turned down by a large standard body even though it was shown that in some cases there could be a win (they are, after all, automata and thus open the door to compression, but the fact that they lack atomic typing makes them almost unusable in the general case).

I am working on extensions that will allow us to provide typing well beyond what WXSp2 allows but I'd rather not try to explain that just yet as it's still an unordered heap of fuzzy ideas that have theoretical proofs but no concrete syntax. I think the next big thing to do after RNG is a generic typing language/framework (at the atomic, not structural, level) for use in structural schema languages. I have ideas there, I hope to have the time alloted to develop them (unless my spare time suffices, which I doubt).