I had an acceptance tester walk up to me and ask why there was no revision information attached to a particular element in our REST interface. It was a bug. To fix that bug required one line of code. I added it. The tests failed.
Turns out that it requires two lines of code. I added the other line. The tests failed.
I updated the requisite RELAX NG schema. The tests failed.
I've started updating the hard-coded XML files in some acceptance tests. My brain failed.
Wouldn't it be nice if all you had to do was update a schema? Imagine reading a RELAX NG schema and generating the appropriate Perl code to create the data structures, which then get run through an XML generator to produce XML that automatically validates against the RELAX NG. You could even autogenerate tests with this. Then you just update your RELAX NG and that's it!
Doesn't quite work. RELAX NG is all about the structure of a document, not its meaning. If you see a last-modified attribute three years in the future, you know it's probably wrong, but RELAX NG doesn't. So I experimented with adding annotations to RELAX NG, but it was making my parser more complicated. RELAX NG doesn't do everything I need.
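To make the "structure, not meaning" point concrete, here is a minimal Perl sketch of the kind of check a schema can't express: rejecting a timestamp that lies in the future. The helper name `not_in_future` and the assumption that timestamps look like `YYYY-MM-DDThh:mm:ssZ` (UTC) are mine, not part of the system described here.

```perl
use strict;
use warnings;
use Time::Local 'timegm';

# Hypothetical helper: true if a UTC timestamp of the form
# YYYY-MM-DDThh:mm:ssZ is not later than $now (epoch seconds,
# defaulting to the current time). False for anything unparseable.
sub not_in_future {
    my ( $timestamp, $now ) = @_;
    $now = time unless defined $now;

    my ( $y, $mon, $d, $h, $min, $s ) =
      $timestamp =~ /\A(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z\z/
      or return;

    # timegm() croaks on impossible dates (month 13 and so on),
    # so trap that and treat it as a failed check.
    my $epoch = eval { timegm( $s, $min, $h, $d, $mon - 1, $y ) };
    return unless defined $epoch;

    return $epoch <= $now;
}
```

A schema can insist that the attribute is a well-formed date; only code like this can say the date is implausible.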
A further problem is that RELAX NG does more than I need:
    element addressBook {
        element card {
            attribute name { text },
            attribute email { text }+
        }*
    }
See that plus sign after the email attribute? It means "one or more". But XML does not allow duplicate attribute names on an element (it's a well-formedness error), and attributes are inherently unordered, so "one or more" of the same attribute makes no sense in an actual document. You can also write your RELAX NG with a grammar and specify the start element, but that doesn't fit my needs in this case. As a result, a programmer can write valid RELAX NG that wouldn't work in my system, and that's an expectation violation. Supporting all of this also further complicated my parser and made hard work even harder. RELAX NG simultaneously does too little and too much. It's not a bad fit, but it's not a great one, either. (A pure Perl RELAX NG parser would help tremendously, but writing one would delay this even further.)
I've started writing custom YAML files which do exactly what I need. The system is named "Bermuda" and the YAML files, named "islands", have a .bmd extension (GameCube also uses a .bmd extension). Code which generates code is very difficult to write, but the payoffs are huge. I seriously doubt this will ever see the light of day, but it's a fun project. Here's a sample YAML file:
    ---
    package: My::Card
    island: card
    attributes:
      href:
        type: anyURI
        method: url
      revision?:
        if: 'defined $card->revision'
        type: positiveInteger
    elements:
      - name
      - type: string
      - email*
      - type: string
      - phone+
      - method: phone_numbers
        type: string
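The `?`, `*` and `+` suffixes on the names carry the occurrence rules. As a rough sketch of how they might be decoded once the YAML is loaded (this is not Bermuda's actual parser), the following assumes the suffixes mean what they mean in RELAX NG, that a bare name means "exactly one", and that the accessor method defaults to the element name. The names `parse_name` and `%occurs` are hypothetical, and the element list is hard-coded here rather than read from a file.

```perl
use strict;
use warnings;

# Assumed suffix meanings, borrowed from RELAX NG.
# max => undef means "unbounded".
my %occurs = (
    ''  => { min => 1, max => 1 },
    '?' => { min => 0, max => 1 },
    '*' => { min => 0, max => undef },
    '+' => { min => 1, max => undef },
);

# Split "phone+" into a name plus its occurrence constraint.
sub parse_name {
    my ($raw) = @_;
    my ( $name, $suffix ) = $raw =~ /\A([A-Za-z_][\w.-]*)([?*+]?)\z/
      or die "Cannot parse island name '$raw'";
    return { name => $name, %{ $occurs{$suffix} } };
}

# The "elements" sequence alternates between a name and its spec.
my @elements = (
    'name',   { type => 'string' },
    'email*', { type => 'string' },
    'phone+', { method => 'phone_numbers', type => 'string' },
);

my @parsed;
while ( my ( $raw, $spec ) = splice @elements, 0, 2 ) {
    my $info = parse_name($raw);

    # An explicit "method" in the spec overrides the default.
    push @parsed, { method => $info->{name}, %$info, %$spec };
}
```

With this in hand, a code generator only has to look at `min` and `max` to decide whether to emit a loop, a count check or a plain accessor call.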
And here's the generated code (yes, I actually have this much working):
    package My::Card::Bermuda;

    use strict;
    use warnings;
    use Carp 'croak';

    sub new {
        my ( $class, $instance ) = @_;
        return bless {
            data => {
                island     => 'card',
                attributes => {},
                element    => [],
            },
            instance => $instance,
        } => $class;
    }

    sub instance { shift->{instance} }
    sub name     { 'card' }

    sub build {
        my ( $self ) = @_;
        $self->_add_attributes;
        $self->_add_elements;
        return $self;
    }

    sub _add_attributes {
        my ($self) = @_;
        my $card = $self->instance;
        if (defined $card->revision) {
            $self->{data}{attributes}{revision} = $card->revision;
        }
        $self->{data}{attributes}{href} = $card->url;
        return $self;
    }

    sub _add_elements {
        my ($self) = @_;
        my $card = $self->instance;
        my $count;
        push @{ $self->{data}{elements} } => {
            name       => 'name',
            attributes => {},
            element    => [ $card->name ],
        };
        $count = 0;
        foreach my $email ( $card->email ) {
            $count++;
            push @{ $self->{data}{elements} } => {
                name       => 'email',
                attributes => {},
                element    => $email,
            };
        }
        $count = 0;
        foreach my $phone ( $card->phone_numbers ) {
            $count++;
            push @{ $self->{data}{elements} } => {
                name       => 'phone',
                attributes => {},
                element    => $phone,
            };
        }
        unless ($count) {
            croak("Method 'phone_numbers' failed to return at least one element");
        }
        return $self;
    }

    1;
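The data structure this builds still has to go through an XML generator. A minimal sketch of that last step, using only core Perl, might look like the following. The functions `to_xml` and `escape_xml` are hypothetical, and the layout is assumed from the generated code: a node is named by `island` or `name`, text lives in `element` (either a scalar or an array reference, since the generated code uses both), and children live in `elements`.

```perl
use strict;
use warnings;

# Escape the characters that are unsafe in XML text and attribute values.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Recursively serialize one node of the assumed data layout.
sub to_xml {
    my ($node) = @_;
    my $tag = defined $node->{island} ? $node->{island} : $node->{name};

    # Sort attribute names so the output is deterministic.
    my $attrs = join '',
      map { sprintf ' %s="%s"', $_, escape_xml( $node->{attributes}{$_} ) }
      sort keys %{ $node->{attributes} || {} };

    my $text = $node->{element};
    my @text =
        ref($text) eq 'ARRAY' ? @$text
      : defined $text         ? ($text)
      :                         ();

    my $body = join '',
      ( map { escape_xml($_) } @text ),
      ( map { to_xml($_) } @{ $node->{elements} || [] } );

    return "<$tag$attrs>$body</$tag>";
}

print to_xml(
    {
        island     => 'card',
        attributes => { href => 'http://example.com/card/1', revision => 3 },
        elements   => [
            { name => 'name', attributes => {}, element => ['Ovid'] },
        ],
    }
), "\n";
# prints <card href="http://example.com/card/1" revision="3"><name>Ovid</name></card>
```

A real generator would want namespaces and pretty-printing, but the point is that the island's output is plain data and the XML step can stay generic.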
I'm not writing out the RELAX NG yet, but for the island above it would look something like this:
    element card {
        attribute revision { xsd:positiveInteger }?,
        attribute href { xsd:anyURI },
        element name { xsd:string },
        element email { xsd:string }*,
        element phone { xsd:string }+
    }
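Emitting the RELAX NG from an island should be mostly string assembly, because the name suffixes map directly onto RELAX NG's own. The sketch below is an assumption about how that could work, not code from Bermuda. `island_to_rnc` and `split_name` are hypothetical, the island is given as the Perl structure the YAML would load into, and every type is assumed to come from the XSD datatype library.

```perl
use strict;
use warnings;

# Separate a trailing ?, * or + from the name it decorates.
sub split_name {
    my ($raw) = @_;
    my ( $name, $suffix ) = $raw =~ /\A(.+?)([?*+]?)\z/;
    return ( $name, $suffix );
}

# Render an island as RELAX NG compact syntax.
sub island_to_rnc {
    my ($island) = @_;
    my @parts;

    # Attributes are unordered, so sort them for stable output.
    for my $raw ( sort keys %{ $island->{attributes} } ) {
        my ( $name, $suffix ) = split_name($raw);
        push @parts, sprintf 'attribute %s { xsd:%s }%s',
          $name, $island->{attributes}{$raw}{type}, $suffix;
    }

    # Elements alternate name, spec and keep their declared order.
    my @elements = @{ $island->{elements} };
    while ( my ( $raw, $spec ) = splice @elements, 0, 2 ) {
        my ( $name, $suffix ) = split_name($raw);
        push @parts, sprintf 'element %s { xsd:%s }%s',
          $name, $spec->{type}, $suffix;
    }

    return sprintf "element %s {\n%s\n}\n", $island->{island},
      join ",\n", map { "    $_" } @parts;
}

my $island = {
    island     => 'card',
    attributes => {
        'href'      => { type => 'anyURI', method => 'url' },
        'revision?' => {
            type => 'positiveInteger',
            if   => 'defined $card->revision',
        },
    },
    elements => [
        'name',   { type => 'string' },
        'email*', { type => 'string' },
        'phone+', { type => 'string', method => 'phone_numbers' },
    ],
};

print island_to_rnc($island);
```

The `if` and `method` keys are simply ignored here, which is the structure-versus-meaning split again: they matter to the Perl generator but have no home in the schema.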
That's a nice syntax and I really wish I could use it, but it just doesn't quite fit :(
Have you tried to combine Schematron with RELAX NG? It’s quite easily possible, as RNG has extension points that allow for such an undertaking, and Schematron gives you rule- as opposed to grammar-based validation. In short, Schematron rules are arbitrary XPath expressions that must match/be true in the contexts you specify for them. Particularly with suitable XPath extension functions, that lets you validate pretty much any kind of constraint whatsoever.
(You can also use Schematron standalone, but I find that defining full-blown grammars in terms of rules is much less readable. Declarative wins big in the areas where it can carry you. But Schematron is very nice not just for complementary purposes, but also because it’s descriptive rather than prescriptive, letting you validate aspects of a document as coarsely or finely grained as you need. Validity isn’t a binary concept.)
Another way to supercharge RNG is to use a different data type library – you aren’t limited to XSD (which is a pretty misshapen type system).
As for parsing the RELAX NG Compact syntax, why would you try? There is a corresponding XML vocabulary as well as tools such as Trang (yeah, Java, enh) which can translate one into the other. So you can turn an RNG grammar into a DOM that’s close enough to an AST for basically free.
RELAX NG and Schematron are both pretty cool stuff.
Re:Schematron?
Ovid on 2008-01-20T17:40:02
Actually, I have trang installed and used that to convert the compact grammar to XML. I had stuff like this:
    element card {
        ## if: defined $card->revision
        attribute revision { xsd:positiveInteger }?,
        element name { xsd:string },
        element email { xsd:string },
        ## method: phone_numbers
        element phone { xsd:string }*
    }
And it was getting converted to this:
    <?xml version="1.0" encoding="UTF-8"?>
    <element name="card" xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
      <optional>
        <attribute name="revision">
          <a:documentation>if: defined $card->revision</a:documentation>
          <data type="positiveInteger"/>
        </attribute>
      </optional>
      <element name="name">
        <data type="string"/>
      </element>
      <element name="email">
        <data type="string"/>
      </element>
      <zeroOrMore>
        <element name="phone">
          <a:documentation>method: phone_numbers</a:documentation>
          <data type="string"/>
        </element>
      </zeroOrMore>
    </element>
And I was using XML::LibXML to break that down into a data structure, but it was getting rather ugly. With YAML, I get the data structure for free. I have to say, though, that I didn't look at Schematron. I've never used it before.
Currently I have almost all of the basics in place for what I'm doing, so I'm not likely to switch now. However, I've clearly separated out my parser from everything else. As a result, if I ever need to switch to another parsing system, it should be trivial once the other parser is written. (Heh. Famous last words.)