I need to parse a DTD. Specifically I need to parse this DTD:
http://www.w3.org/TR/html4/strict.dtd
It's quite a well known one. Certainly there'll be a module on CPAN that can parse it. Let's have a look.
Looks promising, comprehensive. Unfortunately it fails with an error that I eventually track down to a misspelled method name. So much for test coverage. Fix that and it throws a bunch of warnings that cause a rapid loss of confidence. No matter, let's try...
It's a "quick and dirty DTD parser". Hmm. "I'm too lazy to document the structure". Nice.
"Since version 1.6 this module supports my "extensions" to DTDs. If the DTD contains a comment in form"
Maybe I'll come back to XML::DTDParser...
SGML? That's got to be good, right? SGML is the daddy. Every fule no that. Unfortunately it doesn't really seem to have much of a Perl interface. It's all about translating DTDs to XML. I might be able to use that. I'm getting desperate.
I'll take a quick look at the test suite for a confidence boost. Here's one:
    # Before `make install' is performed this script should be runnable with
    # `make test'. After `make install' it should work as `perl SGML-DTDParse.t'

    #########################

    # change 'tests => 1' to 'tests => last_test_to_print';

    use Test::More tests => 1;
    BEGIN { use_ok('SGML::DTDParse') };

    #########################

    # Insert your test code below, the Test::More module is use()ed here so read
    # its man page ( perldoc Test::More ) for help writing this test script.
The other test is pretty similar. I'm not that confident now.
Running out of options. Let's look at XML::ParseDTD. From the documentation it appears to rock. The test results say "2 PASS, 2 FAIL". 50/50. So at least it's got some tests, right? Damn right! Here they are in their entirety:
    #!/usr/bin/env perl -w

    use strict;
    use Test::Simple tests => 2;
    use XML::ParseDTD;

    my $dtd = new XML::ParseDTD('http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'); # create an object
    ok( defined $dtd, 'new() returned something' ); # check that we got something
    ok( $dtd->isa('XML::ParseDTD'), 'it\'s the right class' ); # and it's the right class
I'm momentarily impressed that it managed to score two failures with that. I'm about to find out how.
Never mind, the DTD URI in the test looks a lot like the DTD I need to parse. I'm getting close. I can feel it.
Unfortunately it has a dependency on Cache::SharedMemoryCache (why?) which in turn depends on IPC::ShareLite - which doesn't install on my PowerBook. So now I need to fix / avoid IPC::ShareLite.
See kids: the great thing about CPAN is how much time it saves!
Re:Possibilities...
AndyArmstrong on 2007-06-01T17:45:13
I noticed them but fatigue was setting in by then :)
I'll give them a spin, thanks.
It turns out that http://www.w3.org/TR/html4/strict.dtd doesn't validate - which probably means I'm missing something obvious like it isn't really a DTD or something.
Bah.
Re:Possibilities...
Matts on 2007-06-01T19:41:41
It's not an XML DTD. It's SGML.