Automatic HTML validity checking

petdance on 2003-06-12T15:36:59

I don't mean to toot my own horn, but I've gotta spread the gospel that HTML::Lint and its corresponding weblint wrapper are pretty darn useful. Every so often someone will ask me, "Hey, can you look at my site and make sure that it's OK?" The first thing I do is run weblint on it to check that the HTML is reasonably clean.

As an example, I ran it on Randal's website:

$ weblint http://www.stonehenge.com/merlyn
http://www.stonehenge.com/merlyn (213:5) <IMG> tag has no HEIGHT and WIDTH attributes.
http://www.stonehenge.com/merlyn (290:279) <IMG> tag has no HEIGHT and WIDTH attributes.
http://www.stonehenge.com/merlyn (293:1) <TD> at (292:6) is never closed

Nothing very serious, since most browsers will handle the unclosed TD tag, and the IMG HEIGHT & WIDTH are just rendering helpers. Still, they're worth fixing.
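
For anyone who would rather call the module than the wrapper, here is a minimal sketch of using HTML::Lint directly. The sample markup and the file name sample.html are made up for illustration, and the exact wording of the messages depends on the installed HTML::Lint version; the eof call assumes a reasonably recent release.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Lint;

# Hypothetical sample page: the IMG has no HEIGHT and WIDTH,
# and the TD is never closed.
my $html = <<'HTML';
<html>
<head><title>Test</title></head>
<body>
<table><tr><td><img src="logo.png" alt="logo"></tr></table>
</body>
</html>
HTML

my $lint = HTML::Lint->new;
$lint->newfile( 'sample.html' );   # name shown at the start of each message
$lint->parse( $html );
$lint->eof;                        # tell the parser the document is finished

# Each error object stringifies to the "file (line:col) message" form.
my @errors = $lint->errors;
print $_->as_string, "\n" for @errors;
```

Each error object also has line, column and errtext accessors, which is handy if you want to format the report yourself.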



Here's another example from someone still fixing up the pages for his upcoming book:

http://site/ (5:79) is not a container -- is not allowed
http://site/ (208:1) at (198:1) is never closed
http://site/ (210:1) with no opening
http://site/ (225:6) at (39:1) is never closed
http://site/ (225:6) at (40:1) is never closed
http://site/ (225:6) at (35:1) is never closed

Here, the problems could cause real rendering issues. Older Netscapes would just freak out on unclosed tables and refuse to draw. The pair of FORM tag mismatches is probably a nesting issue.

Finally, here's a .t file for those of you with automated test suites to make sure that all the HTML files in a project have valid HTML. This is invaluable to me during the day job, because even the WYSIWYG tools that the guys up in Marketing use don't always turn out compliant HTML. If someone puts in a bad HTML file, the hourly smokebot will notice it and fire off an email to me.

#!/usr/bin/perl -w

use strict;
use Test::More;
use Test::HTML::Lint;
use File::Spec;
use File::Find::Rule;

# Find every *.html file under $startpath, skipping CVS directories.
my $startpath = '.';
my $rule = File::Find::Rule->new;
$rule->or(
    $rule->new->directory->name('CVS')->prune->discard,
    $rule->new->file->name('*.html')
);
my @html = $rule->in( $startpath );

my $nfiles = scalar @html;

plan( tests => $nfiles );

for my $filename ( @html ) {
    # A file that can't be opened counts as a failed test.
    open( my $fh, '<', $filename ) or fail( "Couldn't open $filename" ), next;

    # Slurp the whole file in one read.
    local $/ = undef;
    my $text = <$fh>;
    close $fh;

    html_ok( $text, $filename );
}


Shocked, shocked

vsergu on 2003-06-12T15:43:44

... even the WYSIWYG tools that the guys up in Marketing use don't always turn out compliant HTML.

And even steakhouses don't always have good vegetarian food.

Automatic Application

Dom2 on 2003-06-12T16:24:05

One of the really cunning ideas that somebody here came up with is automatic XHTML validation. In development mode, our top-level autohandler (we use Mason for our sites) has a filter section which passes the entire page through nsgmls. If there are any errors, it inserts them back into the page with a quick regex.

This has been a real boon for developing a correctly validating site. Otherwise, we'd have to wait for our web designer to run the page through the validator later on and then bitch at us to fix our code. Instant feedback rocks.

-Dom
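
The "insert them back into the page" step could look something like the sketch below. This is not Dom's actual filter: the function name inject_errors, the markup of the error box, and the sample message are all assumptions made for illustration. In a Mason autohandler the same substitution would run on the page text inside the filter section, with the messages coming from nsgmls.

```perl
use strict;
use warnings;

# Hypothetical helper: put validator messages in a visible box right after <body>.
sub inject_errors {
    my ( $page, @errors ) = @_;
    return $page unless @errors;

    # Escape the messages so they can't break the page they're reporting on.
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' );
    my $items  = '';
    for my $error ( @errors ) {
        ( my $msg = $error ) =~ s/([&<>])/$entity{$1}/g;
        $items .= "<li>$msg</li>";
    }

    my $box = qq{<div class="validation-errors"><ul>$items</ul></div>};

    # The "quick regex": insert the box after the opening body tag,
    # or put it at the very top if the page has no body tag at all.
    $page =~ s/(<body\b[^>]*>)/$1$box/i
        or $page = $box . $page;

    return $page;
}

my $page = "<html><body><p>Hello</p></body></html>";
print inject_errors( $page, 'line 3: end tag for "p" omitted' ), "\n";
```

A page with no errors comes back untouched, so the call can stay in the development-mode filter permanently.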

Re:Automatic Application

samtregar on 2003-06-12T18:43:13

Wow, what a great idea. I'm going to hack that into my current project right now.

-sam

Re:Automatic Application

petdance on 2003-06-12T19:26:51

Actually, that's what Apache::Lint is intended to do. It kinda works but I'm having problems with the Apache::Filter chains eating HTTP response codes. On very simple stuff, though, it seems to work OK.

Re:Automatic Application

samtregar on 2003-06-12T19:37:49

I worked out a pretty slick usage with CGI::Application's new cgiapp_postrun() method. If HTML::Lint detects errors then I put some Javascript into the outgoing page to pop open a small new window with the error text nicely formatted. I also tried it as a Javascript alert but for more than a few errors that gets hard to read.

Now, let's see if the HTML dudes even want it! Even if they don't, I might keep it around to help me find their mistakes more easily.

Thanks!
-sam

Re:Automatic Application

petdance on 2003-06-12T19:48:41

Does it show context? weblint does, because context makes it a LOT easier to find the problems in big HTML files. Should HTML::Lint keep context as a convenience?

Re:Automatic Application

samtregar on 2003-06-12T19:53:56

No, it doesn't. That would be pretty cool too, I suppose. With the line:column numbering it wouldn't be too hard to do it externally.

-sam
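
Doing it externally might look like the following sketch, which pulls the first (line:column) pair out of a message and returns the offending source line with a caret under the column. The function name show_context and the sample data are hypothetical, and the code assumes the "(line:col)" message format from the weblint output above, with 1-based columns.

```perl
use strict;
use warnings;

# Hypothetical helper: return the message, the source line it points at,
# and a caret marker under the reported column.
sub show_context {
    my ( $message, $source ) = @_;

    # Messages look like "file (LINE:COL) text"; take the first such pair.
    my ( $line, $col ) = $message =~ /\((\d+):(\d+)\)/
        or return $message;

    my @lines = split /\n/, $source;
    return $message if $line < 1 || $line > @lines;

    my $text   = $lines[ $line - 1 ];
    my $offset = $col > 0 ? $col - 1 : 0;   # assumption: columns are 1-based
    return join "\n", $message, $text, ( ' ' x $offset ) . '^';
}

my $source = "<table>\n<tr><td>cell</tr>\n</table>\n";
print show_context( 'page.html (2:5) <TD> at (2:5) is never closed', $source ), "\n";
```

Messages without a position, or with a line number past the end of the source, are returned unchanged.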

Re:Automatic Application

Dom2 on 2003-06-13T07:15:36

Javascript! New window! Fab idea! This fixes one of the problems that I have with the present system in that the line numbers are off because we've inserted a bunch of extra lines in at the beginning. We worked around this by writing the source file out to /tmp so we can go in and look at it, but it's not ideal...

-Dom

Re:Automatic Application

Dom2 on 2003-06-13T07:13:32

The only reason that I mentioned nsgmls rather than HTML::Lint is that our site is meant to be XHTML compliant, and we're used to using nsgmls. It looked like HTML::Lint only did HTML4. I should probably send you a patch...

Not only that, but I've realised that XML::LibXML does DTD validation now; it should be able to do the same checking in memory rather than expensively spawning a copy of nsgmls.

Hmmm... There is much work to do today!

-Dom
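
A rough sketch of the in-memory approach with XML::LibXML is below. It is only an illustration: the tiny document and its inline DTD are made up so that the example needs no network access or installed XHTML DTDs, and load_xml assumes a reasonably recent XML::LibXML.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document with an internal DTD. The <extra/> element is not
# declared, so the document is well-formed but not valid.
my $xml = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (title)>
  <!ELEMENT title (#PCDATA)>
]>
<page><title>Hello</title><extra/></page>
XML

# Parsing only checks well-formedness; validation is a separate step.
my $doc = XML::LibXML->load_xml( string => $xml );

# validate() dies with the validator's message when the document breaks its DTD.
my $ok = eval { $doc->validate; 1 };
print $ok ? "valid\n" : "invalid: $@\n";
```

For real pages, validate can also be given an XML::LibXML::Dtd object built from the XHTML public and system identifiers, so the page itself doesn't need to carry the DTD.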

Very cool!

RobertX on 2003-06-13T01:08:48

I added the following to SciTE (on XP):

command.name.0.$(file.patterns.html)=Weblint
command.0.$(file.patterns.html)=weblint.bat \ $(FileNameExt)

Now I can validate my HTML on the fly!!!