HTML::Tidy Revelation

tomhukins on 2005-11-04T14:09:26

In the WWW::Mechanize talk I gave at this year's YAPC::Europe and NPW, I describe how I pass invalid HTML through the command-line tool tidy before handing it to XML::LibXML for processing.

I mention that I don't use HTML::Tidy because it doesn't actually clean the HTML; it just checks for warnings. At least, that's what I thought.

Robbie, who I work with, has just shown me some code where he calls the clean method to do exactly that. In my defence, the documentation confused me by saying this method returns true, whereas it actually returns the cleaned content, which happens to evaluate to true. I should get into the habit of reading the documentation on AnnoCPAN, which mentions this.
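For anyone else who missed this, here is a minimal sketch of doing the whole job in-process. It is not Robbie's code: the sample HTML is made up, and it assumes your version of HTML::Tidy passes tidy options such as output_xhtml through the constructor's hashref, so the cleaned markup is well-formed enough for XML::LibXML. The exact markup tidy produces varies between versions, so check the result against your own input. The XPath query matches on local-name() so it doesn't need to register the XHTML namespace.

```perl
use strict;
use warnings;

use HTML::Tidy;
use XML::LibXML;

# Made-up broken HTML: unclosed <p> and <b>, no <html>, <head> or <body>.
my $html = '<title>Hi</title><p>Unclosed <b>bold';

# Assumption: this HTML::Tidy version passes tidy options such as
# output_xhtml through the constructor hashref, giving well-formed output.
my $tidy = HTML::Tidy->new( { output_xhtml => 1 } );

# clean() returns the cleaned markup itself, not merely a true value.
my $cleaned = $tidy->clean($html);

# Parse the cleaned markup in-process instead of shelling out to tidy.
my $parser = XML::LibXML->new;
$parser->load_ext_dtd(0);    # don't try to fetch the XHTML DTD named in the DOCTYPE
my $doc = $parser->parse_string($cleaned);

# local-name() sidesteps the default XHTML namespace in the XPath query.
my $title = $doc->findvalue('//*[local-name()="title"]');
print "$title\n";
```

If clean only returned a true value, $cleaned would hold 1 here and parse_string would die with a parse error; instead it holds the repaired document.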

I hope I haven't encouraged too many people to use a separate process to do something a CPAN module already does. The module's name makes its purpose clear enough.

Whoops!


Patches welcome

essuu on 2005-11-04T14:33:33

I should get into the habit of reading documentation on AnnoCPAN, which mentions this

Surely this is a documentation bug and should have been reported via RT?
Why should module users have to check yet another documentation source?

Not that I have anything against AnnoCPAN in principle, but this seems a good example of using it incorrectly.

Re:Patches welcome

tomhukins on 2005-11-06T19:37:46

My initial reaction was to report this via RT, but I didn't want to duplicate existing information in case it left CPAN authors feeling overloaded.

It's a tough call, but your response prompted me to go with my gut feeling and send a patch in Bug #15573.

Thanks for the suggestion, essuu.

Re:Patches welcome

petdance on 2005-11-06T20:06:18

I'd always prefer that users err on the side of too many reports rather than too few. Your specific situation with a problem might be different from everyone else's anyway.