untilting the planet

Robrt on 2004-08-07T06:54:14

A few people (this means you, Vahe) had noticed that planet.perl was rendering oddly. The cause: some of our inbound feeds truncate their HTML. For example, an RSS item would include the opening <ul> but not the closing </ul>, which could throw off the indentation of everything further down the page.

A not-so-well-kept secret is that planet.perl is based on a Python tool called planet. So, while watching an episode of The Powerpuff Girls, I subclassed HTMLParser and wrote a small utility that adds missing closing tags where appropriate. The code is stupid, but tiny and crystal clear.

One thing Python generally gets right: it's trivial to extend its core modules by subclassing.
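The original utility isn't shown, so here is a minimal sketch of the same idea, assuming Python 3's html.parser.HTMLParser (the 2004 code would have used the Python 2 HTMLParser module). The names TagBalancer, balance_html and VOID_TAGS are hypothetical. The subclass re-emits what it reads, keeps a stack of open elements, and at the end closes whatever is still open, innermost first. Elements that never take a closing tag (br, img and friends) are never pushed onto the stack.

```python
from html.parser import HTMLParser

# Hypothetical table of elements that never get a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input",
             "link", "meta", "param"}


class TagBalancer(HTMLParser):
    """Re-emits HTML, closing any tags a truncated feed left open."""

    def __init__(self):
        # convert_charrefs=False so entities are passed through as-is.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.open_tags = []  # stack of currently open, non-void tags

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # raw tag text, attrs intact
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # XHTML-style <br/>: copy it through, nothing to track.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag with no matching open tag: drop it
        # Close anything opened inside this element that was left dangling.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def balanced(self):
        # Append closers for everything still open, innermost first.
        closers = [f"</{t}>" for t in reversed(self.open_tags)]
        return "".join(self.out + closers)


def balance_html(fragment):
    parser = TagBalancer()
    parser.feed(fragment)
    parser.close()
    return parser.balanced()
```

For example, balance_html("<ul><li>one") returns "<ul><li>one</li></ul>", and balance_html("<p>a<br>b") returns "<p>a<br>b</p>" with no </br>. Closing dangling children when their parent's end tag arrives, and dropping unmatched end tags, are design choices of this sketch, not necessarily what planet.perl does.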

Result? No more weird indentation! At least until even odder broken HTML comes in...


Picky, picky

VSarkiss on 2004-08-07T11:43:13

Nice trick, I like it.

Except it's generating </br>, which is not a valid tag: <br> never takes a closing tag. But my browser is more forgiving than I am. ;-)

BRoken

Robrt on 2004-08-08T07:20:17

I've fixed the </br> issue. I've got a table of tags that shouldn't be balanced. I could go look at the spec, but it's more fun to guess.
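Why a table is needed at all: HTMLParser has no notion of empty elements. A bare <br> arrives through handle_starttag exactly like <ul>, and only the XHTML-style <br/> is routed to handle_startendtag. The hypothetical recorder below makes that visible. Robrt's actual table isn't shown; the HTML401_EMPTY set here is the "go look at the spec" alternative, listing the elements HTML 4.01 declares EMPTY (end tag forbidden), and needs_closing is an illustrative helper.

```python
from html.parser import HTMLParser

# Elements declared EMPTY in HTML 4.01 (their end tag is forbidden).
HTML401_EMPTY = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


class CallbackRecorder(HTMLParser):
    """Logs which HTMLParser callback fires for each tag."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_startendtag(self, tag, attrs):
        self.events.append(("startend", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def record(fragment):
    parser = CallbackRecorder()
    parser.feed(fragment)
    parser.close()
    return parser.events


def needs_closing(tag):
    # Only non-empty elements should ever get a synthesized end tag.
    return tag.lower() not in HTML401_EMPTY
```

record("<ul><br><br/></ul>") returns [("start", "ul"), ("start", "br"), ("startend", "br"), ("end", "ul")]. The parser hands <br> and <ul> to the same callback, so a lookup such as needs_closing is what keeps a tag balancer from emitting </br>.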