dealing with errors in XML

gav on 2003-05-28T19:57:49

I've been working on processing RSS feeds over the last few days. One feed contained an invalid character that was killing the parser, so I hacked this together:

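PARSE_XML: eval {
    $twig->parsefile($file);
    $twig->purge;
};
if ($@) {
    # The parser reports the byte offset of the character it rejected.
    if ($@ =~ /not well-formed \(invalid token\) at line \d+, column \d+, byte (\d+)/) {
        # Overwrite the offending byte with a space, in place, then retry.
        open my $fh, '+<', $file or die "Can't update '$file': $!";
        seek $fh, $1, 0 or die "Can't seek in '$file': $!";
        print {$fh} ' ' or die "Can't write to '$file': $!";
        close $fh or die "Can't close '$file': $!";
        goto PARSE_XML;
    } else {
        die "Error parsing '$file':\n$@";
    }
}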

Is there something subtle I'm missing? For my purposes this works well, especially because if the author publishes the same <item> again with the problem fixed, the error will be overwritten.
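
For comparison, here is a rough sketch of the same blank-out-and-retry idea done on an in-memory copy, with a cap on the number of repairs so a badly mangled feed cannot loop for long. It is only an illustration under some assumptions: parse_with_repairs, its $parse callback, and the default cap of 20 are made-up names and values, not part of XML::Twig, and the XML is assumed to be held as raw bytes so that the reported byte offset lines up with the string.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any module.
# $xml       - the document as raw bytes (not decoded characters)
# $parse     - code ref that parses its first argument and dies on error
# $max_fixes - assumed cap on how many bytes we are willing to blank out
sub parse_with_repairs {
    my ($xml, $parse, $max_fixes) = @_;
    $max_fixes = 20 unless defined $max_fixes;
    my $fixes = 0;

    while (1) {
        # Return the (possibly patched) bytes once the parser accepts them.
        return ($xml, $fixes) if eval { $parse->($xml); 1 };
        my $err = $@;

        # Pull the byte offset out of the "invalid token" message.
        my ($offset) = $err =~
            /not well-formed \(invalid token\) at line \d+, column \d+, byte (\d+)/;

        if (defined $offset && $offset < length $xml && $fixes < $max_fixes) {
            substr($xml, $offset, 1) = ' ';   # blank the offending byte
            $fixes++;
            next;
        }
        die "Error parsing XML after $fixes repair(s):\n$err";
    }
}
```

With XML::Twig, the callback could be something like sub { $twig->parse($_[0]); $twig->purge }. The file on disk is left alone, so whether to write the patched bytes back becomes a separate decision.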


The correct way to deal with non-XML

merlyn on 2003-05-29T18:12:12

is a big LART to the generator of said non-XML for lying to you in public.

XML Error Handling

jbisbee on 2003-06-03T16:59:10

I was thinking of adding some kind of switch to the XML::RSS::Feed package to handle bad XML from RSS feeds (like boingboing.net).

Also, see you at YAPC in a couple of weeks :)