I've been working on processing RSS feeds for the last few days. One feed contained an invalid character that was killing the parser, so I hacked this together:
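    PARSE_XML: eval {
        $twig->parsefile($file);
        $twig->purge;
    };
    if ($@) {
        # The parser reports the byte offset of the offending character
        if ($@ =~ /not well-formed \(invalid token\) at line \d+, column \d+, byte (\d+)/) {
            # Overwrite that one byte with a space, in place, and try again
            open my $fh, '+<', $file or die "Can't update '$file': $!";
            seek $fh, $1, 0;
            print {$fh} ' ';
            close $fh or die "Can't close '$file': $!";
            goto PARSE_XML;
        } else {
            die "Error parsing '$file':\n$@";
        }
    }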
Is there something subtle I'm missing? This works well for my purposes, especially since the error will be overwritten if the author publishes the same <item> again with the problem fixed.
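One thing I'm not sure about is that the goto retries with no upper bound, so a file that keeps failing even after a byte is blanked might loop forever. Here is a rough sketch of the same idea with a cap on the number of repairs. The names blank_byte, parse_with_repair and $max_repairs are made up for illustration, and $parse is assumed to be any code ref that dies with the same expat-style message shown above:

```perl
use strict;
use warnings;

# Hypothetical helper: overwrite the single byte at $offset with a space.
sub blank_byte {
    my ($file, $offset) = @_;
    open my $fh, '+<:raw', $file or die "Can't update '$file': $!";
    seek $fh, $offset, 0         or die "Can't seek in '$file': $!";
    print {$fh} ' '              or die "Can't write to '$file': $!";
    close $fh                    or die "Can't close '$file': $!";
    return;
}

# Hypothetical wrapper: call $parse->($file), blanking the reported byte
# and retrying on "invalid token" errors, at most $max_repairs times.
# Returns the number of bytes that were blanked.
sub parse_with_repair {
    my ($file, $parse, $max_repairs) = @_;
    my $repairs = 0;
    while (1) {
        return $repairs if eval { $parse->($file); 1 };
        my $err = $@;

        # Anything other than an invalid-token error is passed straight on
        die "Error parsing '$file':\n$err"
            unless $err =~ /not well-formed \(invalid token\) at line \d+, column \d+, byte (\d+)/;
        my $offset = $1;

        die "Gave up on '$file' after $repairs repairs:\n$err"
            if $repairs >= $max_repairs;

        blank_byte($file, $offset);
        $repairs++;
    }
}
```

With the twig from above, the call would be something like parse_with_repair($file, sub { $twig->parsefile($_[0]); $twig->purge }, 10), where 10 is an arbitrary limit.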
Also, see you at YAPC in a couple of weeks.