I've been working on processing RSS feeds over the last few days. One feed contained an invalid character that was killing the parser, so I hacked this together:
    PARSE_XML:
    eval {
        $twig->parsefile($file);
        $twig->purge;
    };
    if ($@) {
        # The parser's error message includes the byte offset of the bad token.
        if ($@ =~ /not well-formed \(invalid token\) at line \d+, column \d+, byte (\d+)/) {
            # Overwrite the offending byte with a space, then parse again.
            open my $fh, '+<', $file or die "Can't update '$file': $!";
            seek $fh, $1, 0;
            print {$fh} ' ';
            close $fh;
            goto PARSE_XML;
        }
        else {
            die "Error parsing '$file':\n$@";
        }
    }
Is there something subtle I'm missing? For my purposes this works well, especially since the error will be overwritten if the author publishes the same <item> again with the problem fixed.
Also, see you at YAPC in a couple of weeks.