Yeah, yeah, I know... there is no RSS that isn't broken, one way or another, but not everybody has boarded the Atom bus yet.
So I'm using XML::RSS to parse a variety of often-broken feeds (Simple Machines Forum, just as a for-instance, appears to encode its entries, *then* truncate them, resulting in dangling tags or even in "..." happening right in the middle of multibyte characters, tags, whatever). And it's dying fairly often.
No problem, I'll just wrap it in an eval, and skip that feed until its issues get resolved (usually by the corrupted entry expiring, hopefully before another corrupted one jumps on board).
And so I do. And it dies inside the eval. Wait, what?
I mean, I do seldom run across modules that are ill-behaved enough to up and die instead of throwing something more catchable. But I thought eval was a magic fix for that.
Googling a little, the answer on perlmonks and elsewhere seems to be "Well, of course it should die irrevocably. You shouldn't be using invalid XML anyway." Fine, fine, that's the first thing I'll outlaw when I'm made Empress Of All Intarwebz. In the meantime, back in the real world, I'd *like* to be able to recover and go on to the XMLs that ARE valid (so far as you can say "valid" about RSS), thankyouverymuch.
eval's never failed me before, though. I might actually have to learn how it works so I can figure out what I'm doing wrong.
But seriously. How can you screw up
eval( $feed->parse($page));
?
Apparently, if you're me, "pretty easily."
Perhaps eval (); should be eval { }; ?
use XML::RSS;
my $f = XML::RSS->new;
eval { $f->parse( 'x' ); };
print 'okay';
Re:Uh, wrong brackets?
rjbs on 2007-09-20T17:54:53
Unless you mis-transcribed, LTJake is right. What you're doing is "eval EXPR" which evaluates the expression normally, then assumes it's a string containing Perl code and evaluates that. What you want is "eval BLOCK" which runs the block of code, catching exceptions.
Re:Uh, wrong brackets?
wirebird on 2007-09-20T20:02:46
Yep, that's exactly what it is. The only mis-transcription was that I'd formatted it all pretty in my code just like it was a block, and it never once registered that hey, those are awfully rounded curly-braces. Not even when I left it all on one line in my rant. (And I guess that parse() must return a 1 or something equally harmless on success.)
Bah, decongestants. This is why I'm not working on production code today. And clearly I shouldn't be let near even the quick hack stuff, either.
Thanks, guys.
(The rest of the rant stands, though. SMF is a clear example of why rewriting just to change languages is generally not a good idea.)
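For anyone landing here later, a minimal sketch of the eval EXPR versus eval BLOCK difference described in this thread. It does not use XML::RSS at all: parse_feed() is a hypothetical stand-in for $feed->parse($page) that dies on bad input and returns 1 on success (the return value is an assumption, as guessed above), and the two sample "pages" are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $feed->parse($page): dies on malformed
# input, returns 1 on success (assumed, not XML::RSS's documented API).
sub parse_feed {
    my ($page) = @_;
    die "not well-formed\n" unless $page =~ /^<rss\b/;
    return 1;
}

my @results;
for my $page ('<rss version="2.0"></rss>', 'x') {
    # eval BLOCK: the code runs inside the eval, so a die is trapped
    # and its message is left in $@. The trailing 1 makes success
    # distinguishable from a trapped failure (which yields undef).
    my $ok = eval { parse_feed($page); 1 };
    if ($ok) {
        push @results, "parsed";
    }
    else {
        chomp(my $err = $@);
        push @results, "skipped: $err";
    }
}
print "$_\n" for @results;

# eval EXPR: the argument is evaluated first, as an ordinary
# expression, and only its result is then compiled and run as Perl
# source. A die inside parse_feed() therefore happens before the eval
# starts and is not trapped by it. The outer eval BLOCK is here only
# so the script can observe that the die escaped the inner eval.
my $inner_trapped = eval { eval( parse_feed('x') ); 1 };
print "eval EXPR let the die escape\n" unless $inner_trapped;
```

With a real parser, the same shape applies: put the parse call inside the braces, check the result, and move on to the next feed when it fails.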
Re:Uh, wrong brackets?
wirebird on 2007-09-20T20:17:43
You know, you'd think that when it choked on a semi-colon I put in there (when I added a hello-world just to make sure it was hitting that code on *working* feeds), it would have maybe sent up a red flag for me. But nooooo, I sez to myself, I sez, "Gee, I can't even type a working print statement today, I'm just making this worse" and reverted it all instead of thinking about it.
Yyyeah. No more coding for me today.
Re:XML::Liberal
wirebird on 2007-09-20T22:28:49
XML::Feed is just an API-unification wrapper around XML::RSS and XML::Atom::Feed, innit? I'm using XML::Atom::Syndication::Feed instead, for reasons I can't quite remember just now, but, uh, I'm sure they were good ones at the time.
XML::Liberal might help, though I assume it has the same issue XML::Twig (which I saw someone recommend as an alternative) has in that it's not RSS-specific, and this was supposed to be a quick and dirty hack to pull title, date, content, and link out of feeds... and I didn't want to have to hunt down all the potential variations of where the content (especially) can be hidden. (XML::Atom::Syndication::Feed already doesn't count summaries as content, which isn't entirely unreasonable, but which does mean one more bit of exception coding in the shoulda-beena-quick-hack, which I haven't gotten around to actually coding, and undoubtedly shouldn't today.) I kind of don't want to throw out entire feeds, but on the other hand I don't want to actually do any *work* to fix them...