Yet Another RSS Feed & Adventures in Infobots

blech on 2001-09-05T22:09:04

Today #london.pm had one of its occasional hiccups when the box that hosts two of our bots (and the domains of at least five Perl mongers) dropped off the net for most of the working day. Thankfully I'd downloaded the infobot sources the previous day to try to find out why googling didn't work any more. It turns out the fix for that is pretty straightforward: in Extras/W3Search.pl, change the old line

my $Search = WWW::Search->new("$where");

into

my $Search = WWW::Search->new("Scraper::$where") || WWW::Search->new("$where");

which pdcawley figured out at about the time I was beginning to read up on this new Scraper lark. Our discussion of the problem on channel led newcomer mordred to fix the long-standing issue with currency conversions falling back to Zimbabwean dollars, and also its inability to convert euros (quite a problem during the YAPC::Europe auction, it turned out). I'll link to the fix once I've tested it myself.

Hence today I launched my copy of infobot as a temporary respite for people who like talking to bots rather than people (me included), and finally realised this was the time to get moving on something I'd been meaning to do for ages: a BBC News RSS feed. This was remarkably quick to knock together: take one low-graphics website, LWP, some regular expressions, XML::RSS and an infobot, and hey presto, a working feed.
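For anyone wanting to try the same recipe, here is a rough sketch of the scrape-and-publish step: a regular expression pulls headline links out of an HTML page and the result is written out as RSS 0.91. Everything here is illustrative rather than the actual script: the sample HTML, the "newsid_" filter and the function names (extract_headlines, xml_escape, to_rss) are hypothetical. The real feed used XML::RSS for the output; this sketch hand-rolls the XML only so it runs with core Perl and no CPAN modules.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a low-graphics index page; the real markup
# is not reproduced here.
my $html = <<'HTML';
<html><body>
<a href="/low/english/uk/newsid_1/1.stm">First headline</a><br>
<a href="/low/english/uk/newsid_2/2.stm">Second headline</a><br>
<a href="/low/english/help/default.stm">Help</a>
</body></html>
HTML

# Pull (title, link) pairs out of the page with a regular expression.
# Assumption: story links contain "newsid_", so navigation links are skipped.
sub extract_headlines {
    my ($page, $base) = @_;
    my @items;
    while ($page =~ m{<a\s+href="([^"]*newsid_[^"]*)"\s*>([^<]+)</a>}gi) {
        push @items, { link => $base . $1, title => $2 };
    }
    return @items;
}

# Escape the five XML special characters so text is safe inside RSS.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities aren't re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Emit a minimal RSS 0.91 document (XML::RSS would normally do this).
sub to_rss {
    my ($title, $link, @items) = @_;
    my $out = qq{<?xml version="1.0"?>\n<rss version="0.91">\n<channel>\n};
    $out .= '<title>' . xml_escape($title) . "</title>\n";
    $out .= '<link>' . xml_escape($link) . "</link>\n";
    $out .= "<description>Scraped headlines</description>\n";
    $out .= "<language>en-gb</language>\n";
    for my $item (@items) {
        $out .= "<item>\n<title>" . xml_escape($item->{title}) . "</title>\n";
        $out .= '<link>' . xml_escape($item->{link}) . "</link>\n</item>\n";
    }
    $out .= "</channel>\n</rss>\n";
    return $out;
}

my @items = extract_headlines($html, 'http://news.bbc.co.uk');
print to_rss('BBC News (sketch)', 'http://news.bbc.co.uk/', @items);
```

The regex approach is fragile by nature: any change to the page layout breaks it, which is one reason a low-graphics page with simple markup is the easier target.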

However, this was on my local machine, behind a firewall. Soon after I'd finished, my regular host came back online, so I excitedly uploaded the little CGI script to a more stable host that other infobots could see. Oh dear. When it got there, there was no XML::RSS, no XML::Parser and no Expat. A bit of rather slow building later, the script ran, but it wasn't returning any headlines. Switching from the LWP::Simple interface to full LWP helped, but then the script became slow enough to fall foul of infobot timeouts (which mstevens found before I did). Increasing the timeout from 10 to 20 seconds finally let the finished script return RSS.
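The post doesn't say exactly why full LWP helped, but one practical difference is that LWP::UserAgent reports why a request failed and lets you set your own timeout, whereas LWP::Simple's get just returns undef. A minimal sketch, assuming a fetch timeout of 15 seconds chosen to stay under the infobot's 20-second limit; fetch_page and the agent string are hypothetical names for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch a page with full LWP so a failure comes back with a status line
# instead of a bare undef. Returns (content, undef) or (undef, error).
sub fetch_page {
    my ($url, $timeout) = @_;
    my $ua = LWP::UserAgent->new(
        timeout => $timeout // 15,                 # assumed default, under 20s
        agent   => 'headline-scraper-sketch/0.1',  # hypothetical agent string
    );
    my $res = $ua->get($url);
    return ($res->decoded_content, undef) if $res->is_success;
    return (undef, $res->status_line);
}

# Usage: perl fetch.pl http://news.bbc.co.uk/
if (@ARGV) {
    my ($page, $err) = fetch_page($ARGV[0]);
    die "fetch failed: $err\n" unless defined $page;
    print length($page), " bytes fetched\n";
}
```

Keeping the fetch timeout below the bot's own timeout means a slow upstream page produces an error the script can report, rather than the bot silently giving up first.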

Of course, once I got it working and introduced it to the channel, a couple more possibilities got bandied about. DrHyde wanted the tech news pulled out too. Sadly, the low-graphics tech index is nastier to parse, and fetching the full page would probably slow the script down even further. Handily, there's a feed designed for news tickers, already used by a couple of Freshmeat projects, that includes the tech news and, as a bonus, the London temperature. On the flip side, it's updated less often. Still, there may be a rewrite soon to use it.

And all because the Londoners like their local news.