Yesterday I was playing with XMLTV feeds. I knocked up a little app that connects to a remote server, grabs a file (in this case XML), stores it in a cache, passes it into a templating engine (XSLT), and spits the result out as a web page.
The code is based on an RSS display tool I previously wrote. It's the same principle: grab something, transform it, and display it. In both cases I don't want to grab the data every time; I'm happy to use a local copy if it's only a few hours old.
I realised that a lot of the code could be abstracted out into a module. The transformation element may be too specialised, but we shall see.
I'm not trying to re-invent Squid; I just want a simple URI-fetching tool that can cache data. Basically, you can call the app many times without it checking the source data every time.
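To make the idea concrete, here is a minimal sketch of such a caching fetcher. It is written in Python purely to illustrate the logic, not as the Perl module itself; the names `cached_get`, `fetch_url`, and `max_age_seconds` are made up for this example, and the three-hour default is just one reading of "a few hours old".

```python
import os
import time
import urllib.request


def fetch_url(url):
    """Download the resource and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_get(url, cache_path, max_age_seconds=3 * 60 * 60, fetch=fetch_url):
    """Return the bytes for url, reusing cache_path if it is fresh enough."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
    except OSError:
        age = None  # no cached copy yet

    # A sufficiently recent local copy means no network traffic at all.
    if age is not None and age < max_age_seconds:
        with open(cache_path, "rb") as cached:
            return cached.read()

    data = fetch(url)
    # Write to a temporary file first so an interrupted write never
    # leaves a half-written cache file behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as out:
        out.write(data)
    os.replace(tmp_path, cache_path)
    return data
```

The `fetch` parameter is only there so the download step can be swapped out (for testing, say); the result of `cached_get` can then be handed to whatever transformation step follows.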
Does this exist on CPAN already? If so, where? If it doesn't exist already, what should I call it?
Re:LWP
ajt on 2004-02-24T15:57:07
Interesting idea. I knew LWP did some caching, but I thought it was in-memory caching within a single process, not file caching. This is probably enough, certainly for me anyway.
However, LWP still has to make an HTTP request to determine whether the page has changed, and some feed providers may still dislike the frequency of the requests, even if the responses are only "304, Not modified since last retrieval".
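For reference, the exchange being described looks roughly like this. It is a Python sketch of a conditional GET, not LWP's own code; `conditional_get` and its `opener` parameter are hypothetical names for this example. Note that even the "unchanged" case still costs the provider one request.

```python
import urllib.error
import urllib.request


def conditional_get(url, last_modified=None, opener=urllib.request.urlopen):
    """Fetch url only if it changed since last_modified.

    Returns (body, last_modified); body is None when the server says 304.
    """
    request = urllib.request.Request(url)
    if last_modified:
        # Ask the server to send the body only if it is newer than our copy.
        request.add_header("If-Modified-Since", last_modified)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: keep using the local copy and the old timestamp.
            return None, last_modified
        raise
```

The caller would store the returned `Last-Modified` value alongside the cached file and pass it back in on the next call.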
My other option is to use cron and Wget to obtain the data, and try not to do my data gathering at page viewing time. As with LWP, Wget can do an if-modified GET, which should be enough to not upset the feed provider. By scheduling the cron job infrequently enough I can control the load on the data feed, independently of the page requests.
For the kind of data I'm pulling it's okay if the data is a little stale, so I want to make it avoid going to the source for a while, if I can. Given that I'm not planning to redistribute the feeds, and it's for home intranet use only, it's probably a lot of effort for nothing...
Re:LWP
Matts on 2004-02-24T16:44:25
I think if you pull the feed as you're viewing it you're likely to put LESS strain on the remote end than they normally see from a script running 48 times a day. Unless you hit refresh more than 48 times.
Honestly, anyone getting antsy about an individual user refreshing their RSS feed a few times in a few minutes has way too much time on their hands, and doesn't understand the web.
Re:LWP
Dom2 on 2004-02-24T19:00:18
You can probably subclass mirror() and make it check for an explicit expiry date (assuming the HTTP headers are stored somewhere). If you're not past the expiry date, you should be able to use the file without revalidation.
-Dom
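The expiry check itself could look something like the sketch below. It is Python rather than a mirror() subclass, and it assumes the `Expires` header from the last response was saved next to the cached file; `is_fresh` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now=None):
    """Return True if the stored Expires date is still in the future."""
    if not expires_header:
        return False  # no expiry information: revalidate
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # unparseable date (e.g. "0"): treat as expired
    if expires.tzinfo is None:
        # Assumption: a date without a usable zone is taken as UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now < expires
```

When `is_fresh` returns True the cached file can be used with no request at all; otherwise the code falls back to the usual if-modified check.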
Re:LWP
ajt on 2004-02-24T23:02:42
Good idea, I hadn't thought of that.