Handling HTTP response codes

LTjake on 2003-07-23T12:38:38

After Mark's post yesterday on how Atom aggregators should handle HTTP response codes, I decided to bring our (currently in-house-only) RSS aggregator up to speed. I think it's safe to assume that RSS and Atom aggregators should handle HTTP the same way.

The structure of the aggregator is a little weird because it's a server-based "what's new?" type app and not a desktop/personal news reader.

There are two main parts to the app that read a feed.

  1. the update_cache.pl script (scheduled to run every couple of hours)
  2. the actual web app which reads only from the cache

So I really only had to deal with the update_cache.pl script, since it's the only part that talks to external sites.

I was able to check a few of the requirements off right away:

  • It already handled ETags and Last-Modified dates (even though it's in-house only right now, we want it to play nice with external servers when the time comes).
  • Referer and User-Agent were set appropriately.
  • HTTPS is supported (simply by installing Crypt::SSLeay).
  • A requirement not on the list, but worth noting: it supports the file:// scheme, so local feeds can be read straight off the file system (it even returns a Last-Modified date! LWP rocks!)
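
For anyone wiring up the ETag/Last-Modified part themselves, here is a minimal sketch of the conditional-GET bookkeeping. It assumes each cached feed is a hash with hypothetical `etag` and `last_modified` keys, and that response headers are available as a plain hash; the helper names are made up for illustration. The validators are echoed back verbatim (`ETag` as `If-None-Match`, `Last-Modified` as `If-Modified-Since`), so a 304 reply means the cached copy is still good. In the real script the returned pairs would go to something like `$request->header(...)`.

```perl
use strict;
use warnings;

# Build conditional-GET headers from a cached feed record.
# $entry is a hypothetical cache record: { etag => ..., last_modified => ... }
sub conditional_headers {
    my ($entry) = @_;
    my @headers;
    # Send the server's own validators back unchanged
    push @headers, 'If-None-Match'     => $entry->{etag}          if defined $entry->{etag};
    push @headers, 'If-Modified-Since' => $entry->{last_modified} if defined $entry->{last_modified};
    return @headers;
}

# After a 200 response, store the new validators for the next run.
# $response_headers is assumed to be a hash ref of header name => value;
# a validator the server no longer sends is cleared rather than kept stale.
sub remember_validators {
    my ($entry, $response_headers) = @_;
    $entry->{etag}          = $response_headers->{'ETag'};
    $entry->{last_modified} = $response_headers->{'Last-Modified'};
    return $entry;
}

my %cached  = ( etag => '"abc123"', last_modified => 'Wed, 23 Jul 2003 10:00:00 GMT' );
my %headers = conditional_headers(\%cached);
print "$_: $headers{$_}\n" for sort keys %headers;
```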

Adding gzip and deflate support was a snap. I send the right header with my request:

# Accept-Encoding values are comma-separated; a ';' would start a parameter
$request->header( 'Accept-Encoding' => 'gzip, deflate' );

Handling the returned data was just as easy:

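my $data = $response->content;

# Decompress according to the Content-Encoding the server actually used
if ( my $encoding = $response->header( 'Content-Encoding' ) ) {
   require Compress::Zlib;

   if    ( $encoding =~ /gzip/i )    { $data = Compress::Zlib::memGunzip( $data ) }   # gzip format
   elsif ( $encoding =~ /deflate/i ) { $data = Compress::Zlib::uncompress( $data ) }  # zlib-wrapped deflate
   warn "could not decode $encoding content\n" unless defined $data;
}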
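
To exercise both branches without a live server, here is a small self-contained sketch that compresses a sample body with `Compress::Zlib::memGzip` and `Compress::Zlib::compress` (standing in for a server's gzip and deflate output) and decodes it with the same two calls used above. The `decode_body` helper is hypothetical. One caveat worth knowing: `uncompress` expects zlib-wrapped data, so a server that sends raw deflate streams would not decode this way.

```perl
use strict;
use warnings;
use Compress::Zlib ();

# Decode a body according to its Content-Encoding value.
# Returns undef if the encoding is unsupported or decompression fails.
sub decode_body {
    my ($encoding, $data) = @_;
    return $data unless $encoding;                  # no encoding: already plain
    return Compress::Zlib::memGunzip($data)  if $encoding =~ /gzip/i;
    return Compress::Zlib::uncompress($data) if $encoding =~ /deflate/i;
    return $data if $encoding =~ /^\s*identity\s*$/i;
    return;                                         # unsupported encoding
}

my $feed = '<rss version="2.0"><channel><title>test</title></channel></rss>';

# Simulate what a server would send for each encoding
my $gzipped  = Compress::Zlib::memGzip($feed);
my $deflated = Compress::Zlib::compress($feed);

for my $case ( [ gzip => $gzipped ], [ deflate => $deflated ] ) {
    my ($name, $body) = @$case;
    my $decoded = decode_body($name, $body);
    print "$name: ", ( defined $decoded && $decoded eq $feed ? 'ok' : 'FAILED' ), "\n";
}
```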

LWP was handling redirects automatically, which was fine except that the aggregator needs to distinguish temporary redirects (300, 302, 307) from permanent ones (301). I had to add a loop and use simple_request() so I could see what came back from each request. After a permanent redirect, the URL is permanently updated in the config file; temporary redirects do not affect the config.
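
A minimal sketch of such a loop, under a few assumptions: `$fetch` is a hypothetical code ref that makes one request without following redirects and returns the status code and `Location` header, `Location` holds an absolute URL (as HTTP/1.1 requires), and the limit of 5 hops is arbitrary. The URL to save in the config only follows 301s that come before any temporary redirect, so a 301 followed by a 302 saves the 301's target and nothing further.

```perl
use strict;
use warnings;

# Follow redirects one hop at a time.
# $fetch is a hypothetical code ref: $fetch->($url) returns ($code, $location).
# Returns ($final_code, $final_url, $config_url), where $config_url is the
# URL that should be written back to the config file.
sub follow_redirects {
    my ($url, $fetch, $max_hops) = @_;
    $max_hops = 5 unless defined $max_hops;

    my $config_url    = $url;   # what the config file should point at
    my $saw_temporary = 0;      # after a temporary redirect, stop updating the config

    for my $hop (0 .. $max_hops) {
        my ($code, $location) = $fetch->($url);

        # Not a redirect to follow: success, error, 304, or a 3xx without Location
        return ($code, $url, $config_url)
            if $code < 300 || $code >= 400 || $code == 304 || !defined $location;

        if ($code == 301) {
            $config_url = $location unless $saw_temporary;   # permanent: update config
        }
        else {
            $saw_temporary = 1;                              # temporary: leave config alone
        }
        $url = $location;
    }
    die "too many redirects\n";
}
```

Wired to LWP, `$fetch` might be `sub { my $r = $ua->simple_request( HTTP::Request->new( GET => shift ) ); ( $r->code, $r->header('Location') ) }`, though a real version would also keep the final response object so the feed isn't fetched twice.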

Speaking of redirects, is_redirect() returns true for 304 Not Modified, which is a little misleading (but true to the spec, which files 304 under the 3xx Redirection class).
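
That makes the order of the checks matter. Here is a minimal sketch using a hypothetical `classify_status` helper and the same numeric ranges that HTTP::Status's is_success(), is_redirect() and is_error() test: checking for 304 first keeps a "not modified" reply from being treated as a redirect to follow. The returned labels are made up for illustration.

```perl
use strict;
use warnings;

# Map a status code to a (hypothetical) action for the cache updater.
# The 304 test must come before the generic 3xx test, because
# 304 Not Modified also falls in the 300-399 "redirect" range.
sub classify_status {
    my ($code) = @_;
    return 'not_modified' if $code == 304;                   # keep the cached copy
    return 'ok'           if $code >= 200 && $code < 300;
    return 'permanent'    if $code == 301;                   # update the config file
    return 'temporary'    if $code >= 300 && $code < 400;
    return 'error'        if $code >= 400 && $code < 600;
    return 'unknown';
}

printf "%d => %s\n", $_, classify_status($_) for 200, 301, 302, 304, 404, 500;
```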

I didn't make any special rules for codes that fall under is_error(). The script currently just reports that an error occurred; I haven't decided how much of Mark's recommendations I want to follow for this app.

Other than that, I still need to look at authentication and proxy support. Basic auth is currently handled only in the URL, but that poses a security risk, since I use the URL as a key, stored in a user cookie for customization purposes.
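
One possible way around the URL-as-key problem, sketched under assumptions: strip any `user:pass@` part from the feed URL before using it as the cookie key, and send the credentials in an `Authorization: Basic` header instead. The helper names are hypothetical, and the regex is a simplification that only handles http/https URLs without escaped characters in the credentials. With LWP, `$request->authorization_basic($user, $pass)` builds the same header.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Split "http://user:pass@host/path" into a credential-free URL
# (safe to use as a cookie key) plus the user name and password.
sub split_credentials {
    my ($url) = @_;
    if ( $url =~ m{^(https?://)([^:@/]+):([^@/]*)@(.*)$}i ) {
        return ( "$1$4", $2, $3 );
    }
    return ( $url, undef, undef );    # no embedded credentials
}

# Build the value for an "Authorization: Basic ..." request header.
sub basic_auth_value {
    my ($user, $pass) = @_;
    return 'Basic ' . encode_base64( "$user:$pass", '' );   # '' = no trailing newline
}

my ($key, $user, $pass) = split_credentials('http://bob:secret@example.com/feed.rss');
print "$key\n";                                  # credential-free URL to use as the key
print basic_auth_value($user, $pass), "\n";
```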

Mark put up some tests today. Sadly, they are all Atom feeds, so I can't try parsing the data, but the status codes they return seem useful.