HTTP spec problem for lwp scripts

rhesa on 2005-07-29T12:38:36

I have a script that fetches an RSS feed from blogs.com. A couple of days ago it broke. The reason: the server is sending gzip-encoded content. I felt that was mighty impolite, but it turns out the HTTP spec is on their side:

If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html, Section 14.3)
Since LWP by default does not send any Accept-Encoding request header, this means sending gzip'ed content is perfectly acceptable. But LWP doesn't handle that, and simply gives us the undecoded content.

It would seem - to me at least - that the quickest fix to LWP would be to include an 'Accept-Encoding: identity' in the request headers by default:
If an Accept-Encoding field is present in a request, and if the server cannot send a response which is acceptable according to the Accept-Encoding header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code.
ibid. (Although I'd have to say that a 406 response would be pretty useless.)
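For a single script, the same header can be set on the user agent rather than patched into LWP itself. This is a minimal sketch, assuming an LWP recent enough to provide default_header() on LWP::UserAgent; it only inspects the header and makes no request.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Plain agent; nothing here is specific to any one server.
my $ua = LWP::UserAgent->new;

# Ask for the unencoded ("identity") representation on every request.
$ua->default_header('Accept-Encoding' => 'identity');

# Inspect what will be sent, without touching the network.
my $sent = $ua->default_headers->header('Accept-Encoding');
print "Accept-Encoding: $sent\n";
```

Every request made through $ua afterwards carries the header. Whether the server honours it is a separate question.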

Unfortunately, with this particular server, that does not do the trick:

[user@host]% GET -H 'Accept-Encoding: identity' -U -e -d http://$blog_id.blogs.com/index.rdf
GET http://$blog_id.blogs.com/index.rdf
Accept-Encoding: identity
User-Agent: lwp-request/2.06

Connection: close
Date: Thu, 28 Jul 2005 18:46:27 GMT
Accept-Ranges: bytes
Age: 63019
ETag: "1cc057-75c4-427c6400"
Server: Apache
Content-Encoding: gzip
Content-Length: 8956
Content-Type: application/rdf+xml
Last-Modified: Wed, 20 Jul 2005 14:00:48 GMT
Client-Date: Fri, 29 Jul 2005 12:16:46 GMT
Client-Peer: 216.129.107.21:80
Client-Response-Num: 1
X-Cache: HIT from www.sixapart.com


If you don't have a plain text version, then where's the 406, guys?

Which means that it would be nice if LWP could decode the content when it sees the Content-Encoding: gzip header, and if a suitable Compress module is available.
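Until LWP does this itself, a script can do the check by hand. The sketch below assumes Compress::Zlib is installed. body_of() is a hypothetical helper name, and the response is built locally with made-up feed content so the example runs without a network connection.

```perl
use strict;
use warnings;
use HTTP::Response;
use Compress::Zlib ();

# Hypothetical helper: return the response body, gunzipping it when the
# server labelled it with Content-Encoding: gzip.
sub body_of {
    my ($response) = @_;
    my $raw      = $response->content;
    my $encoding = lc($response->header('Content-Encoding') || '');
    return $raw unless $encoding eq 'gzip';

    my $plain = Compress::Zlib::memGunzip($raw);
    die "gunzip failed\n" unless defined $plain;
    return $plain;
}

# Simulated gzip-encoded response (made-up content, not the real feed).
my $xml  = "<rdf:RDF>example feed</rdf:RDF>";
my $resp = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'application/rdf+xml', 'Content-Encoding' => 'gzip' ],
    Compress::Zlib::memGzip($xml),
);

print body_of($resp), "\n";
```

In a real script, $resp would come from $ua->get($url) instead of being constructed by hand.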

I'd like to point out that - in my opinion - the content-encoding is a transport detail, and should not have to matter to the user. In other words, I would appreciate it if LWP handled that for me transparently.

Maybe I should send in a patch....


LWP does it

clscott on 2005-07-29T14:56:46

LWP does it if you ask for the $response->decoded_content instead of $response->content. The decoded_content method was introduced in LWP-5.802.

From: http://www.issociate.de/board/post/155483/Downloading_a_page_compressed.html (look for message #559404)
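A small sketch of the difference between the two methods, assuming an LWP that has decoded_content and a Compress::Zlib installation to build the test body. The response is constructed locally with made-up content so nothing is fetched.

```perl
use strict;
use warnings;
use HTTP::Response;
use Compress::Zlib ();

# Simulated gzip-encoded response (made-up content, not the real feed).
my $xml  = "<rdf:RDF>example feed</rdf:RDF>";
my $resp = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'application/rdf+xml', 'Content-Encoding' => 'gzip' ],
    Compress::Zlib::memGzip($xml),
);

my $raw     = $resp->content;          # bytes as sent: still gzip
my $decoded = $resp->decoded_content;  # Content-Encoding undone; undef on failure

die "could not decode response body\n" unless defined $decoded;
print "raw length:     ", length($raw), "\n";
print "decoded length: ", length($decoded), "\n";
print "$decoded\n";
```

By default decoded_content also decodes the character set for textual and XML content types, so the result is a Perl character string rather than raw bytes.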

Re:LWP does it

rhesa on 2005-07-29T15:10:45

Awesome, thanks a bundle! I obviously have an older version of LWP. Thanks for the pointer.