I have a script that fetches an RSS feed from blogs.com. A couple of days ago it broke. The reason: the server is sending gzip-encoded content. I felt that was mighty impolite, but it turns out the HTTP spec is on their side:
If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client. (RFC 2616, http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html, Section 14.3)
If an Accept-Encoding field is present in a request, and if the server cannot send a response which is acceptable according to the Accept-Encoding header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code. (ibid.)

(Although I'd have to say that a 406 response would be pretty useless.) But even when I explicitly ask for identity, the server still answers with gzip:
[user@host]% GET -H 'Accept-Encoding: identity' -U -e -d http://$blog_id.blogs.com/index.rdf
GET http://$blog_id.blogs.com/index.rdf
Accept-Encoding: identity
User-Agent: lwp-request/2.06

Connection: close
Date: Thu, 28 Jul 2005 18:46:27 GMT
Accept-Ranges: bytes
Age: 63019
ETag: "1cc057-75c4-427c6400"
Server: Apache
Content-Encoding: gzip
Content-Length: 8956
Content-Type: application/rdf+xml
Last-Modified: Wed, 20 Jul 2005 14:00:48 GMT
Client-Date: Fri, 29 Jul 2005 12:16:46 GMT
Client-Peer: 216.129.107.21:80
Client-Response-Num: 1
X-Cache: HIT from www.sixapart.com
LWP does it for you if you ask for $response->decoded_content instead of $response->content. The decoded_content method was introduced in LWP 5.802.
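A minimal, offline sketch of the difference (not from the original thread): it hand-builds an HTTP::Response shaped like the one above, using a made-up stand-in RDF body gzipped with Compress::Zlib's memGzip, then compares content with decoded_content. It assumes libwww-perl 5.802 or later and Compress::Zlib are installed.

```perl
use strict;
use warnings;
use HTTP::Response;
use Compress::Zlib qw(memGzip);

# A made-up stand-in for the feed body; the real one comes from the server.
my $xml = qq{<?xml version="1.0"?>\n<rdf:RDF></rdf:RDF>\n};

# Build a response shaped like the one in the transcript:
# gzip-compressed body, labelled with Content-Encoding: gzip.
my $response = HTTP::Response->new(200, 'OK');
$response->header('Content-Type'     => 'application/rdf+xml');
$response->header('Content-Encoding' => 'gzip');
$response->content(memGzip($xml));

# content() hands back the bytes exactly as received: still gzip data.
my $raw = $response->content;

# decoded_content() undoes the Content-Encoding first (and, for text and
# XML types, may also decode the charset), giving back the feed itself.
my $decoded = $response->decoded_content;

printf "content() is %s\n", ($raw eq $xml ? 'plain text' : 'still gzip data');
printf "decoded_content() %s the original\n",
    ($decoded eq $xml ? 'matches' : 'does not match');
```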
From: http://www.issociate.de/board/post/155483/Downloading_a_page_compressed.html (look for message #559404)

Re: LWP does it
rhesa on 2005-07-29T15:10:45
Awesome, thanks a bundle! I obviously have an older version of LWP. Thanks for the pointer.
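For anyone else stuck on an LWP older than 5.802, one workaround is to gunzip the body by hand. This is a minimal sketch, assuming Compress::Zlib is installed; body_of is a hypothetical helper name, and it only handles the gzip coding (anything else is passed through untouched).

```perl
use strict;
use warnings;
use HTTP::Response;
use Compress::Zlib qw(memGzip memGunzip);

# Hypothetical helper: return the response body with a gzip
# Content-Encoding undone, without needing decoded_content().
# Only "gzip" / "x-gzip" is handled; other codings pass through as-is.
sub body_of {
    my ($response) = @_;
    my $body     = $response->content;    # a copy of the raw bytes
    my $encoding = $response->header('Content-Encoding') || '';
    if ($encoding =~ /^\s*(?:x-)?gzip\s*$/i) {
        $body = memGunzip($body);          # returns undef on bad gzip data
        die "could not gunzip response body\n" unless defined $body;
    }
    return $body;
}

# Usage with a hand-built gzip response standing in for the real feed.
my $response = HTTP::Response->new(200, 'OK');
$response->header('Content-Encoding' => 'gzip');
$response->content(memGzip('<rdf:RDF></rdf:RDF>'));
print body_of($response), "\n";
```

Once LWP is upgraded, decoded_content is the better choice, since it also copes with other content codings such as deflate and with charset decoding.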