HTTP Gzip Woes

gav on 2004-09-30T22:34:21

We've had reports over the last couple of months about garbled pages showing up in Mozilla from our web app. I had a sneaking suspicion that it had something to do with mod_gzip, because the page looked like a compressed file opened in your favorite editor. It was one of those extra infuriating bugs: I couldn't reproduce it, and it seemed to go away every time more than one coworker was looking. Today I spotted it while looking at an unrelated problem and dug about in the server logs until I found this:

XX.XX.XX.XX - - [30/Sep/2004:15:37:34 -0400] "GET /bar.html HTTP/1.1" 200 5217 "http://example.com/foo.html" "Mozilla/5.0 (Windows; U; Windows NT 5.0; rv:1.7.3) Gecko/20040913 Firefox/0.10"
XX.XX.XX.XX - - [30/Sep/2004:15:37:34 -0400] "GET /bar.html HTTP/1.1" 206 31451 "http://example.com/foo.html" "Mozilla/5.0 (Windows; U; Windows NT 5.0; rv:1.7.3) Gecko/20040913 Firefox/0.10"

Consulting RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1) tells me that 206 is the status code for Partial Content. But why is Firefox requesting the document twice? I had a faint recollection of reading something about this before (yay blogs!) and it turns out that Blog*Spot had similar issues.
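For anyone hunting the same pattern in their own logs, here is a rough sketch in Python. It assumes the access log is in Apache's combined format, like the two lines above, and the names LOG_LINE and find_partial_refetches are made up for illustration. It only flags a 206 that comes after a 200 for the same client and path, which is the pairing seen here; it doesn't prove the page was garbled.

```python
import re

# Leading fields of the combined log format:
# host ident user [time] "METHOD path protocol" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def find_partial_refetches(lines):
    """Return (host, path, time) for each 206 that follows a 200
    from the same host for the same path."""
    seen_ok = set()   # (host, path) pairs already served with a 200
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        key = (m["host"], m["path"])
        if m["status"] == "200":
            seen_ok.add(key)
        elif m["status"] == "206" and key in seen_ok:
            hits.append((m["host"], m["path"], m["time"]))
    return hits
```

Feeding it an open log file, e.g. find_partial_refetches(open("access_log")), gives a list of suspects to look at by hand (the file name is just an example).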

It seems that setting the charset in an HTTP header instead of a META tag fixes the problem. I simply added an AddDefaultCharset directive to the Apache configuration and everything seems good.
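A quick way to confirm the fix took is to look at the Content-Type header the server sends back and check that it carries a charset parameter. The sketch below, in Python, only does the parsing part, using the standard library's email.message.Message. The helper name charset_from_content_type and the sample header values are my own examples, not what our server actually sends.

```python
from email.message import Message

def charset_from_content_type(value):
    """Return the charset named in a Content-Type header value
    (lowercased), or None if the header does not declare one."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()

# Example header values (assumed, for illustration only).
with_charset = charset_from_content_type("text/html; charset=ISO-8859-1")
without_charset = charset_from_content_type("text/html")
```

If the second case is what comes back for your pages, the charset is only being declared in the META tag, if at all.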

Whilst looking for charset help, I found out that MySQL 4.1 will have better Unicode support and this funky regexp.