Maketext fun

TorgoX on 2004-01-12T06:10:49

Dear Log,

So for months I've been dimly aware that something needed to be done about the logic in Locale::Maketext that decided that users requesting "en-US" would also accept "en" documents. I'd been fretting over implementation ideas for weeks and weeks, and I finally sat down Saturday morning to hammer out the solution in "a few minutes". But then I saw a corner case that was very problematic for my new algorithm, so I threw out that algorithm and started over. That too ended up producing a bad corner case, so that went out. Repeat two more times. But I finally released a happy new version today (Locale::Maketext 1.07), with new tests of course. And it's my impression that the code is tidier now, too.

In this case, I'm quite liking what having all the tests gives me: an approach that's less about "formal" specification and more about examples, in the form of simple tests.

All Maketext users should upgrade now. The new version fixes some past behavior that I'm now calling a bug: users were getting not-quite-best language choices. So they'd say they'd want "en-US, ja" documents, and they'd get the Japanese document where an English one was available. With the upgrade, users now correctly get the English version.

Language negotiation

pne on 2004-01-12T12:35:58

So they'd say they'd want "en-US, ja" documents, and they'd get the Japanese document where an English one was available. With the upgrade, users now correctly get the English version.

Correctly? I thought that the opposite was the case (if I list "en", I'll be happy with "en-GB" or "en-US" as well), but that if I only list "en-US, ja" then that means "US English, or any kind of Japanese", but not plain "en" nor any other "en-*".

*digs around a little* That's what I gather from my reading of section 14.4 of RFC 2616 as well: A language-range matches a language-tag if it exactly equals the tag, or if it exactly equals a prefix of the tag such that the first tag character following the prefix is "-".

By my understanding, "language-range" is what's sent in the Accept-Language: header of an HTTP request, and "language-tag" is the tag associated with a resource.

Then a language-range of "en" would match a tag of "en-US", but one of "en-US" would not match a tag of "en" (since "en-US" is neither equal to "en" nor is "en-US" a prefix of "en").
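To make that rule concrete, here's a minimal sketch of the prefix match as I read it, in Python. The function name is made up for illustration, and it leaves out the special "*" range and q-values entirely.

```python
def range_matches_tag(language_range: str, language_tag: str) -> bool:
    """Sketch of the RFC 2616 section 14.4 prefix rule (hypothetical helper).

    A range matches a tag if it equals the tag, or if it equals a prefix
    of the tag that is immediately followed by "-". Tags are compared
    case-insensitively. The "*" range is deliberately not handled here.
    """
    rng = language_range.lower()
    tag = language_tag.lower()
    return tag == rng or tag.startswith(rng + "-")


print(range_matches_tag("en", "en-US"))   # range "en" covers tag "en-US"
print(range_matches_tag("en-US", "en"))   # range "en-US" does not cover tag "en"
```

Note the asymmetry: the range is the thing that may be shorter than the tag, never the other way around.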

RFC 2616 even warns about this: we remind implementors of the fact that users are not familiar with the details of language matching as described above, and should provide appropriate guidance. As an example, users might assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. A user agent might suggest in such a case to add "en" to get the best matching behavior.

Re:Language negotiation

TorgoX on 2004-01-12T20:57:12
The RFC expresses what it wants the language tags to mean. However, experience shows that users mean other things by it. They assume that by selecting "en-gb", they will be served any kind of English document if British English is not available -- and user-agents do not provide any guidance otherwise. What's more, many/most user-agents start out with a very specific accept-language (like "en-US", sans "en") based on the user's locale, and never even mention this to the user.
Given these nouvelles exigences, I consider my implementation to be optimal.
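For illustration only, here is a rough Python sketch of one way such a fallback policy could work. This is not the actual Locale::Maketext code; the function names are hypothetical, and the rule that an explicitly listed tag keeps its own position is an assumption of the sketch.

```python
def with_implied_parents(requested):
    """Insert each tag's shorter parent tags right after it.

    Assumption of this sketch: a parent that the user listed explicitly
    somewhere in the request stays where the user put it.
    """
    explicit = {tag.lower() for tag in requested}
    result, seen = [], set()
    for tag in requested:
        tag = tag.lower()
        if tag not in seen:
            seen.add(tag)
            result.append(tag)
        parts = tag.split("-")
        # Walk from the longest parent ("zh-hant") down to the shortest ("zh").
        for n in range(len(parts) - 1, 0, -1):
            parent = "-".join(parts[:n])
            if parent not in explicit and parent not in seen:
                seen.add(parent)
                result.append(parent)
    return result


def pick(requested, available):
    """Return the first expanded tag that exactly matches an available one."""
    have = {tag.lower() for tag in available}
    for tag in with_implied_parents(requested):
        if tag in have:
            return tag
    return None


print(with_implied_parents(["en-US", "ja"]))   # ['en-us', 'en', 'ja']
print(pick(["en-US", "ja"], ["en", "ja"]))     # en
```

Under these assumptions, a request for "en-US, ja" is treated as "en-US, en, ja", so the English document wins over the Japanese one.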

Re:Language negotiation

pne on 2004-01-13T12:40:49
Ah, I see.

I presume this only happens if the user does not specify the exact tag as well?

For example, if someone says they want "en-gb, fr, en" and you have a French and a US English version, which will they get? (I'd assume the French, since "en" comes after "fr" even if "en-gb" is earlier.) What about if you have a French and a "generic" English version?