Hot: Semicolons. Not: Ampersands.

TorgoX on 2005-12-02T05:45:52

Dear Log,

«B.2.2: Ampersands in URI attribute values

The URI that is constructed when a form is submitted may be used as an anchor-style link (e.g., the href attribute for the A element). Unfortunately, the use of the "&" character to separate form fields interacts with its use in SGML attribute values to delimit character entity references. For example, to use the URI "http://host/?x=1&y=2" as a linking URI, it must be written <A href="http://host/?x=1&#38;y=2"> or <A href="http://host/?x=1&amp;y=2">.

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.»

It's not a new recommendation either. It's been there since 1995. CGI.pm supports it, but I doubt I've ever seen it actually used anywhere.
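
To make the difference concrete, here is a minimal Python sketch (not CGI.pm, and only an illustration) of both halves of the recommendation: the ampersand form needs escaping before it goes into an href, the semicolon form does not, and the server side has to be told to split on ";". It assumes Python 3.10 or later, where urllib.parse.parse_qs takes a separator argument; the variable names are made up for the example.

```python
from html import escape
from urllib.parse import parse_qs

# Ampersand-separated URI: the "&" must be escaped inside an HTML attribute.
amp_uri = "http://host/?x=1&y=2"
amp_link = '<a href="{}">'.format(escape(amp_uri, quote=True))

# Semicolon-separated URI: nothing in it needs escaping.
semi_uri = "http://host/?x=1;y=2"
semi_link = '<a href="{}">'.format(escape(semi_uri, quote=True))

# Server side: parse the query string with ";" as the field separator.
fields = parse_qs("x=1;y=2", separator=";")

print(amp_link)
print(semi_link)
print(fields)
```

The catch is the last step: a server that only splits on "&" would see a single field named x with the value "1;y=2".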


Support

Dom2 on 2005-12-02T07:54:12

Part of the trouble is the lack of support. I always use URI.pm to generate URLs with query strings correctly, but it insists on producing ampersands. I know, I should submit a patch.

The other part of the problem is the fact that our tools are fundamentally broken and insecure.

-Dom

Re:Support

dws on 2005-12-02T08:44:02

There's already a patch for URI, but the bug it's attached to was submitted prior to the W3C XHTML recommendation, and has a low priority. I submitted a meta bug. Additional lobbying might help, though.

Re:Support

dlc on 2005-12-02T16:16:32

«I always use URI.pm to generate URLs with query strings correctly, but it insists on producing ampersands. I know, I should submit a patch.»

I don't think that's a URI.pm bug, since escaping the query string is only relevant when it's being used as an attribute value. I'd say it's a bug in whatever is generating the HTML code.

(Unless you mean the patch changes it to emit semi-colons instead of ampersands, in which case, I apologize, because you're correct.)

Re:Support

Dom2 on 2005-12-02T16:22:40

Yes, that's exactly what I was referring to. I reckon that there should be a global variable in URI::_query or something to set which one you would prefer. I haven't looked at the relevant bug yet to see if that's how it's done.

-Dom
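
A rough sketch of what a selectable separator could look like, written in Python rather than Perl and not taken from URI.pm or from the patch mentioned above. The build_query helper and its sep parameter are hypothetical names for illustration only.

```python
from urllib.parse import quote_plus

def build_query(params, sep=";"):
    """Join key=value pairs with the chosen separator (hypothetical helper)."""
    if sep not in (";", "&"):
        raise ValueError("separator must be ';' or '&'")
    # quote_plus percent-encodes both ';' and '&' inside keys and values,
    # so either separator stays unambiguous in the result.
    return sep.join(
        "{}={}".format(quote_plus(str(key)), quote_plus(str(value)))
        for key, value in params
    )

semi_query = build_query([("x", 1), ("y", "a;b&c")])
amp_query = build_query([("x", 1), ("y", "a;b&c")], sep="&")
print(semi_query)
print(amp_query)
```

Passing the separator per call avoids the action-at-a-distance of a global, though a module-level default would be closer to what is suggested above.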

Re:

Aristotle on 2005-12-03T06:23:51

Funny, I complained about that just recently – PHP still isn’t configured to accept semicolonised query strings by default. It’s by far the biggest offender.

Maybe I should take a look at what the Rails and Django folk are doing and complain at them too.

Perl stuff is mostly good about this.

OAI

pudge on 2005-12-07T05:03:01

I wrote some OAI (Open Archives Initiative) code for Slashdot, for a certain search site to use to get data from us. OAI is XML, and part of the spec is to pass a "resumption token" for when the results list is incomplete (as tends to happen, since you don't want to put all the Slashdot data in one result list).

So for simplicity, I used the query string that made the request as the resumption token, with the counter incremented. This meant encoding & as &amp;, of course. The search site people thought the bug was mine, but it was in their code. I convinced them of this, thankfully, because that was easier than changing my code. But I shouldn't have used & in the resumption token (I have to use it in the query string, as per the OAI spec, but the spec allows the token to be anything).
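
For illustration only, a small Python sketch of the round trip (this is not the Slashdot code, and the token values, including the cursor field, are invented). An XML serializer writes "&" in the token as an entity, and a conforming parser hands back the original string; a token built without "&" sidesteps the question entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical token: the request's query string with a counter added.
token = "verb=ListRecords&metadataPrefix=oai_dc&cursor=100"

element = ET.Element("resumptionToken")
element.text = token

# Serializing escapes "&" as an entity in the XML text.
xml_text = ET.tostring(element, encoding="unicode")

# A conforming parser decodes the entity, recovering the original token.
round_tripped = ET.fromstring(xml_text).text

# Hypothetical alternative: an opaque token with no "&", so nothing to escape.
opaque = ET.Element("resumptionToken")
opaque.text = "oai_dc;100"
opaque_xml = ET.tostring(opaque, encoding="unicode")

print(xml_text)
print(round_tripped == token)
print(opaque_xml)
```

A client that treats the escaped text as the token, instead of decoding it, would send the wrong string back.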