Relearning Web Lessons Via URL Hacking

Ovid on 2009-08-17T22:54:44

In writing the code for a pet Web site project, I decided to add 'spider' tests. These test two things (sketched below):

  • All local GET requests for links on the site result in 200 response codes
  • URL hacking is supported
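
Something in this spirit does the trick (a rough sketch, not the project's actual test code: the base URL and port are placeholders, only the front page is spidered, and Test::WWW::Mechanize is just one convenient way to write it):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Test::More;
    use Test::WWW::Mechanize;

    # Assumes the app is already running locally; the URL is a placeholder.
    my $base = 'http://localhost:3000';
    my $mech = Test::WWW::Mechanize->new;

    $mech->get_ok( $base, 'front page loads' );

    # Grab the local links now, while the front page is the current page.
    my @local = grep { $_->url_abs =~ m{^\Q$base\E} } $mech->links;

    # 1. Every local GET for a link on the page should return 200.
    $mech->link_status_is( \@local, 200, 'all local links return 200' );

    # 2. URL hacking: chopping the last path segment off a link should still
    #    land on a valid page, not a 404 or a 500.
    for my $link (@local) {
        ( my $hacked = $link->url_abs->as_string ) =~ s{[^/]+/?$}{};
        next if length $hacked <= length "$base/";    # don't hack above the root
        $mech->get_ok( $hacked, "URL hack $hacked is still a valid page" );
    }

    done_testing();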

In today's world, you have to expect that URL hacking is going to occur, so I simply assumed that if someone sees /country/starts_with/o, they're going to try /country/starts_with/. So I wrote tests to ensure that any URL hacks like that return a valid page. A failing test ensured that I saw my stupidity. The following two URLs were valid:

  • /country/afghanistan/
  • /country/starts_with/a/

The problem is that URL hacking implies the following are valid:

  • /country/afghanistan/
  • /country/starts_with/

Those are structurally identical but they're not semantically identical and the structure implies semantic equivalence. This is bad. We now have this:

  • /country/afghanistan/
  • /country/?starts_with=a

That's much better.
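
To make the difference concrete, here's a toy dispatcher (hypothetical helper names, nothing to do with the site's real routing). With path-only URLs, /country/afghanistan/ and /country/starts_with/ both match the same /country/<segment>/ shape, so one segment has to be treated as a magic word; with the query-string form, the path always means one thing:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Toy handlers, purely for illustration.
    sub show_country   { print "country page for $_[0]\n" }
    sub list_countries { print "countries starting with '$_[0]'\n" }
    sub not_found      { print "404\n" }

    sub dispatch {
        my ( $path, $query ) = @_;

        # The filter is an explicit query parameter, so /country/... always
        # names a country and never doubles as an operator.
        if ( $path =~ m{^/country/?$} && defined $query->{starts_with} ) {
            return list_countries( $query->{starts_with} );
        }
        if ( $path =~ m{^/country/([^/]+)/?$} ) {
            return show_country($1);
        }
        return not_found();
    }

    dispatch( '/country/afghanistan/', {} );          # country page for afghanistan
    dispatch( '/country/', { starts_with => 'a' } );  # countries starting with 'a'
    dispatch( '/country/starts_with/', {} );          # just a (bogus) country name now,
                                                      # no magic word to special-case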

Also, we have the following:

  • Search now works but only searches on country name (accents are stripped, as sketched after this list)
  • A custom 404 page
  • Google map zoom level now depends on the area of the country
  • Vaguely better tests
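
For the curious, accent stripping along those lines usually boils down to Unicode decomposition followed by dropping the combining marks. A sketch (not necessarily how the site does it):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use Unicode::Normalize qw(NFD);

    # Decompose to NFD, then throw away the combining marks.
    sub strip_accents {
        my $name = NFD(shift);
        $name =~ s/\p{Mark}//g;
        return $name;
    }

    binmode STDOUT, ':encoding(UTF-8)';
    print strip_accents("Côte d'Ivoire"), "\n";    # Cote d'Ivoire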

I highly recommend that everyone who cares about Web development start a personal project from scratch with the technologies you already use. It will teach you tons of things you may have forgotten. If you have sane tests, you'll learn even more.

Also, I really need better database migration code.

Oh, and I already have two feature requests. Sigh :)

And don't look for Palestine. Just don't. I was surprised, but it appears to be Google behavior, not mine. And I just realized why I can't get the map for "Côte d'Ivoire". :)


URI design: for whom?

Aristotle on 2009-08-18T03:55:44

Those are structurally identical but they’re not semantically identical and the structure implies semantic equivalence.

No, it doesn’t. It’s OK for humans to hack URIs, but programs should only ever follow links and leave the link structure of the site to the server’s whims. As far as programmatic user agents are concerned, URIs are completely opaque. This is the centrepiece of REST.

(I agree with your URI structure change anyway, though, for personal URI style reasons.)

Re:URI design: for whom?

Ovid on 2009-08-18T06:14:22

We'll have to disagree on this one. For one thing, if I ever add a REST interface, I think this is an important distinction for folks. Generally, REST is something which you have to program; you can't just point a client at your site and hope it works. Thus, you want to make things as easy for the programmer to understand as possible. Second, it's important in general. People love to just slap crap into their URIs without considering structure. That's one of the many reasons why we get so many awful Web sites.

Re:URI design: for whom?

Aristotle on 2010-01-07T08:13:07

We cannot disagree, because the points you are talking about have nothing whatsoever to do with Representational State Transfer in the first place.

REST is something which you have to program; you can’t just point a client at your site and hope it works.

Sure I can. It depends on what level of expectations you have. A web proxy, f.ex., works regardless of whether it’s proxying requests to Google, Amazon or your private homepage. That works because it’s built to the web’s architecture, i.e. REST, which is all about making state transitions explicit instead of sticking them inside a black box (as RPC does).

And Google’s crawler certainly manages to work just fine no matter which site it’s pointed at, because it understands what <a> means and all sites have agreed to use <a> the same way. This works because all the sites use HTML and URIs instead of binary wire formats and application-private identifiers.

People love to just slap crap into their URIs without considering structure.

Yeah sure. I never said URI design wasn’t important. I said it was important for humans only, not for programs. Google’s crawler would never work if it had to know how to construct URIs for each individual website. It works because it doesn’t care about URIs, only about where to find them in HTML.
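
To illustrate with a toy sketch (not Google's crawler, and the start URL and page cap are arbitrary): a program that only follows links never needs to know anything about any site's URI structure.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    # A toy crawler: it never constructs a URI itself, it only follows
    # whatever <a href="..."> the pages hand it. URIs stay opaque strings.
    my $mech  = WWW::Mechanize->new( autocheck => 0 );
    my @queue = ('http://example.com/');    # arbitrary starting point
    my %seen;

    while ( my $url = shift @queue ) {
        next if $seen{$url}++;
        last if keys %seen > 50;            # keep the toy from eating the web
        $mech->get($url);
        next unless $mech->success && $mech->is_html;
        print "fetched $url\n";
        push @queue, map { $_->url_abs->as_string } $mech->links;
    }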

Info for teh (sic) geeks

bart on 2009-08-18T13:07:01

Funny that you mention you use Vim for editing the source files, but you don't mention what database you use — if any. As you said:

Also, I really need better database migration code.

it would help if you said which databases you are migrating between.

Neither do you say what Perl version you use. Likely that's either 5.8.x or 5.10.x. 5.10 is nice for its new features, but I have had some experience with a few nasty Unicode bugs in it that didn't happen in 5.8.x.

Re:Info for teh (sic) geeks

Ovid on 2009-08-18T13:26:12

My Mac is using 5.8.8 and the public-facing version is using 5.10. The database is currently SQLite, but this is likely to change in the future (regrettably, MySQL is so frickin' easy to install and use. I prefer PostgreSQL after it's up and running, but I've found that getting there has been daunting for me. I'm a lousy admin).

Hence, there are a couple of tech issues I've not been explicit about.

Palestine

jdavidb on 2009-08-18T20:28:53

Google Maps finds Palestine, TX, for me, which is actually in my neck of the woods. What's it find for you? :)

Re:Palestine

Ovid on 2009-08-18T20:32:07

Palestine, Texas. Unless every Google I've run across mysteriously knows I was born near there, I think this is default behavior.

Re:Palestine

jdavidb on 2009-08-19T22:07:29

Sounds sensible to me, although I can see why others would want Palestine, the Middle Eastern region. I presume Google biases towards American results, being an American company with a majority American audience.