Discoverability in REST vs Providing an API

autarch on 2008-11-01T21:23:44

I'm still stuck on the whole problem of the requirement that URIs for REST APIs be discoverable, not documented. It's not so much that making them discoverable is hard, it's that making them discoverable makes them useless for some (common) purposes.

When I last wrote about REST, I got taken to task and even called a traitor (ok, I didn't take that very seriously ;) Aristotle Pagaltzis (and Matt Trout via IRC) told me to take a look at AtomPub.

I took a look, and it totally makes sense. It defines a bunch of document types, which along with the original Atom Syndication Format, would let you easily write a non-browser based client for publishing to and reading from an Atom(Pub)-capable site. That's cool, but this is for a very specific type of client. By specific I mean that the publishing tool is going to be interactive. The user navigates the Atom workspaces, in the client finds the collection they're looking for, POSTs to it, and they have a new document on the site.

But what about a non-interactive client? I just don't see how REST could work for this.

Let me provide a very specific example. I have this site VegGuide.org. It's a database of veg-friendly restaurant, grocers, etc., organized in a tree of regions. At the root of the tree, we have "The World". The leafs of that node are things like "North America", "Europe", etc. In turn "North America" contains "Canada", "Mexico" and "USA". This continues until you find nodes which only contain entries, not other regions, like "Chicago" and "Manhattan".

(There are also other ways to navigate this space, but none of them would be helpful for the problem I'm about to outline.)

I'd like for VegGuide to have a proper REST API, and in fact its existing URIs are all designed to work both for browsers and for clients which can do "proper" REST (and don't need HTML, just "raw" data in some other form). I haven't actually gotten around to making the site produce non-HTML output yet, but I could, just by looking at the Accept header a client sends.

Let's say that Jane Random wants to get all the entries for Chicago, maybe process them a bit, and then republish them on her site. At a high level, what Jane wants is to have a cron job fetch the entries for Chicago each night and then generate some HTML pages for her site based on that data.

How could she do this with a proper REST API? Remember, Jane is not allowed to know that http://www.vegguide.org/region/93 is Chicago's URI. Instead, her client must go to the site root and somehow "discover" Chicago!

The site root will return a JSON document something like this:


 { regions:
   [ { name: "North America",
       uri:  "http://www.vegguide.org/region/1" },
     { name: "South America",
       uri:  "http://www.vegguide.org/region/28" } }
   ]
 }

Then her client can go to the URI for North America, which will return a similar JSON document:


 { regions:
   [ { name: "Canada",
       uri:  "http://www.vegguide.org/region/19" },
     { name: "USA",
       uri:  "http://www.vegguide.org/region/2" } }
   ]
 }

Her client can pick USA and so on until it finally gets to http://www.vegguide.org/region/93, which returns:


 { entries:
   [ { name: "Soul Vegetarian East",
       uri:  "http://www.vegguide.org/entry/46",
       rating: 4.3 },
     { name: "Chicago Diner",
     uri:  "http://www.vegguide.org/entry/56",
     rating: 3.9 },
   ]
 }

Now the client has the data it wants and can do its thing.

Here's the problem. How the hell is this automated client supposed to know how to navigate through this hierarchy?

The only (non-AI) possibility I can see is that Jane must embed some sort of knowledge that she has as a human into the code. This knowledge simply isn't available in the information that the REST documents provide.

Maybe Jane will browse the site and figure out that these regions exist, and hard-code the client to follow them. Her client could have a list of names to look for in order: "North America", "USA", "Illiinois", "Chicago".

If the names changed and the client couldn't find them in the REST documents, it could throw an error and Jane could tweak the client. A sufficiently flexible client could allow her to set this "name chain" in a config file. Or maybe the client could use regexes so that some possible changes ("USA" becomes "United States") are accounted for ahead of time.

Of course, if Jane is paying attention, she will quickly notice that the URIs in the JSON documents happen to match the URIs in their browser, and she'll hardcode her client to just GET the URI for Chicago and be done with it. And since sites should have Cool URIs, this will work for the life of the site.

Maybe the answer is that I'm trying to use REST for something inherently outside the scope of REST. Maybe REST just isn't for non-interactive clients that want to get a small part of some site's content.

That'd be sad, because non-interactive clients which interact with just part of a site are fantastically useful, and much easier to write than full-fledged interactive clients which can interact with the entire site (the latter is commonly called a web browser!).

REST's discoverability requirement is very much opposed to my personal concept of an API. An API is not discoverable, it's documented.

Imagine if I released a Perl module and said, "my classes use Moose, which provides a standard metaclass API (see RFC124945). Use this metaclass API to discover the methods and attributes of each class."

You, as an API consumer, could do this, but I doubt you'd consider this a "real" API.

So as I said before, I suspect I'll end up writing something that's only sort of REST-like. I will provide well-documented document types (as opposed to JSON blobs), and those document types will all include hyperlinks. However, I'm also going to document my site's URI space so that people can write non-interactive clients.

Cross-posted from House Absolute(ly Pointless) - permalink.


ORLite has this problem

Alias on 2008-11-02T14:20:56

I've been finding I have something of this problem with ORLite, which is intended to give you a reasonable full ORM in one line of code.

The problem is that it isn't documented like regular code is documented. It just lets you call stuff based on the database structure, and I'm finding that very very annoying.

Apart from automatically generating POD for the documentation, I'm stuck at this point.

Re:ORLite has this problem

autarch on 2008-11-02T19:14:22

Alzabo did something similar, and I did write a complete POD generator.

However, it's a little different in this case, because it's generating an API based on a database you know about, so if the rules for API generation are predictable, knowing what the API will be is simple.

If you're writing a client for a REST service, you only know what the providers of the service tell you.

Something about roses and names

stu42j on 2008-11-03T16:48:34

Just take the ideas you like about REST, build your API as you see fit and call it something else. POX/HTTP, or whatever.

Re:Something about roses and names

Aristotle on 2008-11-03T18:30:12

As long as you understand what you lose by giving up constraints imposed by REST, that’s perfectly fine.

Re:Something about roses and names

autarch on 2008-11-03T22:08:43

AFAICT, I like everything about REST but the "start at the root and discover" rule. The rest of it, including what Roy said in his original blog post about using document types vs documenting "parameters, seems very sane.

But for me, non-interactive clients are the key target of a web-based API. I think browsers do an acceptable job of providing an interactive client already.

Re:Something about roses and names

Aristotle on 2008-11-05T08:41:33

What I said stands: if you understand what you lose by giving up that rule (a lot – much more than naïvely obvious), go ahead and give it up.