URI::tag and URI::urn::uuid

miyagawa on 2006-09-26T13:18:26

In Atom feeds, URIs play quite an important role in guaranteeing the uniqueness of feeds and entries. Among others (like permalinks), urn:uuid and tag URIs are frequently used to construct Atom IDs.

urn:uuid:941e12b4-6eeb-4753-959d-0cbc51875387
tag:diveintomark.org,2004-05-27:/archives/2004/05/27/howto-atom-linkblog


See http://diveintomark.org/archives/2004/05/28/howto-atom-id for more.
Perl's URI module doesn't have tag and urn:uuid subclasses, which in itself isn't a big deal, since you can always use $uri->opaque and the like to get the string itself for comparison. Still, having URI::tag and URI::urn::uuid would be useful for programmatically constructing those URIs and parsing information out of them.
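
To make "construct and parse" concrete, here is a minimal sketch of the two ID formats. It is written in Python with only the standard library, purely as an illustration of the string formats; it does not show the API of URI::tag or URI::urn::uuid. The helper names (make_tag_uri, parse_tag_uri, make_uuid_urn, parse_uuid_urn) are hypothetical, and the regular expression is a simplification that assumes a tag URI has the shape tag:<authority>,<date>:<specific>.

```python
import re
import uuid

# Simplified pattern, assuming tag:<authority>,<date>:<specific>
# where <date> is YYYY, YYYY-MM, or YYYY-MM-DD.
TAG_RE = re.compile(
    r"^tag:(?P<authority>[^,]+),"
    r"(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?):"
    r"(?P<specific>.*)$"
)


def make_tag_uri(authority, date, specific):
    """Build a tag URI string from its three parts (hypothetical helper)."""
    return f"tag:{authority},{date}:{specific}"


def parse_tag_uri(uri):
    """Split a tag URI into authority, date, and specific parts."""
    match = TAG_RE.match(uri)
    if match is None:
        raise ValueError(f"not a tag URI: {uri!r}")
    return match.groupdict()


def make_uuid_urn():
    """Mint a new random (version 4) UUID in urn:uuid: form."""
    return uuid.uuid4().urn


def parse_uuid_urn(urn):
    """Parse a urn:uuid: string; uuid.UUID accepts the URN prefix."""
    return uuid.UUID(urn)


parts = parse_tag_uri(
    "tag:diveintomark.org,2004-05-27:/archives/2004/05/27/howto-atom-linkblog"
)
print(parts["authority"], parts["date"], parts["specific"])
print(parse_uuid_urn("urn:uuid:941e12b4-6eeb-4753-959d-0cbc51875387").version)
```

The two sample IDs are the ones quoted above; everything else is an assumption made for the sketch.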

Hence my new modules:
http://search.cpan.org/user/miyagawa/URI-tag
http://search.cpan.org/user/miyagawa/URI-urn-uuid


Thanks!

Dom2 on 2006-09-26T13:31:13

I've just been generating Atom feeds, and URI::tag is just what I need!

-Dom

Re:Thanks!

Aristotle on 2006-09-26T23:23:52

Personally, I prefer UUIDs. They make it self-evident that the ID is an opaque identifier that needs to be stored alongside the entry somewhere. Tag URIs tempt people into generating them dynamically from the permalink, which, as we all know, does occasionally change (whether or not it should).

For example, when I took over maintenance of XML::Atom::SimpleFeed, there was a blob of HTTP-to-Tag conversion code in there that was used in case an explicit ID was absent. My first official act as the new maintainer was to drop that feature like a bad habit and update the docs to emphasise the concerns surrounding IDs. You now have to pass in an ID – no excuses.

Re:Thanks!

Dom2 on 2006-09-27T05:56:49

I disagree. It seems perfectly obvious to me that anything that begins with "tag:" shouldn't be resolvable. Anyway, in my case, I came on to the project when tag URIs had already been chosen. This module was just a nice way of cleaning them up.

For what it's worth, I'm only using it for generation, not parsing.

-Dom

Re:Thanks!

Aristotle on 2006-09-27T08:36:55

The question is not whether it’s resolvable. Do you generate tag URIs once and store them as a separate property of each record, or do you derive them from other properties (such as from the permalink) every time you generate the feed?

Re:Thanks!

Dom2 on 2006-09-27T09:17:36

I'm using them as part of an OpenSearch feed. Each entry is a search result. The tag ID is built each time from the domain name, a fixed year, and the entry ID of the search result. So yes, in theory, the domain name could change, but in practice it's unlikely enough that I'm happy tag URIs are OK.
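
The construction described here can be sketched roughly as follows. This is a Python illustration only, not the actual code behind the feed; the domain name, the fixed year, and the function name entry_tag_uri are all made-up placeholders.

```python
# Hypothetical values; the real domain and year are not given in the thread.
DOMAIN = "search.example.org"
TAG_YEAR = "2006"  # fixed once, never derived from the current date


def entry_tag_uri(entry_id, domain=DOMAIN, year=TAG_YEAR):
    """Derive a tag URI from the domain, a fixed year, and the entry ID."""
    return f"tag:{domain},{year}:{entry_id}"


# The same inputs always give the same ID, so nothing has to be stored...
print(entry_tag_uri(42))
# ...but a different domain name yields a different ID for the same entry.
print(entry_tag_uri(42, domain="search.example.net"))
```

Because the ID is a pure function of its inputs, it stays stable exactly as long as the domain, year, and entry ID do.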

-Dom

Re:Thanks!

miyagawa on 2006-09-27T10:33:54

Practically, the answer is very simple.

1) A tag URI can be constructed from the HTTP permalink at any time, 1.1) but it should be stored somewhere as metadata of the entry if the permalink is likely to change in the future, while 2) a UUID needs to be stored as metadata from the beginning, since otherwise regenerating the feed changes the UUID every time.

So, yeah, I agree with you that using a UUID explicitly declares the necessity of storing it as metadata, but practically a tag URI is still useful anyway.

On MT/TypePad we don't build tag URIs from the path or filename, which would be more likely to change than the domain name. We use entry IDs (or a permanent key that the user generated) to ensure uniqueness, even if the user changes the title of the post, etc.
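
A rough sketch of the difference between the two approaches, in Python with only the standard library. This is not MT/TypePad code; the Entry class, its fields, the tag_id helper, and the example.org authority are all hypothetical, chosen just to show "store the UUID once" versus "derive the tag URI from a stable entry ID".

```python
import uuid


class Entry:
    """Hypothetical stored record for a post."""

    def __init__(self, entry_id, title, atom_id=None):
        self.entry_id = entry_id
        self.title = title
        # A UUID is minted once and must then be persisted with the entry;
        # minting a fresh one on every feed build would change the ID.
        self.atom_id = atom_id or uuid.uuid4().urn


def tag_id(entry, authority="example.org", date="2006"):
    """Derive a tag URI from the stable entry ID, not the title or path."""
    return f"tag:{authority},{date}:{entry.entry_id}"


entry = Entry(7, "Old title")
stored_uuid = entry.atom_id
derived_tag = tag_id(entry)

entry.title = "New title"  # retitling changes neither ID
print(entry.atom_id == stored_uuid, tag_id(entry) == derived_tag)

# Reloading the record with its stored ID keeps the UUID stable.
reloaded = Entry(7, "New title", atom_id=stored_uuid)
print(reloaded.atom_id == stored_uuid)
```

The derived tag URI survives a title change because the title is never an input; the UUID survives only because it is saved alongside the record.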

Re:Thanks!

Aristotle on 2006-09-27T16:53:26

Yah, I’m not saying you should never use them. Tim Bray uses his HTTP URIs as IDs. Any URI is valid. I’m just saying I prefer infrastructure to use UUIDs because it makes View Source programming more likely to come out correct.