RSS and NNTP

TorgoX on 2003-05-27T11:56:15

Dear Log,

Since I wrote the program that generates the RSS feeds for nntp.perl.org and since Robert nicely set it up and everything, I've had a few people ask why it doesn't also provide the full content of each article in each <item> element's <body> section. There are many reasons, but the main one is: it's too damned much bandwidth to make an RSS file that gives you the last N messages on a list, or the last M days' worth of messages, for useful values of N or M. The values of N or M have to err on the side of being large, so that clients that don't poll frequently don't miss out on some posts; but that means that clients that do poll frequently (say, every half hour) still get the whole potentially huge file of all recent messages.

The basic problem is that the server doesn't know exactly what messages you have and haven't yet read, and so it can't give you just what's new to you. But it doesn't have to be that way: suppose the server keeps track of this, by having each RSS client access a unique URL like http://whatever/thing_rdf_gen.pl?xyz123 where xyz123 is some unique ID for that client. The program that dynamically generates that RSS feed would show only the items that it didn't already show last time, and then update its little database for user xyz123 so that it would know not to show them the next time.
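A minimal sketch of that per-client bookkeeping, assuming articles carry increasing numbers (as they do within a newsgroup) and that the "little database" is just a dictionary in memory. The FeedServer class, its method names, and the sample subjects are made up for illustration; this is not how the nntp.perl.org feed generator works.

```python
from dataclasses import dataclass, field


@dataclass
class FeedServer:
    """Hypothetical server that remembers the newest article each client was shown."""
    articles: list = field(default_factory=list)    # (number, subject), ascending
    last_shown: dict = field(default_factory=dict)  # client ID -> highest number served

    def post(self, subject):
        """Store a new article under the next article number."""
        number = len(self.articles) + 1
        self.articles.append((number, subject))
        return number

    def feed_for(self, client_id):
        """Return only what this client has not been shown, then record that."""
        high_water = self.last_shown.get(client_id, 0)
        fresh = [a for a in self.articles if a[0] > high_water]
        if fresh:
            self.last_shown[client_id] = fresh[-1][0]
        return fresh


server = FeedServer()
server.post("first")
server.post("second")
first_poll = server.feed_for("xyz123")   # both articles
second_poll = server.feed_for("xyz123")  # nothing new
server.post("third")
third_poll = server.feed_for("xyz123")   # just the third article
```

The catch with this design is that the state lives on the server: a response that gets lost on the way to the client still counts as "shown".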

Or one can have a framework where each client says to the server "here's the IDs of items I've seen; now what items do you have that aren't in this set of IDs?"

Or one can have the client say "Give me the IDs of everything you have, and then I can ask for full details of everything that's new to me".
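Those two exchanges might look roughly like this. It is only a sketch: the ARTICLES store, the function names, and the example.invalid message IDs are all hypothetical, and a real service would put these calls behind HTTP requests rather than plain function calls.

```python
# Hypothetical in-memory store: message ID -> full article text.
ARTICLES = {
    "<1@example.invalid>": "First post",
    "<2@example.invalid>": "Second post",
    "<3@example.invalid>": "Third post",
}


def unseen_for(seen_ids):
    """First variant: the client sends the IDs it has; the server returns the rest."""
    return {mid: body for mid, body in ARTICLES.items() if mid not in seen_ids}


def list_ids():
    """Second variant, step 1: the server lists every ID it currently holds."""
    return list(ARTICLES)


def fetch(message_id):
    """Second variant, step 2: the client asks for one article by its ID."""
    return ARTICLES[message_id]


seen = {"<1@example.invalid>"}

# Server does the set difference.
pushed = unseen_for(seen)

# Client does the set difference, then fetches what it is missing.
pulled = {mid: fetch(mid) for mid in list_ids() if mid not in seen}
```

Either way the client ends up with the same two articles; the difference is only in which side holds the list of seen IDs and does the comparison.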

The problem is that all of these options are solutions that have existed practically forever for NNTP, and reiterating them for RSS seems really quite wrong-headed to me, like pointlessly wrapping SMTP in XML-RPC. I have no grand conclusion here, but rather three incomplete thoughts:

* An "rss2nntp" proxy CGI should be simple to produce; it'd basically be just a newsreader that dumps the new news files in an RSS wrapper. The per-user data on the server is basically just a .newsrc.

* While we're at it, an "nntp2rss" program should be simple to produce: say, as a program that polls a given RSS feed, and every time it sees a new item, posts that item's data to a given newsgroup (whether it's one newsgroup per feed, or what, is an open issue).

* The fact that these things are possible, sane, and in fact trivial to implement, suggests that NNTP and RSS are not radically different things. Protocol-wise, they are clearly different -- and that's most of what I just said. And at the basic level of items versus posts, there are some basic problems with expressive range (you can express things in an RSS item that there's no /single obvious translation for/ in an NNTP post, and vice versa). So at the technical level, there's just no relationship; they're chalk and cheese. However, at the user end of things, there is a weird isomorphy between simple typical RSS and simple typical NNTP -- so much so that I'm left wondering: How about having newsreaders (like Netscape News, for example) be RSS readers too?
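To make the item-versus-post mapping concrete, here is a rough sketch of converting in both directions using only Python's standard library. It assumes a bare-bones RSS item with just title, link, and description; the X-RSS-Link header, the function names, and the sample feed are invented for the example, and nothing here actually talks to a news server.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage


def item_to_post(item, newsgroup):
    """Turn an RSS <item> element into a news-style message (lossy)."""
    post = EmailMessage()
    post["Subject"] = item.findtext("title", default="(no title)")
    post["Newsgroups"] = newsgroup
    # There is no standard header for an item's link; this one is made up.
    post["X-RSS-Link"] = item.findtext("link", default="")
    post.set_content(item.findtext("description", default=""))
    return post


def post_to_item(post):
    """Wrap a news-style message back up as an RSS <item> element."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = str(post["Subject"])
    ET.SubElement(item, "link").text = str(post["X-RSS-Link"])
    ET.SubElement(item, "description").text = post.get_content().strip()
    return item


rss = """<rss version="0.91"><channel><title>demo</title>
<item><title>Hello</title><link>http://example.invalid/1</link>
<description>Body text</description></item></channel></rss>"""

item = ET.fromstring(rss).find("channel/item")
post = item_to_post(item, "example.test")
round_trip = post_to_item(post)
```

The invented header is exactly the kind of place where the expressive ranges don't line up: the link survives the round trip only because both directions agree on a convention that neither format defines.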

Generalization: maybe most situations that suggest/allow/require a trivial protocol2protocol proxy are situations where what should really happen is for the clients of each protocol to get a bit smarter, so that the proxies aren't needed for anything but the short term.


And the news was...?

jhi on 2003-05-27T12:25:06

> like pointlessly wrapping SMTP in XML-RPC.

This largely describes "web services" in general.

Well, I've seen three kinds: (1) toy examples, (2) wrapping of proprietary protocols into XML (they will still be closed and proprietary, mind, as long as the vocabularies and protocols are not public), and then these (3) pointless rewrappings of existing protocols/frameworks.

In (2) and (3) the only measurable effect has been a manifold increase in bandwidth, and the need to have an XML parser everywhere. Not to mention that the whole RPC request-reply paradigm is ill-suited for many networking environments. Oh, joy.

If I've hurt the feelings of anyone who thinks web services are the greatest thing since sliced bread, I'm sorry. I just don't see much net benefit.

Re:And the news was...?

davorg on 2003-05-27T12:35:04

> I just don't see much net benefit.

Is that a pun :)

Re:And the news was...?

ziggy on 2003-05-27T16:05:15

> If I've hurt the feelings of anyone who thinks web services are the greatest thing since sliced bread, I'm sorry. I just don't see much net benefit.

No, that pretty much nails it. Web Services are a vast conspiracy of deep-pocketed vendors and tagheads to make themselves relevant.

There are a few benefits to Web Services, like the reinvention of IDL and "baked in platform neutrality", but there were better ways to get those benefits than XML-RPC, SOAP, WSDL, and RWSA(*) provide. For example, wrapping a proprietary protocol in XML makes it easier to deal with an existing system in 12 different programming languages without multiplying the number of buggy implementations of the existing protocol. Whether or not support for 12 or even 2 new languages is necessary or desirable is an exercise left for the reader.

The other major benefit to Web Services is that they're visible, neutral RPC protocols. They're not especially good RPC protocols, but at least you can stop worrying about the bits on the wire when developing services (at least until you see how many bits are going on the wire).

*: Random Web Services Acronym

ETags and Last Modified?

srushe on 2003-05-27T12:26:03

Would ETag and Last-Modified not prove useful? If a requesting client sent one or the other, then it would get just the entries since the last time it asked. If there are only a few, then provide the full content; otherwise provide the headlines.

Obviously if the client end doesn't provide that information then continue with the current setup.

Regardless of all of this, congratulations. It's a wonderful resource in its current state.

Re:ETags and Last Modified?

ziggy on 2003-05-27T16:09:17

> Would ETag and Last-Modified not prove useful?

A better-defined replacement for RSS would be more useful. A specification that approximates the less useful half of NNTP doesn't improve when ad-hoc extensions are added to provide one or two more NNTP features on a per-feed basis.