Improving CPAN's RSS feed?

kellan on 2003-03-01T17:23:58

So who would one talk to about improving CPAN's RSS feed of recent uploads?

It seems silly that at the very least the feed doesn't include the short descriptions visible on the recent upload page.

From there we could go on to a feed that included the full pod, pointers to the Changes and README files, the related modules info, etc...

We've got the metadata, let's use it.
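To make the request concrete, here is a rough sketch of a feed that carries the short description along with each upload. It is not how search.cpan.org builds its feed. The RSS 2.0 layout, the `build_feed` name, the channel text, and the `name`/`link`/`summary` keys are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_feed(uploads):
    """Build an RSS 2.0 document (assumed format) from a list of upload dicts.

    Each dict uses hypothetical keys: "name", "link", and optionally "summary".
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Channel-level fields are placeholders, not the real feed's values.
    ET.SubElement(channel, "title").text = "Recent uploads"
    ET.SubElement(channel, "link").text = "http://example.invalid/recent"
    ET.SubElement(channel, "description").text = "Recently uploaded distributions"

    for upload in uploads:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = upload["name"]
        ET.SubElement(item, "link").text = upload["link"]
        # Only emit a description when a short summary is actually available.
        if upload.get("summary"):
            ET.SubElement(item, "description").text = upload["summary"]

    return ET.tostring(rss, encoding="unicode")
```

An upload without a summary simply gets no description element, so the feed degrades to what it is today.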

Writing this journal entry probably took as long as writing a scraper would have, but hopefully it will lead to better things.


feed

belg4mit on 2003-03-01T18:57:17

File feedback on search.cpan.org.
I'm all for including the line from the SYNOPSIS.
But embedding the full pod seems a little unnecessary.

well...

hfb on 2003-03-01T19:56:35

I can't remember if I wrote that script or if Graham did, though I suspect I did since it's readable and contains more than 10 lines :). Either way, it hasn't changed much over the last few years...it dates from back when far fewer people bothered to put a synopsis in and search wasn't as good at extracting them. I don't know that the description would really be beneficial in an RSS feed since it's often long and just as vague and unenlightening as the name might be.

Re:well...

kellan on 2003-03-01T21:07:55

Just to clarify, you do think adding the synopsis is a good idea? Great. You need any help? We've got a spiffy new version of XML::RSS out that might be useful.

Re:well...

hfb on 2003-03-01T21:59:28

I didn't say it was either. The thing is that the 'synopsis' doesn't come with the distribution; it lives in the pod of each individual module contained in the distribution, which makes attaching a 'synopsis' to the distribution a tricky and, at times, misleading task. It's a bit of a mismatch of information that works most of the time. Aside from that, I'd like to think that the brevity of the austere module listings in RSS is an encouragement to authors to name their modules well and descriptively, or at least outrageously enough to get people to fancy a look. Technically there is no 'metadata' in distributions but, then again, we still have trouble getting people to bother to put in version numbers. So, I don't know that it's a bad idea....just not a terribly important one. If you want the extended descriptions, there's the /recent page; I tend to think RSS feeds work best when they're very brief, just enough to get interest or not. Graham may have oodles of free time though so he may decide that it's a fun project :)

Keep in mind that with so many free-range modules, few things are trivial projects when trying to apply anything to all or most of them.

an ETag perchance?

kellan on 2003-03-02T04:08:42

Can't say I really understand your logic, except for the part about "just not a terribly important one"; that makes sense.

I don't know anything about the way search.cpan.org is implemented but it seems to me that somewhere there is code that has access to all the data we're talking about. I suspect this because the /recent page exists. Not sure why that code can't spit out the RSS.

But I take your word for it.

So I called it an early night (it's too cold to get excited about being out late) and spent the last hour getting re-acquainted with TokeParser.

And I've got my next request: can we get an ETag on the /recent page, so that I can do a screen scraper with conditional GETs?
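For anyone wondering what I mean by a conditional GET, here is a minimal sketch, assuming the server sends an `ETag` header and honors `If-None-Match`. The `fetch_if_changed` and `build_conditional_request` names and the injectable `opener` argument are my own invention for illustration; only the `urllib` calls are real.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None):
    """Return a GET request that asks the server to skip the body if unchanged."""
    request = urllib.request.Request(url)
    if etag:
        # The server compares this value against the page's current ETag.
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, etag=None, opener=urllib.request.urlopen):
    """Return (body, etag). body is None when the server answers 304 Not Modified.

    `opener` is a hypothetical hook so the function can be exercised offline.
    """
    request = build_conditional_request(url, etag)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Nothing changed since the last fetch; keep the old ETag.
            return None, etag
        raise
```

The scraper would store the returned ETag between runs, pass it back in on the next fetch, and only re-parse the page when the body is not `None`.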

Hope, as they say, springs eternal.

Thanks for the feedback. 'night.

Re:an ETag perchance?

hfb on 2003-03-02T08:14:35

The RSS code is not part of the shiny new rewrite; it's old and only got written because pudge pestered me about getting a newsfeed for the modules. Because the db is set up the way it is, this seemingly simple task isn't simple, which is why I pointed out that a distribution does not have 'metadata' that can be reliably used in this type of situation. People say we, cpan, should enforce such things or that perl6 will miraculously change this, but my hope has waned. Just because it's perl6 doesn't mean it'll force authors to add information....including most who gripe about cpan and those who should know better who regularly forget and are sloppy now. I don't like that the summaries on the recent page aren't part of a greater plan and instead have to be grabbed out of the pods.

You're the first person to bother us about this and I really have to admit that I just don't care much about the RSS feed, especially when there are other more pressing things to do. We have 2 more boxes from Sun that I'd like to get online so that all the people crawling the hell out of the main box won't slog it so badly.