Introducing Perlanet

brian_d_foy on 2008-03-11T21:13:00

A couple of years ago I started building feed aggregation sites (aka "planets"[1]) using Plagger. Once you've installed Plagger, it's pretty simple to configure new planets.

Notice I say "once you've installed Plagger". Plagger is one of those CPAN modules which installs about half of CPAN. The CPAN dependencies page currently gives you a 26% chance of successfully installing Plagger.

The reason for that is clear. Plagger is an incredibly powerful tool. It is an all-purpose tool for slicing and dicing web feeds. It also includes special-case plugins for creating web feeds for many sources that don't currently publish them.

So the net result is that Plagger is large and often hard to install. And most people (or, at least, I) don't use a fraction of its power. I wish I didn't have to install all of Plagger's dependencies in order to just grab a few feeds and combine them into a planet site. Perhaps the Plagger team needs to look at breaking up the distribution into smaller parts that do less.

[Update: Miyagawa points out that the vast majority of Plagger's dependencies are optional, so I'm overstating the case here. Sorry about that.]

Recently I moved a lot of my sites to a new server. And I really didn't want to go through the pain of installing Plagger. Therefore all of my planets have been dead for a couple of months. And that has been bothering me. But a couple of days ago (prompted by something that Paul Mison wrote) I decided to do something about it.

In a few hours of spare time I wrote Perlanet[2]. It doesn't do anything at all complex. It reads data from a YAML file, uses XML::Feed to parse the feeds and then the Template Toolkit to generate the web page (oh, and XML::Feed again to generate the combined feed).
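As a rough illustration of that flow (read a config, parse each feed, merge the entries, render a page), here is a minimal sketch. It is written in Python with only the standard library so that it runs as-is, and it is not Perlanet's code: the CONFIG dict stands in for the YAML file, the inline RSS strings stand in for fetched feeds, string.Template stands in for the Template Toolkit, and every name in it is hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from string import Template

# Inline RSS 2.0 documents stand in for feeds fetched over HTTP (assumption).
RSS_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>First post</title><link>http://a.example/1</link>
<pubDate>Mon, 10 Mar 2008 09:00:00 GMT</pubDate></item>
</channel></rss>"""

RSS_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Newer post</title><link>http://b.example/1</link>
<pubDate>Tue, 11 Mar 2008 12:00:00 GMT</pubDate></item>
</channel></rss>"""

# Hypothetical stand-in for the YAML config: a planet title and a list of feeds.
CONFIG = {
    "title": "planet example",
    "feeds": [
        {"title": "Blog A", "rss": RSS_A},
        {"title": "Blog B", "rss": RSS_B},
    ],
}

def collect_entries(config):
    """Parse every configured feed and return all entries, newest first."""
    entries = []
    for feed in config["feeds"]:
        channel = ET.fromstring(feed["rss"]).find("channel")
        for item in channel.findall("item"):
            entries.append({
                "source": feed["title"],
                "title": item.findtext("title"),
                "link": item.findtext("link"),
                "date": parsedate_to_datetime(item.findtext("pubDate")),
            })
    return sorted(entries, key=lambda e: e["date"], reverse=True)

# One list item per entry; values are HTML-escaped before substitution.
ENTRY = Template('<li><a href="$link">$source: $title</a></li>')

def render_page(config, entries):
    """Render the merged entries as a very small HTML fragment."""
    items = "\n".join(
        ENTRY.substitute({k: html.escape(str(v)) for k, v in e.items()})
        for e in entries
    )
    return f"<h1>{html.escape(config['title'])}</h1>\n<ul>\n{items}\n</ul>"

entries = collect_entries(CONFIG)
page = render_page(CONFIG, entries)
```

The sort is the only step that does real work: entries from all feeds are pooled and ordered by publication date, so the page shows the most recent post first regardless of which feed it came from.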

It's very simple. And it's new, so there may well be bugs. But it's there if you find it useful. It's already powering a revived planet davorg. My other planets should come back to life over the next few days.

I'm sure I'll be adding more features over the coming weeks, but the main point is to keep it simple.

[1] After Planet, the Python software that (as far as I know) first popularised this method of aggregating data.

[2] Yes, I know it's a terrible name. But it seemed obvious to me and now I can't shake it.

Thanks, links question

markjugg on 2008-03-11T14:51:58

Thanks, I like this and may use it for my own website.

I have a question about the del.icio.us integration. In one case several links were aggregated into one post. In other cases, each link got its own post.

What's the difference? Or, why do it both ways?

Re:Thanks, links question

davorg on 2008-03-11T15:16:21
This is the data duplication issue that Paul mentions in the article I link to. You're seeing the same data from two different sources.

I've been using the delicious "daily blog posting" tool to wrap up a day's links and add them as a single entry to my blog. But the planet includes both the blog feed and the raw delicious feed. So delicious links appear twice on the planet.

I'm going to turn off the delicious daily posting tool.

Thank you

jdavidb on 2008-03-11T15:23:59

I hope to be able to use this someday. I've been needing such a tool but on a lower scale than plagger. Unfortunately I'm so busy with $NEW_JOB I'm not sure when I'll get to try it out. :)

Validation

blech on 2008-03-11T17:11:12

Unfortunately, XML::Feed and XML::Atom don't produce valid Atom feeds.

There are, however, two RT tickets (#33881, adding accessors for some required feed types, and #29684, which stops 'convert' generating empty summaries) which will help you get towards a valid feed.

I really should write up the other changes I had to make to generate a valid Atom feed. I think convert needs help with applying an updated date to RSS feeds, for example. There are also issues with RSS and Atom taking different values for the same element (Atom author fields are a real name; RSS author fields are email addresses, with dc:creator used for the name), so my (unpublicised) RSS feed doesn't validate at the moment.

Re:Validation

davorg on 2008-03-12T08:58:09
Ah. I've been concentrating so hard on getting the HTML page valid[1] that I'd forgotten to look at the feed.

I'll apply the patches from RT locally and wait for the next release of XML::Feed to fix the problems.

[1] Often a pointless task on a planet as so much of the markup is out of your control.

Re:Validation

blech on 2008-03-12T09:58:43
husk.org's (X)HTML isn't valid mainly because Vox use lots of attributes, and their visual HTML editor can get quite confused about wrapping spans. At some point, I'll either start running the HTML through Tidy before presenting it, or I'll edit the raw HTML before saving it to Vox.

Similar attributes cause issues with the Atom feed, but they don't prevent validation, merely cause warnings, because Atom does at least recognise the concept of extensibility.

Plagger dependencies

miyagawa on 2008-03-11T19:41:36

While I understand the general pain to install Plagger, I should argue a little bit about that "Plagger dev team need to look to slim down the distro" statement.

http://search.cpan.org/src/MIYAGAWA/Plagger-0.7.17/META.yml

Plagger's "requirement" modules are all generic, from Cache, DateTime and LWP to HTML, XML and URI fetch modules. They are all necessary to all kinds of data sources and output as well. Other "plugins" are all pure perl and you can just skip installing the dependencies, which are disabled by default anyway for most plugins.

Believe it or not, I replaced my work laptop with a new MacBook a few weeks ago, and I installed Plagger and its dependencies using the "notest" option of the CPAN shell and by typing "n" (don't install) to most "recommended" dependencies. It was quick and easy: done in about 10 minutes with no failures at all.

Besides that, good work on Perlanet. You can steal options and filters from Plagger :)

Re:Plagger dependencies

davorg on 2008-03-12T08:49:12

Plagger's "requirement" modules are all generic, from Cache, DateTime and LWP to HTML, XML and URI fetch modules. They are all necessary to all kinds of data sources and output as well.

You're absolutely right (of course!) The vast majority of the pre-requisites are optional. I had forgotten that.

I still think, however, that you're bundling too much stuff together. If I was bundling Plagger I'd create a core distribution that just read feeds, combined them and published new ones. I'd then relegate all of the extra CustomFeed, Filter, Notify, etc. plugins to other (optional) distributions.

Probably this is just a philosophical disagreement on how software is bundled :)

Re:Plagger dependencies

miyagawa on 2008-03-12T09:17:19
Right. Splitting the distro into separate modules, or at least into bundles, was the original idea.

The only reason why we haven't done this is probably that we never got to the point of saying "yes, we're done", and I just didn't want to see people uploading their random Plagger plugins to CPAN that would eventually be unmaintained, abandoned, out of sync with core, of poor code quality, and so on.

It doesn't take you a minute to name a few Catalyst modules that are in an "out-of-date" or "was a total mistake, don't use that" state.

But yes, at this point, when Plagger main trunk dev is so quiet, we can strip most plugins from the core, and probably trim the "recommended" modules down to a smaller set.

failed to install

markjugg on 2008-03-12T03:56:06

I tried to install this today on a personal hosting account on a shared system where I have SSH access, but did not want to install the dependencies as root.

First, I tried several alternatives for using PAR, building a packed file on my Linux laptop and uploading to the FreeBSD server. That failed in part because PAR failed to detect all the dependencies, including some of the XML::Feed namespace modules and some of the DateTime modules. It was also overly conservative about which modules it thought needed to be platform-specific, so it claimed not to find some modules on the FreeBSD server, even though they were pure-Perl and already included in the PAR file, but in a Linux-specific directory. I believe all the XS modules I needed were already installed on the FreeBSD server, so in theory this PAR approach could have worked.

So using PAR failed, even with a fair bit of fussing.

Then I tried the private CPAN directory option, which eventually failed fatally because my version of "libxml" was older than it wanted for XML::LibXML.

Of course, using root access it would be easy, but I'll suggest from this experience that the dependency chain of XML::Feed is similar in complexity to that of Plagger from this perspective.

To make this truly easy to use, I think it would need to be distributed with something like a multi-platform PAR file that has been pre-built on Linux, FreeBSD, etc.

This isn't really a plagger-vs-perlanet issue, it's a problem with deploying Perl applications in general.

Re:failed to install

davorg on 2008-03-12T09:12:24

I'll suggest from this experience that the dependency chain of XML::Feed is similar in complexity to that of Plagger from this perspective

The dependencies checker indicates that XML::Feed has a 44% chance of installing correctly whereas Plagger has a 26% chance. Of course, Plagger uses XML::Feed so Plagger is never going to be easier to install than XML::Feed.

I had exactly the same problem with XML::LibXML that you did. My server is a Fedora Core 6 system and I install all of my modules using rpm (building my own when necessary). The version of XML::LibXML available for FC6 is 1.62 and XML::Atom (which is required by XML::Feed) needs at least 1.64. I solved the problem by downloading the source rpm from the Fedora 9 development repository and rebuilding it on my server. Of course, having root access makes that a bit easier :-)

It's great that people are interested in Perlanet, but I need to emphasise that it is a two-hour hack that I threw together to solve a particular problem that I had. If other people find it useful then I'll certainly clean it up and make it as easy as possible to install, but currently I don't have the time to spare.

Optional extras

drhyde on 2008-03-12T11:56:02

The likelihood of success calculated by cpandeps only takes into account the mandatory dependencies (those in 'requires' and 'build_requires') and ignores those that are merely recommended - taking recommended modules into account would in fact *reduce* the chance of success. It's also worth noting that the link you give is for the dependency tree and test results when using the latest version of perl (5.10.0) and for any operating system. It's always a good idea to change the filters to match your perl version and OS.

Additionally, it uses test results for the very latest stable version of all the non-core dependencies. This means that if something needs, for example, Foo::Bar version 2 or higher and you've got version 2 and it works, but version 3 exists on the CPAN and fails horribly, you'll get misleading results.
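To see why counting recommended modules can only lower the figure, here is a minimal sketch in Python. It assumes the overall chance is the product of each distribution's test pass rate, with failures treated as independent; that is an assumption for illustration rather than a statement of the exact cpandeps calculation, and the pass rates and names below are made up.

```python
from math import prod

def chance_of_success(pass_rates):
    """Multiply per-distribution pass rates, treating failures as independent."""
    return prod(pass_rates)

# Hypothetical pass rates: a module's mandatory dependencies ...
mandatory = [0.95, 0.90, 0.80]
# ... and one extra module that is only recommended.
recommended = [0.85]

required_only = chance_of_success(mandatory)                   # 0.95 * 0.90 * 0.80
with_recommended = chance_of_success(mandatory + recommended)  # one more factor below 1
```

Every pass rate is at most 1, so each extra dependency multiplies the total by a number no greater than 1. With these made-up figures the chance falls from about 68% to about 58% once the recommended module is counted.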

I would like to provide some kind of interface on the web site to let users fiddle with module version numbers at some point - patches would be most welcome and may attract payment in tasty beverages :-)

Re:Optional extras

rurban on 2008-03-12T16:52:40
I just improved the probability for Plagger's Cygwin deps from 0% to 30% by doing a cpan5.10.0 Feed::Find with CPAN::Reporter installed. This had 2 FAILs, though it works fine for me.