A couple of years ago I started building feed aggregation sites (aka "planets"[1]) using Plagger. Once you've installed Plagger, it's pretty simple to configure new planets.
Notice I say "once you've installed Plagger". Plagger is one of those CPAN modules which installs about half of CPAN. The CPAN dependencies page currently gives you a 26% chance of successfully installing Plagger.
The reason for that is clear. Plagger is an incredibly powerful tool. It is an all-purpose tool for slicing and dicing web feeds. It also includes special-case plugins for creating web feeds for many sources that don't currently publish them.
So the net result is that Plagger is large and often hard to install. And most people (or, at least, I) don't use a fraction of its power. I wish I didn't have to install all of Plagger's dependencies in order to just grab a few feeds and combine them into a planet site. Perhaps the Plagger team needs to look at breaking up the distribution into smaller parts that do less.
[Update: Miyagawa points out that the vast majority of Plagger's dependencies are optional, so I'm overstating the case here. Sorry about that.]
Recently I moved a lot of my sites to a new server. And I really didn't want to go through the pain of installing Plagger. Therefore all of my planets have been dead for a couple of months. And that has been bothering me. But a couple of days ago (prompted by something that Paul Mison wrote) I decided to do something about it.
In a few hours of spare time I wrote Perlanet[2]. It doesn't do anything at all complex. It reads its configuration from a YAML file, uses XML::Feed to parse the feeds and then the Template Toolkit to generate the web page (oh, and XML::Feed again to generate the combined feed).
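To make the shape of that pipeline concrete, here is a minimal sketch in Python. It is not Perlanet's code: it assumes RSS 2.0 input only, uses the standard library in place of XML::Feed and the Template Toolkit, takes the feeds as already-fetched strings instead of reading URLs from a YAML file, and leaves out the combined feed. The names `parse_rss`, `build_planet` and `render_html`, and the sample feeds, are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime
from html import escape

# Hypothetical sample data standing in for feeds fetched over HTTP.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Old post</title><link>http://example.com/a/1</link>
<pubDate>Mon, 10 Mar 2008 09:00:00 GMT</pubDate></item>
<item><title>New post</title><link>http://example.com/a/2</link>
<pubDate>Wed, 12 Mar 2008 09:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Middle post</title><link>http://example.com/b/1</link>
<pubDate>Tue, 11 Mar 2008 12:00:00 +0000</pubDate></item>
</channel></rss>"""


def parse_rss(source, xml_text):
    """Return one dict per dated <item> in an RSS 2.0 document."""
    entries = []
    for item in ET.fromstring(xml_text).iter("item"):
        pub_date = item.findtext("pubDate")
        if pub_date is None:
            continue  # undated items cannot be sorted, so skip them
        date = parsedate_to_datetime(pub_date)
        if date.tzinfo is None:
            # Treat dates without a usable zone as UTC so all dates compare.
            date = date.replace(tzinfo=timezone.utc)
        entries.append({
            "source": source,
            "title": item.findtext("title", default="(untitled)"),
            "link": item.findtext("link", default=""),
            "date": date,
        })
    return entries


def build_planet(feeds, limit=20):
    """Merge entries from every feed, newest first, keeping `limit` of them."""
    merged = []
    for source, xml_text in feeds.items():
        merged.extend(parse_rss(source, xml_text))
    merged.sort(key=lambda entry: entry["date"], reverse=True)
    return merged[:limit]


def render_html(title, entries):
    """Render the merged entries as a very plain HTML fragment."""
    rows = "\n".join(
        '<li><a href="{}">{}</a> ({}, {})</li>'.format(
            escape(entry["link"], quote=True),
            escape(entry["title"]),
            escape(entry["source"]),
            entry["date"].strftime("%Y-%m-%d"),
        )
        for entry in entries
    )
    return "<h1>{}</h1>\n<ul>\n{}\n</ul>".format(escape(title), rows)


if __name__ == "__main__":
    planet = build_planet({"Blog A": FEED_A, "Blog B": FEED_B})
    print(render_html("Planet Example", planet))
```

The whole job is "parse, merge, sort by date, render", which is why a full feed-processing framework is more than a simple planet needs.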
It's very simple. And it's new, so there may well be bugs. But it's there if you find it useful. It's already powering a revived planet davorg. My other planets should come back to life over the next few days.
I'm sure I'll be adding more features over the coming weeks, but the main point is to keep it simple.
[1] After Planet, the Python software that (as far as I know) first popularised this method of aggregating data.
[2] Yes, I know it's a terrible name. But it seemed obvious to me and now I can't shake it.
Re:Thanks, links question
davorg on 2008-03-11T15:16:21
This is the data duplication issue that Paul mentions in the article I link to. You're seeing the same data from two different sources.
I've been using the delicious "daily blog posting" tool to wrap up a day's links and add them as a single entry to my blog. But the planet includes both the blog feed and the raw delicious feed. So delicious links appear twice on the planet.
I'm going to turn off the delicious daily posting tool.
I hope to be able to use this someday. I've been needing such a tool but on a lower scale than plagger. Unfortunately I'm so busy with $NEW_JOB I'm not sure when I'll get to try it out.
Re:Validation
davorg on 2008-03-12T08:58:09
Ah. I've been concentrating so hard on getting the HTML page valid[1] that I'd forgotten to look at the feed.
I'll apply the patches from RT locally and wait for the next release of XML::Feed to fix the problems.
[1] Often a pointless task on a planet as so much of the markup is out of your control.
Re:Validation
blech on 2008-03-12T09:58:43
husk.org's (X)HTML isn't valid mainly because Vox use lots of attributes, and their visual HTML editor can get quite confused about wrapping spans. At some point, I'll either start running the HTML through Tidy before presenting it, or I'll edit the raw HTML before saving it to Vox.
Similar attributes cause issues with the Atom feed, but they don't prevent validation, merely cause warnings, because Atom does at least recognise the concept of extensibility.
Re:Plagger dependencies
davorg on 2008-03-12T08:49:12
Plagger's "requirement" modules are all generic, from Cache, DateTime and LWP to HTML, XML and URI fetch modules. They are all necessary to all kinds of data sources and output as well.
You're absolutely right (of course!) The vast majority of the pre-requisites are optional. I had forgotten that.
I still think, however, that you're bundling too much stuff together. If I was bundling Plagger I'd create a core distribution that just read feeds, combined them and published new ones. I'd then relegate all of the extra CustomFeed, Filter, Notify, etc. plugins to other (optional) distributions.
Probably this is just a philosophical disagreement on how software is bundled :)
Re:Plagger dependencies
miyagawa on 2008-03-12T09:17:19
Right. Splitting the distro into separate modules, or at least into bundles, was the original idea.
The only reason we haven't done this is probably that we never got to the point of saying "yes, we're done", and I just didn't want to see people uploading their random Plagger plugins to CPAN, where they would eventually end up unmaintained, abandoned, out of sync with core, of poor code quality, etc.
It doesn't take a minute to name a few Catalyst modules that are in an "out-of-date" or "was a total mistake, don't use that" state.
But yes, at this point, when Plagger main trunk dev is so quiet, we can strip most plugins from the core, and probably cut the "recommended" modules down to a smaller set.
Re:failed to install
davorg on 2008-03-12T09:12:24
I'll suggest from this experience that the dependency chain of XML::Feed is similar in complexity to that of Plagger from this perspective
The dependencies checker indicates that XML::Feed has a 44% chance of installing correctly whereas Plagger has a 26% chance. Of course, Plagger uses XML::Feed, so Plagger is never going to be easier to install than XML::Feed.
I had exactly the same problem with XML::LibXML that you did. My server is a Fedora Core 6 system and I install all of my modules using rpm (building my own when necessary). The version of XML::LibXML available for FC6 is 1.62 and XML::Atom (which is required by XML::Feed) needs at least 1.64. I solved the problem by downloading the source rpm from the Fedora 9 development repository and rebuilding it on my server. Of course, having root access makes that a bit easier :-)
It's great that people are interested in Perlanet, but I need to emphasise that it is a two-hour hack that I threw together to solve a particular problem that I had. If other people find it useful then I'll certainly clean it up and make it as easy as possible to install, but currently I don't have the time to spare.
The likelihood of success calculated by cpandeps only takes into account the mandatory dependencies (those in 'requires' and 'build_requires') and ignores those that are merely recommended - taking recommended modules into account would in fact *reduce* the chance of success. It's also worth noting that the link you give is for the dependency tree and test results when using the latest version of perl (5.10.0) and for any operating system. It's always a good idea to change the filters to match your perl version and OS.
Additionally, it uses test results for the very latest stable version of all the non-core dependencies. This means that if something needs, for example, Foo::Bar version 2 or higher and you've got version 2 and it works, but version 3 exists on the CPAN and fails horribly, you'll get misleading results.
At some point I would like to provide some kind of interface on the web site to let users fiddle with module version numbers - patches would be most welcome and may attract payment in tasty beverages.
Re:Optional extras
rurban on 2008-03-12T16:52:40
I just enhanced the probability of Plagger's cygwin deps from 0% to 30% by doing a cpan5.10.0 Feed::Find with CPAN::Reporter installed. This had 2 FAILs, though it works fine for me.