Want to help with the development of CPAN?

Alias on 2007-03-08T13:17:03

At the last Sydney.pm meeting, someone suggested that I post a "top 10 things I could use some help with" list on my website.

It's an interesting idea, but one that would be tricky.

You see, the most important things that need to be done often block on political and communication issues.

As evidence of this, just look at how much work gets done at the big hackfests, simply because all or most of the major players are in the same room.

Communications, politics and management also have the tendency to have long-term effects. As an example, look at the current less-than-optimal state of Template Toolkit. Because Andy has been tied up with other things and TT doesn't have a succession plan for release management, TT users have in some cases been left stranded (TT hasn't installed on Win32 for a year now) despite members of the community generating patches and being willing to deal with the problems.

This is the Continuity or Death theme again.

But I digress.

While I'm not sure I have a list of 10 things that are all politics free, I certainly have one big highlight. One project you could write or help write that, I feel, would be hugely important to the future improvement of the CPAN.

And it goes something like this... (shimmer out to dream/animation sequence)

------------------------------------------------------------------------

The CPAN Open Data API

Module interrelations on the CPAN are now too complex for them to be maintained and managed by wetware alone.

There are a number of important issues across the graph of module dependencies, such as bitrot and back-compatibility.

Most of these issues can or could be expressed programmatically. Metrics could easily be developed for many of them.

The data required to develop these metrics is spread out over many different CPAN services, and is currently unapproachable.

Over the course of the year I hope to see every CPAN-related service exporting database dumps, most likely in the form of SQLite databases. A number do this already. I'd like to see all of them doing so soon.

Access to the data is a different issue to being able to exploit the data.

With this in mind, I'd like to see the following.

1. A unified schema that ties together all CPAN-related datasets, implemented using (most likely) SQLite.

2. A pre-built CPAN module that would provide a convenient ORM layer over the top of this schema. This would most likely be done using something like DBIx::Class.

3. A second CPAN module that would pull data from the CPAN index and every other CPAN-related system that publishes datasets, and munge them together to create the SQLite database containing all the data.
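
As a rough sketch of what items 1 and 3 might look like, here is a toy unified schema and a loader that munges separately published datasets into one SQLite file. Python and its built-in sqlite3 module are used purely for illustration. Every table name, column name and sample row is an assumption made up for this example, not the schema of any real CPAN service.

```python
import sqlite3

# Hypothetical unified schema: all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE distribution (
    name         TEXT PRIMARY KEY,
    author       TEXT NOT NULL,
    last_release TEXT NOT NULL      -- ISO 8601 date of the latest upload
);
CREATE TABLE dependency (
    distribution TEXT NOT NULL REFERENCES distribution(name),
    requires     TEXT NOT NULL,     -- name of the distribution depended upon
    PRIMARY KEY (distribution, requires)
);
CREATE TABLE open_bugs (
    distribution TEXT PRIMARY KEY REFERENCES distribution(name),
    bug_count    INTEGER NOT NULL
);
"""

def build_index(index_rows, dependency_rows, bug_rows, path=":memory:"):
    """Merge datasets from separate (hypothetical) services into one database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO distribution VALUES (?, ?, ?)", index_rows)
    conn.executemany("INSERT INTO dependency VALUES (?, ?)", dependency_rows)
    conn.executemany("INSERT INTO open_bugs VALUES (?, ?)", bug_rows)
    conn.commit()
    return conn

# Made-up sample rows standing in for real service dumps.
conn = build_index(
    index_rows=[("Foo-Bar", "ALICE", "2004-01-15"),
                ("Baz-Qux", "BOB", "2007-02-01")],
    dependency_rows=[("Baz-Qux", "Foo-Bar")],
    bug_rows=[("Foo-Bar", 12), ("Baz-Qux", 1)],
)

# One query can now span data that originally lived in different places.
joined = conn.execute("""
    SELECT d.name, d.last_release, b.bug_count
    FROM distribution d JOIN open_bugs b ON b.distribution = d.name
    ORDER BY d.name
""").fetchall()
```

The point of the single file is that the join at the end needs no knowledge of where each dataset came from.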

------------------------------------------------------------------------------

You can see the beginnings of my attempt at this at http://svn.phase-n.com/svn/cpan/trunk/CPAN-Index/.

Unfortunately, I'm a DBIx::Class newbie and I really haven't the time to work on this anywhere near as much as I would like.

Why is this so important?

Having these three elements in place enables not just access to the CPAN data, but CONVENIENT and PROGRAMMATIC access to it.

It means that anybody that would like to develop a metric for, say, bitrot (using age of the last release and the number of bugs reported and such) can do so fairly easily.
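
To make that concrete, a bitrot metric could be as small as the following sketch. The formula is an assumption invented for illustration (years since the last release, scaled up by the number of open bugs); it is not an established CPAN metric, and the function name is hypothetical.

```python
from datetime import date

def bitrot_score(last_release: date, open_bugs: int, today: date) -> float:
    """Toy bitrot metric: age of the last release in years, times (1 + open bugs).

    The weighting is an arbitrary assumption chosen for illustration.
    """
    age_years = (today - last_release).days / 365.25
    return age_years * (1 + open_bugs)

# A distribution untouched for two years scores higher the more bugs it has open.
quiet = bitrot_score(date(2005, 3, 8), open_bugs=0, today=date(2007, 3, 8))
buggy = bitrot_score(date(2005, 3, 8), open_bugs=3, today=date(2007, 3, 8))
```

A real metric would want more inputs than these two, but both of them are exactly the kind of data the unified database would hold.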

And from there it would be fairly trivial to throw a Catalyst or ttree website on top of those metrics to create interesting new services.

Imagine a sort of "CPAN's 10 Most Rotten Modules" website which lists the modules that are both the most rotten AND have the most other modules depending on them.
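
A sketch of the query behind such a page, again in Python with sqlite3 for illustration only. The two tables, their columns, the sample rows and the ranking formula (a toy rot score multiplied by the number of direct dependents) are all assumptions made up for this example.

```python
import sqlite3

def most_rotten(conn, today, limit=10):
    """Rank distributions by (toy rot score) x (number of direct dependents)."""
    return conn.execute("""
        SELECT name, rot, dependents
        FROM (
            SELECT d.name AS name,
                   (julianday(?) - julianday(d.last_release)) / 365.25
                       * (1 + d.open_bugs) AS rot,
                   COUNT(dep.distribution) AS dependents
            FROM distribution d
            LEFT JOIN dependency dep ON dep.requires = d.name
            GROUP BY d.name
        )
        ORDER BY rot * dependents DESC, name
        LIMIT ?
    """, (today, limit)).fetchall()

# Hypothetical, made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE distribution (name TEXT PRIMARY KEY,
                               last_release TEXT, open_bugs INTEGER);
    CREATE TABLE dependency (distribution TEXT, requires TEXT);
""")
conn.executemany("INSERT INTO distribution VALUES (?, ?, ?)", [
    ("Old-Core", "2003-03-08", 9),   # old, buggy, and depended upon
    ("Old-Leaf", "2002-03-08", 20),  # older and buggier, but nothing uses it
    ("App-One",  "2007-01-01", 0),
    ("App-Two",  "2007-02-01", 1),
])
conn.executemany("INSERT INTO dependency VALUES (?, ?)", [
    ("App-One", "Old-Core"),
    ("App-Two", "Old-Core"),
    ("App-Two", "App-One"),
])

rows = most_rotten(conn, "2007-03-08")
```

Multiplying the two factors means a distribution with no dependents scores zero however rotten it is, which matches the "both rotten AND depended upon" idea. Counting indirect dependents as well would need a walk of the whole dependency graph.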

These sorts of tools would allow us to focus maintenance efforts where they are most needed, which is something that will be increasingly important as we head for 20,000 modules.


I just need the free time

brian_d_foy on 2007-03-08T18:49:38

This is something I'd like to work on, as soon as I have a lot of free time. My interest is getting at the data and populating the databases.

Bravo

DAxelrod on 2007-03-08T19:39:32

This is exactly the kind of project I was anticipating with my very first journal entry.

I would love to help out with this. Unfortunately, I cannot promise much in the way of tuits in the near future. However, at very least, expect me to publish some thoughts on the design aspects of this after I dust them off and edit them.

CPANTS & SQLite

domm on 2007-03-08T20:25:20

At GPW, I discovered the quite useful module DBD::PgLite::MirrorPgToSQLite. Using this, it's totally easy to generate a SQLite dump of the CPANTS DB.

As soon as I find some time to finish the setup on the new (yet again) server, you can expect a daily dump of all CPANTS data to SQLite.

WRT DBIx::Class: I'm already using it to access the CPANTS DB, and could probably help out with creating some of the tools you're talking about. BUT: I have a big project to finish by 2nd April, plus my day job, plus YAPC::Europe, plus 2 kids and a girlfriend - so don't count on much happening from my side in the next four weeks...

what CPAN Index would be good for

markjugg on 2007-03-23T23:34:56

I'm interested in helping with this as well. Last night I ran into a use case:

I was thinking about packaging up a script with the whole dependency chain so it could be easily installed. CPAN-Index could let me query to see exactly what the dependency chain is.

      Mark