CPANTS

acme on 2003-04-25T12:46:46

A long, long time ago, in the deep past, at YAPC::Europe 2001 in Amsterdam, Schwern presented an idea. Well, a couple of ideas. He bundled all these ideas up into one and called them CPANTS. You should probably read his synopsis. At the end of the conference he gathered volunteers and Jos came from out of nowhere and started CPANPLUS. Since then CPANPLUS has gotten much cleverer (it can now automatically send test results to CPAN testers). However, CPANTS as a whole hadn't magically emerged from the mailing lists and discussions.

At the German Perl Workshop this year I got started thinking about it with Jos, Nicholas and Thomas. Metadata is very useful. Metrics are useful. Kwalitee (note that it's not quality) is useful. So I thought for a little bit. I wrote a quick scripts to unpack all the modules on CPAN, and a couple of scripts to go through these modules. And decided the best place for metadata about CPAN would be, errr, on CPAN. And it got a little more complicated. Then Richard helped. And I ended up with Module::CPANTS.

For example, this is what Module::CPANTS knows about Acme::Colour:

  'Acme-Colour-0.20.tar.gz' => {
    'author' => 'LBROCARD',
    'description' => 'additive and subtractive human-readable colours',
    'files' => [
      'Makefile.PL',
      'README',
      'MANIFEST'
    ],
    'lines' => {
      'nonpod' => 170,
      'pod' => 95,
      'total' => 265
    },
    'requires' => [
      'Graphics-ColorNames-0.32.tar.gz',
      'Scalar-List-Utils-1.11.tar.gz',
      'Test-Simple-0.47.tar.gz'
    ],
    'requires_module' => {
      'Graphics::ColorNames' => 0,
      'List::Util' => 0,
      'Test::Simple' => 0
    },
    'requires_recursive' => [
      'File-Spec-0.82.tar.gz',
      'Graphics-ColorNames-0.32.tar.gz',
      'Scalar-List-Utils-1.11.tar.gz',
      'Test-Harness-2.26.tar.gz',
      'Test-Simple-0.47.tar.gz'
    ],
    'testers' => {
      'fail' => 1,
      'pass' => 5
    }
  };

Note that it gets the description, even though the module isn't in the modules list (only a quarter of the distributions on CPAN are), lines of code metrics, prerequisites (and recursive prerequisites), CPAN testers data, and a list of notable files. A whole bunch of metadata.

Before you start complaining: it's just the beginning. Module::CPANTS will never do all that the CPANTS synopsis mentions. No, it doesn't have many metrics (I don't want to evaluate all the Perl code on CPAN, for one). And the interface is, well, a really big hash. Adding new metrics is easy (you may need a little disk space to test it ;-). Patches are welcome. The kwalitee() method is uncoded as yet, because I think most people want to weight metrics in different ways. I plan on releasing a version of Module::CPANTS every week, so it's only ever a couple of days out of sync with CPAN, although you can shout "database" and "web service" at me if you want.

Is this interesting? I think so. There are a whole lot of extra metrics that could be added. And if we set up a secure sandbox of some sort so that it's safe to evaluate all that Perl code, a couple more metrics. What easy-to-calculate metrics influence what you think of a Perl module? Want to help out? Check out Module::CPANTS::Generator.

And no, it's not just a plan to create the largest module on CPAN...


metrics

inkdroid on 2003-04-25T14:19:02

Schwern's synopsis has so many good ideas in it. I think you are on to something starting out with something relatively simple and then moving forward with new stuff as it becomes useful.

I'd be intersted in a couple of metrics, some of which might be easy to calculate.

  • Last update: the date of the last release on CPAN, which could be used as a measure of how actively the module is being maintained.
  • Versions: a list of how many versions the module has been through to get an idea of the module's lifecycle. I don't know where this could be pulled from since modules can be deleted from CPAN. From the Changes file perhaps?
  • rt.cpan.org: bugs, feature requests from rt (I seem to remember that RT has a SOAP interface, but I might be wrong here).

As for kwalitee(), you could already generate a POD to code ratio, which Schwern has listed as a red kwalitee flag, so that could be a start for sure. This is really, really cool stuff, thanks Acme.

Re:metrics

acme on 2003-04-25T15:05:13

Thanks for the tip. It should be easy to get the time of last update from CPANPLUS. Getting the number of versions is a little harder - it should be possible to get it from backpan, but it'd be much easier to get if there was an index listing of it all. RT seems possible too. I wonder if there's a way to get at a full RT database dump...

The main problem is getting access to this data without doing a billion web lookups. For the testers information, I use the NNTP interface to the testers mailing list, which is fairly quick to query as I only need the subject of the messages. It's also easy to grab the new data. This is slightly harder for these other sources, but I'm sure I'll find a way.

Cheers, Leon

Re:metrics

jdavidb on 2003-04-25T15:27:33

Rather than pulling all the data every couple of days and putting it into a module, have you considered turning your Module::CPANTS::Generator module into the main module itself? Have it just pull the data on demand. No reason to go pull all the information from the "Meta" module each day if noone is ever going to use it.

Re:metrics

acme on 2003-04-25T15:46:31

I did consider that, but I figure the meta information is import enough to have before you go to the trouble of downloading the module. For instance, it's nice to know prerequisites for a module without having to download the module and its prerequisites and their prerequisites. This way you get metrics for everything on CPAN without having to unpack the whole of CPAN. Disk space is cheap, but not everyone has the bandwidth and a couple of gigs free.

Want a Web Service?

cwest on 2003-04-25T18:01:44

Here it is then.

Now, I just need to find someone to maintain it. Being emailed to acme for safe keeping.