How to look inside every CPAN distribution

dagolden on 2008-12-12T21:03:55

Ever wonder how many CPAN distributions use Module::Build or Module::Install? Or how often each spelling of the change log file (Changes, ChangeLog, Changelog, etc.) shows up in distributions? Or how many distributions use a particular expression in code?

I wrote App-CPAN-Mini-Visit and the visitcpan tool to make answering such questions a lot easier.

The distribution provides a command-line tool, visitcpan, that unpacks each distribution tarball in a CPAN::Mini repository, visits the resulting directory, and executes another program there. For example, to list all distributions that contain Build.PL:

$ visitcpan --append=dist -- perl -E 'say shift if -f "Build.PL"'

AASSAD/Catalyst-Helper-Controller-Scaffold-HTML-Template-0.03.tar.gz
ABERGMAN/Alien-0.91.tar.gz
ABERGMAN/File-Find-Random-0.5.tar.gz
ABERGMAN/SVL-0.29.tar.gz
...
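The unpack, visit, and run cycle described above can be sketched in plain shell. This is only an illustration of the idea, not how visitcpan is implemented: `visit_tarballs` is a hypothetical helper, it handles only `.tar.gz` archives, and it assumes each archive unpacks into a single top-level directory.

```shell
# visit_tarballs DIR COMMAND [ARGS...]
# Hypothetical helper (not part of App-CPAN-Mini-Visit): unpack every
# .tar.gz under DIR into a scratch directory, run COMMAND inside the
# unpacked tree with the tarball path appended as the last argument,
# then clean up.
visit_tarballs() {
    repo=$1
    shift
    find "$repo" -name '*.tar.gz' | sort | while IFS= read -r tarball; do
        work=$(mktemp -d) || exit 1
        # Skip archives that fail to extract.
        if tar -xzf "$tarball" -C "$work" 2>/dev/null; then
            # Assumption: the archive has one top-level directory;
            # fall back to the scratch directory itself otherwise.
            top=$work
            for d in "$work"/*/; do
                [ -d "$d" ] && top=$d
                break
            done
            # Redirect stdin so the command cannot swallow the file list,
            # and ignore its exit status so one failure does not stop the loop.
            ( cd "$top" && "$@" "$tarball" < /dev/null ) || true
        fi
        rm -rf "$work"
    done
}
```

Assuming the mirror lives in `~/minicpan`, something like `visit_tarballs ~/minicpan/authors/id sh -c '[ -f Build.PL ] && echo "$1"' sh` would approximate the Build.PL listing, minus the conveniences the real tool offers.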

If I were running analyses frequently, I might just use CPAN::Mini::Extract and keep all the tarballs extracted. Since I only occasionally have questions like this, I'm willing to spend the time unpacking on every run in exchange for the disk space saved.

Oh, and to answer that first question, here's how many distributions use each style of installer:

$ visitcpan -- perl -E 'if (-f "inc/Module/Install.pm") { say "MI" } elsif (-f "Build.PL") { say "MB" } elsif (-f "Makefile.PL") { say "EUMM" }' | perl -E 'while (<>) { chomp; $cnt{$_}++ } say "$_: $cnt{$_}" for keys %cnt'

MI: 2306
MB: 2675
EUMM: 11781
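The second half of that pipeline is just a frequency count over the labels printed by the first half. As a sketch, the same tally could be done with standard tools instead of a Perl hash; `tally` is a hypothetical helper name, and it assumes each label is a single word with no spaces.

```shell
# tally: read one label per line on stdin and print "label: count",
# most frequent label first. Hypothetical helper; assumes single-word labels.
tally() {
    sort | uniq -c | sort -rn | awk '{ print $2 ": " $1 }'
}
```

Unlike iterating over `keys %cnt`, which yields the labels in no particular order, this prints them sorted by count.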

Looks like Module::Install has almost caught up to Module::Build. Impressive!

--dagolden


Interesting metrics...

Alias on 2008-12-13T12:30:26

MI: 2306
MB: 2675
EUMM: 11781

It would appear that "Simple pretty format that works reliably now, but has long-term problems" is trumping "Theoretically more correct and more future-proof, but buggy in the wild in the present"...

With, of course, "reliably works everywhere" MILES ahead.

Re:Interesting metrics...

dagolden on 2008-12-13T13:01:37

There does seem to be a fad/fashion for "simple pretty formats" (i.e. DSLs) lately. M::B's OO framework is just ugly for what should be simple and boilerplate most of the time. (Maybe I should write M::B::Declare...)

I'd like M::I more if I could understand how/when/where things happen without taking hours to unravel it. ETOOMUCHMAGIC for me.

I also think the "buggy in the wild" will decrease with time as older perls are replaced with newer ones.

--dagolden

Re:Interesting metrics...

BinGOs on 2008-12-14T09:28:19

What would be interesting is the number of different versions of MI in use.

Re:Interesting metrics...

dagolden on 2008-12-14T20:00:19

0.77: 380
0.76: 132
0.75: 205
0.74: 8
0.73: 41
0.72: 74
0.71: 105
0.70: 24
0.69: 3
0.68: 398
0.67: 244
0.65: 87
0.64: 227
0.63: 50
0.62: 12
0.61: 32
0.60: 5
0.59: 1
0.57: 3
0.56: 3
0.55: 3
0.54: 10
0.52: 4
0.51: 2
0.50: 8
0.45: 2
0.41: 1
0.40: 1
0.39: 21
0.37: 23
0.36: 134
0.35: 20
0.34: 15
0.33: 18
0.32: 5
0.31: 3
0.27: 4
0.26: 2
0.24: 1
0.23: 1
0.22: 1
0.21: 1
0.20: 1
0.19_99: 1
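The command behind these numbers is not shown, so the following is only a guess at one way to gather them: run a small function in each unpacked distribution that prints the version of the bundled Module::Install. `mi_version` is a hypothetical helper, and it assumes inc/Module/Install.pm sets its version on a line of the form `$VERSION = '0.77';`.

```shell
# mi_version: print the version of the Module::Install bundled in the
# current directory, or nothing if there is none. Hypothetical helper;
# assumes a line like:  $VERSION = '0.77';
mi_version() {
    f=inc/Module/Install.pm
    [ -f "$f" ] || return 0
    # Print the number from the first "$VERSION = ..." assignment.
    sed -n 's/^[[:space:]]*\$VERSION[[:space:]]*=[[:space:]]*["'\'']*\([0-9][0-9._]*\).*/\1/p' "$f" |
        head -n 1
}
```

Collecting that output across every distribution and counting the distinct lines would give a table like the one above.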