MiniCPAN: When and How Much

brian_d_foy on 2008-01-30T17:29:39

I want to see how much of the stuff in my MiniCPAN was uploaded when:

$ perl dir_sizes.pl /MINICPAN
----------------------------------------------------------------------------------------------------
Year | Total |    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
----------------------------------------------------------------------------------------------------
1995 | 285KB |      -      -      -      -    2KB      -      -   93KB   16KB   12KB   44KB  115KB
1996 |   1MB |   25KB   75KB   26KB   49KB  800KB  173KB   11KB   95KB      -   44KB   53KB  335KB
1997 |   9MB |   48KB  157KB  127KB  237KB  236KB  274KB  248KB  244KB    7MB   71KB  220KB  521KB
1998 |   8MB |  224KB    2MB  493KB  554KB   36KB  631KB  968KB    1MB  327KB    1MB  896KB  193KB
1999 |   6MB |  325KB  287KB  276KB    1MB  299KB  255KB  641KB  670KB  884KB  425KB    1MB  398KB
2000 |   8MB |    1MB  419KB  855KB  630KB  286KB    1MB  532KB  662KB  544KB  466KB  487KB  889KB
2001 |  14MB |  951KB    2MB  890KB  616KB  847KB    1MB  888KB    1MB    1MB    1MB    1MB    1MB
2002 |  56MB |    1MB    6MB    3MB    3MB    3MB    3MB    1MB    3MB    8MB    3MB    1MB   15MB
2003 |  85MB |    2MB    2MB   12MB    2MB    2MB    4MB    7MB    6MB   14MB   13MB    3MB   13MB
2004 |  74MB |    4MB    4MB    3MB   10MB    4MB    3MB    3MB    7MB    6MB    3MB   16MB    5MB
2005 |  81MB |    4MB    3MB   13MB    7MB    4MB    6MB    4MB    4MB    7MB   11MB    7MB    4MB
2006 | 117MB |    7MB    3MB    4MB    8MB    8MB   10MB    5MB   12MB   10MB   21MB   20MB    4MB
2007 | 257MB |   18MB   14MB   24MB   10MB   19MB   15MB   20MB   20MB   18MB   22MB   28MB   43MB
2008 |  64MB |   64MB      -      -      -      -      -      -      -      -      -      -      -
----------------------------------------------------------------------------------------------------
Processed 14557 files in 3 seconds, 0.00021 secs/file
$VAR1 = {
          'zip' => 108,
          'tgz' => 237,
          'gz' => 14212
        };

I'm working on doing the same thing for BackPAN and CPAN, but I have a couple of data collection issues to work out. I'm also working on the same thing for the number of authors each month, the number of new authors each month, and so on. When I get this worked out I'll release the code, but it's really just File::Find and some path processing and counting.
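
Until the real code is released, here is a minimal sketch of the File::Find approach described above. It is not the actual dir_sizes.pl: the sub names tally_by_month and human are made up for illustration, and it assumes that each archive's modification time stands in for its upload date and that sizes are truncated to whole KB or MB.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Walk the given directories and add up archive sizes per year and month.
# Assumption: the file's mtime is a good enough proxy for the upload date.
sub tally_by_month {
    my @dirs = @_;
    my %totals;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;
                return unless $path =~ /\.(?:tar\.gz|tgz|zip)\z/;
                my ( $size, $mtime ) = ( stat _ )[ 7, 9 ];    # reuse the -f stat
                my ( $mon, $year ) = ( localtime $mtime )[ 4, 5 ];
                $totals{ $year + 1900 }{ $mon + 1 } += $size;
            },
        },
        @dirs
    );
    return \%totals;
}

# Format a byte count like the table cells: '-', 'NKB', or 'NMB'.
# Assumption: values are truncated, not rounded.
sub human {
    my $bytes = shift;
    return '-' unless $bytes;
    return sprintf '%dMB', $bytes / 1024**2 if $bytes >= 1024**2;
    return sprintf '%dKB', $bytes / 1024;
}

# Print one row per year when directories are given on the command line.
if (@ARGV) {
    my $totals = tally_by_month(@ARGV);
    for my $year ( sort { $a <=> $b } keys %$totals ) {
        my $sum = 0;
        $sum += $_ for values %{ $totals->{$year} };
        printf '%4d | %5s |', $year, human($sum);
        printf ' %6s', human( $totals->{$year}{$_} ) for 1 .. 12;
        print "\n";
    }
}
```

Saved under any file name and run with the MiniCPAN directory as its argument, this prints the year, the yearly total, and the twelve monthly columns. It leaves out the header, the timing line, and the per-extension counts.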


Surface plot?

tsee on 2008-01-30T20:06:04

I'm not convinced this would make a good surface plot. Despite the way it's presented as a two-dimensional histogram, it's really only one time line and the amount of data. So what's wrong with a simple 2D plot (except that the following one is ugly)? http://steffen-mueller.net/tmp/cpanuploads.png

Re:Surface plot?

brian_d_foy on 2008-01-30T20:25:06

Well, I also want to see yearly cycles, and look at slices like "December". People can make all sorts of graphs though.

Note that your graph is really "Age of distros in MiniCPAN", not CPAN Uploads. The only things there are the latest distros, which is why January 2008 has such a big spike. :)

Re:Surface plot?

jdavidb on 2008-01-30T21:56:26

Yeah; to me it's really cool that 8.2% of CPAN by size has been released or revised in the last month. And 38.8% of it within the last year. Most modules people are running are something that is new and fairly recently looked at, especially if it has a good test suite. :)

Of course, I'm really curious what the gigantic distribution is in September 1997 that hasn't been touched in the last ten years.
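
For anyone checking those percentages, here is a small sketch that reproduces them from the rounded yearly totals in the table. The variable names are made up for illustration, and since the inputs are already rounded to whole MB, the results are only approximate.

```perl
use strict;
use warnings;

# Yearly totals in MB, copied from the table (1995 is 285KB).
my %year_mb = (
    1995 => 285 / 1024, 1996 => 1,  1997 => 9,   1998 => 8,
    1999 => 6,          2000 => 8,  2001 => 14,  2002 => 56,
    2003 => 85,         2004 => 74, 2005 => 81,  2006 => 117,
    2007 => 257,        2008 => 64,
);

my $total = 0;
$total += $_ for values %year_mb;

# "Last month" is January 2008, the only 2008 column with data.
my $last_month = $year_mb{2008};

# "Last year" is February 2007 through January 2008:
# all of 2007 minus its January (18MB), plus January 2008.
my $last_year = $year_mb{2007} - 18 + $year_mb{2008};

# Percentage of $_[1] represented by $_[0], to one decimal place.
sub percent { return sprintf '%.1f', 100 * $_[0] / $_[1] }

printf "Last month: %s%%\n", percent( $last_month, $total );
printf "Last year:  %s%%\n", percent( $last_year,  $total );
```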

Re:Surface plot?

brian_d_foy on 2008-01-30T22:20:04

Don't get into the numbers too quickly. This is just MiniCPAN, so that excludes Perl distributions and so on. CPAN has a lot more current stuff than what shows up in MiniCPAN. It's not something to measure to tenths of a percent.