The CPAN just got a whole lot heavier, and I don't know why

Alias on 2010-06-09T05:49:49

According to the latest CPANDB generation run, and the associated Top 100 website at http://ali.as/top100, something big just happened to the CPAN dependency graph.

After staying relatively stable for a long time, with the 100th position coming in at around 145 dependencies and incrementing by 1 every 3 months, the "weight" of the entire Heavy 100 set of modules has jumped by 20 dependencies in the last week or so!

The 100th position is now sitting at 166 dependencies, and MojoMojo, the perennial leader of the Heavy 100, has skyrocketed to an astonishing 330 dependencies. The shape of the "Jifty Plateau" is also much less distinct, which suggests this might be more than a simple +20 across the board.

The question is: why?

Is this caused by a broken META.yml being fixed and uncovering formerly ignored dependencies? Or has someone accidentally added a new dependency somewhere important?


A possible way to find out?

cosimo on 2010-06-09T06:34:57

You have probably thought of this already, but are you keeping the Top 100 lists in a database? Analyzing the differences between previous runs could lead to some interesting discoveries.

Or maybe not, since you would have to have the number of dependencies for every CPAN module...
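One way the run-to-run comparison could work, sketched in Python under stated assumptions: if each run's per-distribution dependency counts were archived as SQLite files, SQLite's ATTACH lets a single connection join the old snapshot against the new one and rank distributions by how much they grew. The `weights` table and its columns are hypothetical, not CPANDB's real schema, and the rows are toy values. In-memory databases stand in for the archived files; with real files you would open the new one with `sqlite3.connect(new_path)` and run `ATTACH DATABASE ? AS old` with the old path.

```python
import sqlite3

# Hypothetical schema: one row per distribution with its recursive dependency count.
SCHEMA = "CREATE TABLE {db}.weights (distribution TEXT PRIMARY KEY, weight INTEGER NOT NULL)"


def diff_snapshots(old_rows, new_rows, limit=10):
    """Return (distribution, old_weight, new_weight, delta), biggest growth first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS old")  # second database on the same connection
    conn.execute(SCHEMA.format(db="old"))
    conn.execute(SCHEMA.format(db="main"))
    conn.executemany("INSERT INTO old.weights VALUES (?, ?)", old_rows)
    conn.executemany("INSERT INTO main.weights VALUES (?, ?)", new_rows)
    # Inner join: distributions present in only one snapshot are skipped.
    rows = conn.execute(
        """
        SELECT n.distribution, o.weight, n.weight, n.weight - o.weight AS delta
        FROM main.weights AS n
        JOIN old.weights AS o ON o.distribution = n.distribution
        ORDER BY delta DESC, n.distribution
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return rows


# Toy snapshots (illustrative names and numbers, not real CPANDB output).
old_rows = [("App-DBICOnly", 2), ("App-UsesCatalyst", 5), ("DBIx-Class", 1)]
new_rows = [("App-DBICOnly", 4), ("App-UsesCatalyst", 5), ("DBIx-Class", 3)]
rows = diff_snapshots(old_rows, new_rows)
for row in rows:
    print(row)
```

The distributions with the largest deltas, and the lowest-level one among them, would be the first places to look for a newly added dependency.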

These are software engineering problems at a different scale than we're used to.

Re:A possible way to find out?

Alias on 2010-06-09T10:03:36

I do keep meaning to start archiving the SQLite files; they're only around 30-50 meg per run. (I do runs at an interval of 1-2 weeks, because generating the index downloads about a gig.)

Re:A possible way to find out?

drhyde on 2010-06-10T11:25:25

Keeping track of the dependencies for each module (well, for each distribution) isn't all that big. CPANdeps does it for current versions of modules in 63MB, simply by caching all the META.yml files, and it tracks what the dependencies are, not just how many there are.

And there are only 15,000-ish distributions. You'd only need a few bytes per distribution per snapshot.
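To put rough numbers on that, here is a minimal Python sketch, assuming each snapshot stores only a count per distribution as a 2-byte unsigned integer, in a fixed distribution order that is kept once alongside the snapshots. The packing format and the `Dist-N` names are hypothetical; counts like 330 fit easily under the 65,535 limit of an unsigned 16-bit value.

```python
import struct


def pack_snapshot(weights, order):
    """Pack dependency counts as little-endian unsigned 16-bit ints.

    `order` is a fixed list of distribution names stored once alongside the
    snapshots; distributions missing from `weights` are written as 0.
    """
    return struct.pack(f"<{len(order)}H", *(weights.get(dist, 0) for dist in order))


def unpack_snapshot(blob, order):
    """Inverse of pack_snapshot: map each distribution name back to its count."""
    return dict(zip(order, struct.unpack(f"<{len(order)}H", blob)))


# Hypothetical names standing in for the ~15,000 distributions on CPAN.
order = [f"Dist-{i}" for i in range(15000)]
blob = pack_snapshot({"Dist-0": 330, "Dist-1": 166}, order)

print(len(blob))       # 30000 bytes: 2 per distribution
print(52 * len(blob))  # 1560000 bytes for a year of weekly snapshots
```

That works out to about 30 KB per snapshot and roughly 1.5 MB per year of weekly runs, plus the one-off list of names. A count above 65,535 would make `struct.pack` raise `struct.error`, and a 0 written for a missing distribution is indistinguishable from a real zero-dependency count.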

Moose

james2vegas on 2010-06-09T08:32:10

It was added to the requires list in DBIx::Class's META.yml:

http://search.cpan.org/diff?from=DBIx-Class-0.08121&to=DBIx-Class-0.08122&w=1#META.yml
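Why one new requires entry in a widely used distribution can move the whole Heavy 100 is easiest to see by counting recursive dependencies over a small graph. This is a minimal Python sketch over a hypothetical toy graph: the names App-UsesCatalyst and App-DBICOnly are made up, the edges are illustrative rather than real CPAN metadata, and CPANDB's actual counting rules may differ.

```python
def transitive_deps(graph, dist):
    """Return every distribution reachable from `dist` via requires edges."""
    seen = set()
    stack = list(graph.get(dist, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    seen.discard(dist)  # a dependency cycle back to `dist` shouldn't count
    return seen


def all_weights(graph):
    """Map each distribution to its recursive dependency count ("weight")."""
    return {dist: len(transitive_deps(graph, dist)) for dist in graph}


# Hypothetical toy graph: names and edges are illustrative, not real CPAN data.
before = {
    "App-UsesCatalyst": ["Catalyst-Runtime", "DBIx-Class"],
    "App-DBICOnly": ["DBIx-Class"],
    "Catalyst-Runtime": ["Moose"],
    "DBIx-Class": ["SQL-Abstract"],
    "Moose": ["Class-MOP"],
    "Class-MOP": [],
    "SQL-Abstract": [],
}

# Same graph, but with Moose added to DBIx-Class's requires.
after = {dist: list(deps) for dist, deps in before.items()}
after["DBIx-Class"].append("Moose")

wb, wa = all_weights(before), all_weights(after)
for dist in sorted(before):
    print(f"{dist}: {wb[dist]} -> {wa[dist]}")
```

In this toy graph App-DBICOnly grows from 2 to 4, while App-UsesCatalyst stays at 5 because it already reached Moose through Catalyst-Runtime. How much each dependent grows depends on how much of the new dependency's tree it was already pulling in, which is one possible reason a change like this would not show up as a uniform bump.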