The CPANTS Volatile 100 index (done sooner than expected)

Alias on 2009-02-11T12:06:50

Thanks to the rather neat Algorithm::Dependency::Source::Invert, it turns out I can quickly invert the dependency graph and then reuse all the same code to create the next Top 100 index.
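To make the inversion idea concrete, here is a minimal sketch in Python. It is not the API of Algorithm::Dependency::Source::Invert, only the underlying idea of flipping every edge so that "A depends on B" becomes "B is used by A". The distribution names (App, Web, Util) are hypothetical toy data.

```python
from collections import defaultdict

def invert(depends_on):
    """Flip every edge: 'A depends on B' becomes 'B is used by A'."""
    inverted = defaultdict(set)
    for dist, prereqs in depends_on.items():
        inverted[dist]  # touch the key so distributions with no dependents still appear
        for prereq in prereqs:
            inverted[prereq].add(dist)
    return dict(inverted)

# Hypothetical toy graph, not real CPAN metadata.
depends_on = {
    "App": {"Web", "Util"},
    "Web": {"Util"},
    "Util": set(),
}

used_by = invert(depends_on)
for dist in sorted(used_by):
    print(dist, "is used by", sorted(used_by[dist]))
```

Once the graph is flipped, any code that walks "dependencies" of a node is walking its dependents instead, which is why the same index-building code can be reused.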

As a result, I present way ahead of schedule the "CPANTS Volatile 100" index.

This index represents the distributions with the most downstream dependencies. Being higher on this index is not necessarily a good thing. It is not directly reflective of the popularity of any module, because if you already own one member of the index, it's easy to just make up a new module and add it as a dependency of the existing one (as in the chain of 4 IO-Compress modules you can see there).

The list more accurately represents the inherent volatility of the CPAN. That is, if the maintainer of a module is neglectful and issues a broken release, how big will the explosion be, and how many other modules will you have killed?
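As a rough illustration of how such a score could be computed, the sketch below counts every distribution reachable through "used by" edges, directly or indirectly. Whether the real index counts exactly this way is an assumption, and the function names and the toy graph (including the made-up "Extra" distribution) are hypothetical.

```python
def downstream(used_by, dist):
    """Return every distribution that depends on dist, directly or indirectly."""
    seen = set()
    stack = [dist]
    while stack:
        current = stack.pop()
        for dependent in used_by.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    seen.discard(dist)  # do not count a distribution as its own dependent in a cycle
    return seen

def volatility_index(used_by, top=100):
    """Rank distributions by number of downstream dependents, highest first."""
    scores = {dist: len(downstream(used_by, dist)) for dist in used_by}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top]

# Hypothetical toy graph: "Extra" was added as a new prerequisite of "Util".
used_by = {
    "Extra": {"Util"},
    "Util": {"Web", "App"},
    "Web": {"App"},
    "App": set(),
}

index = volatility_index(used_by)
for dist, score in index:
    print(score, dist)
```

In this toy graph "Extra" lands above "Util" simply because "Util" was made to depend on it, which is the same effect that makes the index a poor measure of popularity.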

At the top of the list you can quite clearly see the set of modules with over 8,000 downstream dependents. This is the main section of the CPAN toolchain, or as Schwern and I like to call it, the OHMYGODIBROKECPAN zone.

Perhaps the most interesting thing about this zone is that there are some rather unexpected modules in there, particularly the Msql-Mysql-modules, HTTP-BrowserDetect, FreeWRL and Apache-LoggedAuthDBI ones...

CPANTS Volatile 100

8712 Scalar-List-Utils
8687 IO
8666 Data-Dumper
8650 Msql-Mysql-modules
8644 HTTP-BrowserDetect
8642 HTML-Widgets-Index
8627 Text-Tabs+Wrap
8624 FreeWRL
8621 Pod-Escapes
8619 Test-Harness
8619 Pod-Parser
8619 PathTools
8619 Test-Simple
8619 ExtUtils-Install
8619 Module-Build
8619 ExtUtils-MakeMaker
8619 Getopt-Long
8619 ExtUtils-Manifest
8619 File-Path
8619 podlators
8619 Pod-Simple
8619 ExtUtils-CBuilder
3471 Apache-LoggedAuthDBI
2954 MIME-Base64
2694 XSLoader
2597 URI
2533 File-Temp
2486 HTML-Tagset
2483 HTML-Parser
2373 IO-Compress-Base
2369 Compress-Raw-Zlib
2366 IO-Compress-Zlib
2363 Compress-Zlib
2351 libnet
2173 libwww-perl
1945 Sub-Uplevel
1940 Test-Exception
1681 version
1611 Class-Accessor
1555 Filter
1461 Storable
1324 Exporter
1279 Time-HiRes
1241 Test-Pod
1149 List-MoreUtils
1106 Class-Data-Inheritable
1030 FCGI
1018 Params-Util
960 Attribute-Handlers
952 CGI.pm
895 YAML
838 Encode
820 Params-Validate
820 Text-Balanced
818 UNIVERSAL-require
807 AppConfig
787 Template-Toolkit
783 Class-Inspector
756 Time-Local
737 Class-ISA
727 Module-Pluggable
726 IO-stringy
698 Sub-Install
696 Task-Weaken
685 Test-Tester
681 Data-OptList
678 Sub-Exporter
654 Devel-Symdump
651 Path-Class
634 Test-NoWarnings
618 MRO-Compat
598 XML-NamespaceSupport
587 XML-SAX
581 Pod-Coverage
573 Test-Pod-Coverage
525 Scope-Guard
509 IO-String
500 Tree-DAG_Node
498 Clone
495 Test-Deep
494 Array-Compare
491 Filter-Simple
485 Test-Warn
476 Sub-Name
471 TimeDate
462 UNIVERSAL-isa
460 NEXT
458 UNIVERSAL-can
456 Test-MockObject
451 Class-Singleton
441 Sub-Identify
440 DateTime-Locale
440 DateTime-TimeZone
438 DateTime
422 XML-Parser
414 HTTP-Body
408 Errno
405 Devel-GlobalDestruction
404 Class-MOP
401 Moose


I don't trust those numbers

Arador on 2009-02-11T13:58:36

Msql-Mysql-modules, for example, has only 50 dependencies (http://cpants.perl.org/dist/used_by/Msql-Mysql-modules), none of them particularly common ones. HTTP-BrowserDetect has only 8. There's no way those could have ended up so high.

Re:I don't trust those numbers

rjbs on 2009-02-11T14:50:28

My own results of this kind of analysis at http://dl.getdropbox.com/u/88746/cpan-prereqs.json.txt gave Msql-Mysql-modules a (very high) score of 21530, which backs up Alias's numbers. I can look at generating detailed dumps later today.

Re:I don't trust those numbers

rjbs on 2009-02-11T15:35:37

HTML-Widgets-Index requires DBD::mysql, which is part of Msql-Mysql-modules. CPANTS thinks that HTML-Widgets-Index is very well-used, because it thinks that it is the provider of the module "Test":

http://cpants.perl.org/dist/used_by/HTML-Widgets-Index

http://cpants.perl.org/dist/provides/HTML-Widgets-Index
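A small Python sketch of how one wrong module-to-distribution mapping could inflate the counts. The two facts taken from above are that HTML-Widgets-Index requires DBD::mysql (part of Msql-Mysql-modules) and that CPANTS credits HTML-Widgets-Index with providing "Test". Everything else is hypothetical: "Dist-A" and "Dist-B" stand in for the many distributions that list "Test" as a prerequisite, and "core" is a placeholder label for the Perl core.

```python
def dist_dependents(requires, provider):
    """Turn module-level prerequisites into distribution-level 'used by' sets."""
    used_by = {}
    for dist, modules in requires.items():
        for module in modules:
            owner = provider.get(module)
            if owner is not None and owner != dist:
                used_by.setdefault(owner, set()).add(dist)
    return used_by

def reach(used_by, dist):
    """All distributions that depend on dist, directly or indirectly."""
    seen, stack = set(), [dist]
    while stack:
        for dependent in used_by.get(stack.pop(), ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    seen.discard(dist)
    return seen

# "Dist-A" and "Dist-B" are hypothetical stand-ins for distributions requiring Test.
requires = {
    "Dist-A": ["Test"],
    "Dist-B": ["Test"],
    "HTML-Widgets-Index": ["DBD::mysql"],
}

# Mapping as CPANTS apparently sees it, versus one that credits the core instead.
wrong = {"Test": "HTML-Widgets-Index", "DBD::mysql": "Msql-Mysql-modules"}
right = {"Test": "core", "DBD::mysql": "Msql-Mysql-modules"}

inflated = reach(dist_dependents(requires, wrong), "Msql-Mysql-modules")
expected = reach(dist_dependents(requires, right), "Msql-Mysql-modules")
print(len(inflated), "dependents with the wrong provider")
print(len(expected), "dependents with the core as provider")
```

With the wrong mapping, everything that requires "Test" becomes a dependent of HTML-Widgets-Index, and therefore an indirect dependent of Msql-Mysql-modules as well.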

Re:I don't trust those numbers

Arador on 2009-02-11T16:05:39

Likewise, HTTP::BrowserDetect depends on HTML-Widgets-Index and FreeWRL 'provides' Config.

Re:I don't trust those numbers

Arador on 2009-02-11T16:07:07

Oops, I meant of course HTML-Widgets-Index depends on HTTP::BrowserDetect.

Requires vs Recommends

barbie on 2009-02-11T17:20:01

Taking 'Tie-TinyURL' as an example, it requires 'Test', which is why it lists 'HTML-Widgets-Index' as being dependent. Except that Tie-TinyURL doesn't use 'Test' anywhere. I suspect this is a hangover from when it was ported to use Test::More, which it does use. This threw up a couple of ideas to help filter the results you're getting to something that is perhaps a little more relevant.

  1. Filter based on the 'requires' and/or 'recommends' entries in META.yml. If there is no META.yml, assume all pre-requisites are required.
  2. Filter out anything that is a Test:: module. This won't necessarily exclude all modules used in test only, but it might give a better indicator as to what is used by the library code and what is part of the test suite.
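
The two filters above could be sketched like this in Python. The META.yml content is represented as an already-parsed dictionary, and the module names and versions are hypothetical examples rather than data from any real distribution.

```python
def filtered_prereqs(meta, all_prereqs):
    """Apply both suggested filters to one distribution's prerequisites.

    meta is the parsed META.yml as a dict, or None when the file is missing.
    all_prereqs is the full list of prerequisite module names found elsewhere.
    """
    if meta is None:
        # No META.yml: assume every prerequisite is required.
        candidates = list(all_prereqs)
    else:
        # Keep only 'requires'; 'recommends' entries are dropped.
        candidates = list(meta.get("requires", {}))
    # Drop anything in the Test:: namespace.
    return sorted(m for m in candidates if not m.startswith("Test::"))

# Hypothetical parsed META.yml.
meta = {
    "requires": {"LWP::UserAgent": "0", "Test::More": "0.47"},
    "recommends": {"Test::Pod": "1.00"},
}

print(filtered_prereqs(meta, ["LWP::UserAgent", "Test::More", "Test::Pod"]))
print(filtered_prereqs(None, ["Carp", "Test::More"]))
```

As noted in point 2, a prefix check like this only catches modules named Test::*, so test-only prerequisites under other names would still get through.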

It would be helpful if authors noted the 'has_tests_in_t_dir' CPANTS indicator, but then CPANTS listing 'HTML-Widgets-Index' as the definitive source of 'Test' probably needs investigating. 02packages.details.txt.gz should be the definitive source, as used by the CPAN and CPANPLUS installers.

Re:Requires vs Recommends

Alias on 2009-02-11T23:57:01

> 1. Filter based on the 'requires' and/or 'recommends' entries in META.yml. If there is no META.yml, assume all pre-requisites are required.

I'm working on this one already, separating the weight and volatility graphs out into the different dependency layers (configure, build and runtime).

> It would be helpful if authors noted the 'has_tests_in_t_dir' CPANTS indicator, but then CPANTS listing 'HTML-Widgets-Index' as the definitive source of 'Test' probably needs investigating. 02packages.details.txt.gz should be the definitive source, as used by the CPAN and CPANPLUS installers.

This is probably the better fix for this. The CPANTS dependency data already includes a "0" distribution that represents the core, so it's not like the CPANTS data model doesn't support that. The problem looks to be that it's preferring a distribution ahead of the core for certain core-only namespaces.
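
A minimal sketch of that preference in Python, assuming the resolver is given the list of distributions claiming a module plus a set of core-only module names. The function name, the contents of the core-only set, and the alphabetical tie-break are all assumptions made for illustration, not how CPANTS actually resolves providers.

```python
CORE = "0"  # the "0" distribution that represents the core in the CPANTS data

def resolve_provider(module, claimants, core_only):
    """Pick the distribution credited with providing a module."""
    if module in core_only:
        # Core-only namespaces always resolve to the core, whoever else claims them.
        return CORE
    # Hypothetical tie-break: first claimant alphabetically, or None if nobody claims it.
    return sorted(claimants)[0] if claimants else None

# Assumed for illustration: treat these two namespaces as core-only.
core_only = {"Test", "Config"}

print(resolve_provider("Test", ["HTML-Widgets-Index"], core_only))
print(resolve_provider("Config", ["FreeWRL"], core_only))
print(resolve_provider("DBD::mysql", ["Msql-Mysql-modules"], core_only))
```

Checking the core first means a stray distribution that happens to ship a Test.pm or Config.pm can no longer collect the dependents of the real core module.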