The CPAN Top 100 website has been upgraded to CPANDB

Alias on 2009-07-09T06:21:12

Completing 2 weeks of fairly heavy refactoring, rewriting and debugging of various toolchain elements, I'm happy to report that I have now removed the (faulty) CPANTS dependency graph from the CPAN Top 100 website and replaced it with the highly accurate CPANDB dependency graph.

This is a huge relief for me, because all the highly visible bugs that were preventing the concept from progressing to full maturity are now gone.

The change to reliable data has meant a few major changes to the results.

1. The falsely inflated scores go away.

CPANTS seemed to have trouble attributing modules to the distribution they actually came from. For example, it thinks that DBD::SQLite lives in the DBD-SQLite-Aggregation distribution.

Fixing this has caused a big shuffle at the top. "perl" now sits proudly at the top of the table, all the hardcoded exceptions are gone, the counts have zoomed up 10% thanks to better linking, and a very surprising #2 position that I originally thought was a bug is confirmed to be correct.

Yes, a text wrapping module is the most depended on thing in the CPAN! :/

2. Author attribution is now correct.

The CPANTS implementation appears not to understand the concept of changing maintainers. As a result, many CPAN modules that had changed hands were showing as being maintained by the original author.

With this corrected, you should now be able to accurately see who you need to annoy to take over a failing module.

3. DBD::mysql is no longer an OMGIBROKECPAN event.

Granted, the PASS vs FAIL counts for DBD::mysql are horrible. But the graph now does NOT consider the mysql driver to be as important as it previously did, which has lowered its FAILure score.

4. A new module appears at the top of the FAIL 100.

I don't know where it was hiding (presumably as an embedded copy in someone else's distribution), but IO::Zlib has made a surprise jump to the top of the FAIL 100 table.

Please commence the poking and prodding of the author to take it over or fix it :)

What's next?

With the underlying data now correctly in place, I can start to address the extremely simple nature of the website itself. I wanted to delay any more work on creating new metrics until the current ones were no longer broken.

So hopefully I can now begin adding new Top 100 tables, and start to move the website towards something that will work better as a standalone site.


Next

Aristotle on 2009-07-09T14:15:54

Maybe top100.cpan.org?

Re:Next

Alias on 2009-07-09T14:49:23

Something like that, when it's ready.

#1 RTed

LaPerla on 2009-07-11T08:01:17

The bug in IO-Zlib is identified in https://rt.cpan.org/Ticket/Display.html?id=47793

The result is more boring than you'd hope but at least we know what we have to do. Thank you, Adam.

The problem with your metric is that it also measures how long a distro has been sitting on CPAN, because the number of fails will tirelessly increase even if the fails are irrelevant. Maybe divide by the number of received reports?
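
To make the suggestion concrete, here is a rough sketch in Python. The function name and all the numbers are made up for illustration, and this is not how the Top 100 site computes its score. It only shows how dividing by the number of reports stops an old distro from looking worse simply because it has collected more reports over time.

```python
def fail_rate(fails: int, total_reports: int) -> float:
    """Fraction of received test reports that were FAILs.

    Returns 0.0 when no reports have been received, to avoid dividing by zero.
    """
    if total_reports <= 0:
        return 0.0
    return fails / total_reports


# Hypothetical figures, for illustration only.
old_distro = {"fails": 200, "reports": 4000}  # on CPAN for years, many reports
new_distro = {"fails": 50, "reports": 100}    # recent upload, few reports

old_rate = fail_rate(old_distro["fails"], old_distro["reports"])
new_rate = fail_rate(new_distro["fails"], new_distro["reports"])

# By raw FAIL count the old distro looks four times worse,
# but by rate the new distro is the one failing more often.
print(f"old: {old_rate:.2%}  new: {new_rate:.2%}")
```

A rate on its own throws away the "how many people are affected" signal, so it would probably need to be combined with the dependency weighting rather than replace it.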

Re:#1 RTed

Alias on 2009-07-15T12:16:34

I'd like to think that this creates a larger incentive to get rid of false negatives.