CPANTS Update is a bit of a mixed bag

Alias on 2008-06-10T01:46:07

In my opinion, the addition of new metrics to the Kwalitee measurements is normally a good thing, with two primary benefits.

Firstly, they inform less experienced authors of factors they may not have considered.

The best example of this is the "Makefile.PL is not executable" test. This is important because of the little-known fact that ExtUtils::MakeMaker uses the instance of Perl that invokes Makefile.PL to determine where (by default) the module should be installed.

Keeping Makefile.PL as non-executable forces people to declare the Perl installation where the module will be installed, rather than just running the script and having it installed to god only knows where.
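
To see the effect for yourself, a small script along these lines prints which perl is running it and where that perl installs site modules by default. This is only a sketch and not part of MakeMaker; the install_target name is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use Config;    # exports %Config, the build settings of the running perl

# Report the perl binary running this script and the default
# site library directory configured for that perl.
sub install_target {
    return {
        perl           => $^X,
        installsitelib => $Config{installsitelib},
    };
}

my $target = install_target();
print "perl:    $target->{perl}\n";
print "sitelib: $target->{installsitelib}\n";
```

Run it under two different perl binaries (say the system perl and one you built yourself under some other prefix) and you should see two different destinations, which is exactly why the choice of perl needs to be explicit.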

The second main purpose of CPANTS is to serve authors with large numbers of modules (like, say, me). Most of these people are under often severe time constraints when it comes to working on their CPAN modules, so CPANTS provides an effective centralised tracking system for keeping an eye on all the things we know we should be doing, but don't have time to monitor.

So for me at least, metrics like "easily_repackagable" or even "distributed_by_debian" are extremely useful, even if some of them don't have an obvious connection to quality.

However, the problem with CPANTS in general, and the latest batch in particular, is the high vulnerability to poor metrics.

Natural competitiveness, herd instinct, perceived authority and observer/audience effects combine to make CPANTS an ideal breeding ground for cargo cults. You can already see how the CPANTS metrics have modified the behaviour of many authors.

Certainly, the usage of things like Test::Pod::Coverage has probably been driven primarily (in quantity terms at least) by the perceived need to tick that box (or cheat) in order to get a module to "100% Kwalitee".

And these cargo cults have potentially disastrous results, caused either by their mere existence or by a bad implementation.

Take, for example, the use_warnings metric.

While it is all well and good to encourage the use of warnings.pm, the CPANTS metric is grossly inadequate because the use of warnings is limited to 5.6 or above.

As a result, it is impossible for a module with a 5.005 or older Perl version dependency to legally meet the criteria of the metric.

The author is forced to a) unnecessarily increase the version dependency, b) modify the module to add an unnecessary warnings::compat dependency, or c) cheat to meet the metric.

While this metric may help encourage new authors to do the right thing, it fails to assist major authors because there's no way to distinguish between modules failing for good reasons and modules failing for bad reasons.
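
As a rough sketch of what telling those two cases apart could look like, the check below returns a third answer when the module declares a minimum Perl older than 5.6. This is not the real CPANTS code; the function name and the 'not_applicable' status are assumptions made up for the example, and it only understands plain numeric version declarations.

```perl
use strict;
use warnings;

# Hypothetical check, NOT the actual CPANTS implementation.
# Returns 'pass', 'fail' or 'not_applicable' for a module's source text.
sub use_warnings_status {
    my ($source) = @_;

    # The module uses warnings.pm: nothing more to decide.
    return 'pass' if $source =~ /^\s*use\s+warnings\b/m;

    # Look for a declared minimum Perl such as "use 5.005;".
    # Only numeric forms are handled; v-strings like "use v5.6.0" are not.
    if ( $source =~ /^\s*(?:use|require)\s+(5\.\d+)/m ) {
        # warnings.pm only exists from 5.6 (5.006) onwards
        return 'not_applicable' if $1 < 5.006;
    }

    return 'fail';
}
```

A module that says "use 5.005;" and has no "use warnings" would then show up as not applicable, while one that says "use 5.008;" and still skips warnings would fail for a bad reason.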

Likewise, modules with no bugs reported in Debian because they aren't in Debian at all fail the "no_bugs_in_debian" test, when clearly they should pass.

I actually approve of this sort of metric as a way of helping people keep track of all the downstream distribution channels in a more centralised way than is currently available.

Of course, the description of how to fix the error that you aren't in Debian is stupid, because it encourages people to go annoy the Debian packagers into packaging their modules. A better way to fix these errors might be something more like "Be so awesome that the Debian community demands your module is packaged".

Other Kwalitee metrics are bad purely because of the cargo cult effect.

The number of authors who have bloated their dependencies by making people run POD tests at install time is the best example of these bad effects.

What we probably needed was a complementary Kwalitee metric, something like "does_not_require_test_pod", to make sure these tests only get run during automated or release testing.
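
For what it's worth, a POD test that behaves this way might look something like the following. It is a sketch only: it assumes the RELEASE_TESTING and AUTOMATED_TESTING environment variables are how the two situations get signalled, and the wants_author_tests helper is a made-up name.

```perl
#!/usr/bin/perl
# t/pod.t: check POD syntax, but only for the author and for smoke testers

use strict;
use Test::More;

# True when the environment says this is release or automated testing.
# (Hypothetical helper name.)
sub wants_author_tests {
    my $env = shift;
    return ( $env->{RELEASE_TESTING} || $env->{AUTOMATED_TESTING} ) ? 1 : 0;
}

# Ordinary installs stop here, so Test::Pod never has to be a prerequisite.
unless ( wants_author_tests( \%ENV ) ) {
    plan skip_all => 'POD tests only run during release or automated testing';
}

# Load Test::Pod at run time and skip quietly if it is missing.
unless ( eval { require Test::Pod; Test::Pod->import; 1 } ) {
    plan skip_all => 'Test::Pod is not installed';
}

all_pod_files_ok();
```

Because Test::Pod is only loaded with require inside an eval, it doesn't need to appear in the module's dependencies at all.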

Likewise, adding a kwalitee metric to dictate the use of Test::NoWarnings is almost certainly a cargo cult nightmare in the making.

If everyone were to start complying with that metric in exchange for the negligible reduction in warnings, we would add unneeded dependency bloat to every single module on the CPAN, increase FAIL rates across the board (because Test::NoWarnings doesn't work everywhere) and expose a risk of any minor mistake by the Test::NoWarnings author expanding into a full blown OHNOIBROKECPAN situation.

And this isn't one we can restrict to automated testing; it would be imposed across the board. Add to this that currently, almost nobody uses Test::NoWarnings, so this is also a good example of authority trying to "pick a winner".

And we know from experience with government-sponsored technology efforts that whenever government (authority) tries to pick winners (for development grants and whatnot) it almost always fails.

Metrics for the usage of things like Test::Pod and so on should almost certainly emerge from pre-existing trends where there is little to no uncertainty.

Any successful regular metric needs to be one that we would anticipate eventually moving to 100% pass without undue ill effects.

Downstream metrics like "is_a_prereq" or "distributed_by_debian" are useful, but perhaps should be isolated in their own "world domination" section.

These metrics don't necessarily show the quality of a module, but they do help show that a module has become sufficiently dominant in some area that it is being used on a widespread basis.

While certainly helpful and probably at least loosely correlative with quality, these goals are not ones which 100% of the CPAN can reasonably achieve. We should firewall them off from the main ones.

One-dimensional vs multidimensional metrics

mr_bean on 2008-06-10T04:38:53

Okay I biased that straight off. Instead I should have said uni-dimensional versus multi-dimensional.

Kwalitee is uni-dimensional. It's a checklist and kwalitee is the sum of the boxes checked. By looking only at the total, we relegate the information about which boxes were checked to a supplementary position.

There is something to be said for this. If you want to compare 2 arbitrary modules quickly, without background knowledge, you don't want to have to understand what each of the individual dimensions is.

Just tell me how much kwalitee it has!

On the other hand, having a unidimensional metric (like money is) means we end up arguing Bill Gates is the greatest programmer because he has the most of it.

So let's have both uni-dimensional and multidimensional metrics. Let's have a report card, where a number of similar checkboxes (eg about POD) constitute one dimension and others (eg the world-domination ones) constitute another.

And let's argue about how to weight these dimensions in one overall kwalitee metric!
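
Something like this, as a very rough sketch. The grouping of metrics into dimensions, the weights, and the function names are all invented for the sake of argument, not anything CPANTS does.

```perl
use strict;
use warnings;

# Score each dimension as the fraction of its checkboxes that were ticked.
# $passed:     hashref of metric name => true/false
# $dimensions: hashref of dimension name => arrayref of metric names
sub report_card {
    my ( $passed, $dimensions ) = @_;
    my %card;
    for my $dim ( sort keys %$dimensions ) {
        my @metrics = @{ $dimensions->{$dim} };
        my $hits    = grep { $passed->{$_} } @metrics;
        $card{$dim} = @metrics ? $hits / @metrics : 0;
    }
    return \%card;
}

# Collapse the report card into one number using per-dimension weights.
# Dimensions without an explicit weight count as 1.
sub overall {
    my ( $card, $weights ) = @_;
    my ( $sum, $total ) = ( 0, 0 );
    for my $dim ( keys %$card ) {
        my $w = defined $weights->{$dim} ? $weights->{$dim} : 1;
        $sum   += $w * $card->{$dim};
        $total += $w;
    }
    return $total ? $sum / $total : 0;
}

# Illustrative grouping only
my %dimensions = (
    pod              => [qw( has_test_pod no_pod_errors )],
    world_domination => [qw( is_a_prereq distributed_by_debian )],
);
```

The report card keeps the per-dimension picture, and the overall number is only as meaningful as whatever weights we end up agreeing on.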

Re:One-dimensional vs multidimensional metrics

Alias on 2008-06-10T07:20:32
The problem with this is that it merely offloads the work from the server to the (wetware) client. It makes it easier to write the Perl metrics code, at the expense of making it more difficult to understand the result.

No Kwalatiee

phillup on 2008-06-10T12:14:33

And these cargo cults have potentially disastrous results, caused either by their mere existence or by a bad implementation.

Take, for example, the use_warnings metric.

While it is all well and good to encourage the use of warnings.pm, the CPANTS metric is grossly inadequate because the use of warnings is limited to 5.6 or above.

As a result, it is impossible for a module with a 5.005 or older Perl version dependency to legally meet the criteria of the metric.

The author is forced to a) unnecessarily increase the version dependency, b) modify the module to add an unnecessary warnings::compat dependency, or c) cheat to meet the metric.

Maybe what we really need is the equivalent of:


no warnings 'uninitialized';

where the programmer can set a flag to turn off items... or in this case, check them off the list.

So, perhaps something like this:


no kwalatiee 'pod';

could go into Makefile.PL or some other appropriate place.

The point is to bring the items to the programmer's attention. This allows the programmer to acknowledge the issue in a manner they feel is appropriate.

Just an idea...

Re:No Kwalatiee

Alias on 2008-06-10T23:36:06
This is the same silly pattern that we got from Perl::Critic and a number of other systems.

They excuse the mistakes of the automated scanner by forcing every programmer everywhere to bloat their programs out with instructions to the flawed scanning systems.

I much prefer the idea of making the automated code as smart as possible first.

Re:No Kwalatiee

Ron Savage on 2008-06-11T02:43:40
Not "Just an idea", rather "Just a bad idea".

I'd prefer Kwalitee to be invisible unless the author explicitly enables it.

For those of us who don't play the Kwalitee game, the result is an /apparent/ comment on the quality of our modules, and hence on our reputations.

Nasty. Deliberately nasty? Perhaps, although I certainly hope not.

And yes, I understand perfectly that it's a good idea to encourage authors to care about their code. So it's effectively a side-effect of Kwalitee I'm complaining about.

Re:No Kwalatiee

phillup on 2008-06-11T17:12:24

For those of us who don't play the Kwalitee game, the result is an /apparent/ comment on the quality of our modules, and hence on our reputations.
Hm... as a pretty heavy user of CPAN (via the web interface), I can assure you I've NEVER looked at the kwalitee score for a module.

Your reputation is safe. (in that regard)

;-)

I could not even tell you where to find that data before googling for 'cpants' just now.

(As an aside, is that data even available from CPAN? Perhaps my brain is interpreting it as an advertisement... because I don't see it.)

I thought the whole purpose of the kwalitee system was to help authors write better modules for CPAN. (which is why I suggested the "bad idea" of giving the author the ability to tell the system to ignore a criterion)

And, as a (lowly) user, I can assure you I don't give a damn what an automated system says about "kwalitee".

I'm a much better judge of when a piece of code is suitable for my desired use, or not. (that is what unit tests, the ones I write, are for)

Maybe what is really needed is for people to stop comparing kwalitee scores, since they are pretty meaningless as a measure of utility or quality.

At least they are from this user's perspective.