A while ago I needed to know which test modules were the most popular so I could bundle them in Test::Most. I was asked how I worked that out, so I thought I would post the code. I've updated the code to make it a bit more robust, but it's still heuristic.
First, I need to extract the tests from my minicpan:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use CPAN::Mini::Extract;
    use Getopt::Long;

    GetOptions(
        offline => \my $offline,
        force   => \my $force,
    ) or exit 1;

    my $cpan = CPAN::Mini::Extract->new(
        remote         => 'http://ftp.osuosl.org/pub/CPAN/',
        local          => "$ENV{HOME}/code/minicpan",
        trace          => 1,
        extract        => '/Users/ovid/code/cpanextracted',
        extract_filter => sub { /\.(pm|t)$/ and !/\b(inc)\b/ },
        extract_check  => 1,
        extract_force  => $force,
        offline        => $offline,
    );
    my $changes = $cpan->run;
I'm also extracting .pm files, but they could be skipped if you don't need them. Then I run the following, passing it the location of the extracted directory ($ENV{HOME}/code/cpanextracted/authors/id/). This takes about five minutes to run on my machine.
Note that because this code is generic, if you have a large project you can point it at your project directory (assuming it has .t tests) and find out which test modules you're using.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    my $target_dir = shift or die "Usage: $0 path/to/modules";

    # Collect the full path of every .t file under the target directory.
    my @tests;
    my $time = scalar localtime;
    print STDERR "Finding tests. This may take a while [$time]\n";
    find(
        {
            wanted => sub {
                if ( /\.t\z/ ) {
                    push @tests => $File::Find::name;
                    warn $File::Find::name if /test.pl\z/;
                }
            },
            no_chdir => 0,
        },
        $target_dir,
    );

    my %times_found;
    my ( $current, $total ) = ( 1, scalar @tests );
    $time = scalar localtime;
    print STDERR "Beginning analysis of $total tests [$time]\n";

    # test_re must be a star because they might 'use Test;'
    my $test_re = qr/\bTest(?:::\w+)*\b/;

    foreach my $test (@tests) {
        open my $fh, '<', $test or die "Cannot open ($test) for reading: $!";
        if ( !( $current % 1000 ) ) {
            my $time = scalar localtime;
            print STDERR "Processing $current out of $total test programs at [$time]\n";
        }
        $current++;
        while ( my $line = <$fh> ) {
            next if $line =~ /^\s*#/;    # skip comments
            if ( $line =~ /^\s*(?:use|require)\s+($test_re)/ ) {
                $times_found{$1} ||= 0;
                $times_found{$1}++;
            }
        }
        close $fh or die "Cannot close ($test): $!";
    }

    # Print the modules, most frequently seen first.
    for my $module ( sort { $times_found{$b} <=> $times_found{$a} } keys %times_found ) {
        printf "%-70s %6d\n", $module, $times_found{$module};
    }
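To make the "heuristic" part concrete, here is a small sketch that runs the same two regexes from the script above against a handful of sample lines. The sample lines and the helper name matched_module are made up for illustration; only the patterns come from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in the counting script.
my $test_re = qr/\bTest(?:::\w+)*\b/;

# Hypothetical helper: returns the module name the script would count
# for this line, or undef if the line would be ignored.
sub matched_module {
    my ($line) = @_;
    return if $line =~ /^\s*#/;    # commented-out lines are skipped
    return $1 if $line =~ /^\s*(?:use|require)\s+($test_re)/;
    return;
}

# Sample lines, invented for this example.
my @lines = (
    'use Test::More tests => 3;',
    '    require Test::Exception;',
    'use Test;',
    '# use Test::Deep;',
    'use Testing::Thing;',
    q{use base 'Test::Class';},
);

for my $line (@lines) {
    my $module = matched_module($line);
    printf "%-30s => %s\n", $line,
        defined $module ? $module : '(not counted)';
}
```

The first three lines are counted as Test::More, Test::Exception and Test. The last three are not counted: one is a comment, one names a module that merely starts with "Test", and one loads Test::Class through use base, which this line-based approach does not see. That last case is one reason a module such as Test::Class may be undercounted.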
The results are a bit different than before because I've updated my regex to also count modules loaded with require. Here's the new top 20, along with the number of times each was seen.
All told, I found 373 test modules used.
I am currently evaluating use of Test::Class.
Your mention, at YAPC::Europe in Copenhagen, of our violations of good coding practices in test suites has made me think a lot about the general layout of my test code.
Test::Class seems to offer something I have been looking for, namely test reuse.
But I do find it quite concerning that it does not appear on your list. Neither do I see Test::Unit, which I evaluated a long time ago. I went with Test::(More|Simple) at that time, but Test::Class does seem to combine the two approaches quite neatly. As I wrote, though, I find it concerning that it is not on your list.
But I guess that does not mean it cannot be used; it is just not used in CPAN distributions, it seems. Perhaps in the enterprise?
Re:Test::Class? or even Test::Unit
Ovid on 2008-09-13T20:00:09
Test::Class is number 52 out of 373. It should be much higher. It's a fantastic module. There are bits and pieces which could be changed, but out of the box it's great and I highly recommend it. The test organization, test inheritance and general test code reuse give it top marks in my book.
Test::Unit is basically abandonware and does not play well with Test::Builder. This means that if you want to use standard Perl testing modules, you can't. You'd have to port any functionality you need if you want to use Test::Unit.
Re:Test::Class? or even Test::Unit
jonasbn on 2008-09-14T06:49:25
Ah, my bad, I did not notice that it was a top 20.
Well, I ported a 100% covering test suite to Test::Class yesterday and it does feel good. Next step is to have two other applications reuse the test suite. I can hardly wait.
On the plane today I made this a feature of my BackPAN indexer. I have all the bits to take a list of distributions and look inside them, so I just needed a worker task to count test modules instead of doing all the other stuff. It mostly works now, and I've pushed it to github. It's not like you need to care about that because I'll do the work to make the reports.
If you really wanted to, you could also look through the output files I've made previously and extract the modules used in each of the test files. They are just YAML files, so you pull out the right thing and count it.
I extracted all the used modules using Module::Extract::Use, which uses PPI to do the work. That way, I get everything the test file used.
I'm finishing up the additions to MyCPAN::Indexer to spit this out, but I'm also thinking about only counting a Test:: module once per distribution.
Re:MyCPAN::Indexer now does that too
Ovid on 2008-09-14T07:45:08
Thanks for doing this. That's great!
The other day, Schwern mentioned on the Test::More development list that CPANTS also does the "once per distribution" thing. Counting the test module once per distribution as opposed to once per test program is good, but offers a different use case. It shows how widespread the dependency is, but not how widespread the usage is (both numbers being valuable, of course).
For example, if you have one distribution with five .t programs, and one uses Test::Pod while the other four use Test::More, it's clear that Test::More is used more frequently, but also clear that this distribution depends on Test::More once, not four times :) So whichever way you count, it just depends upon which problem you want to solve.
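The two ways of counting can be sketched side by side. This is only an illustration under stated assumptions: the distribution name, the file names, the data layout and the function count_usage are all hypothetical, and real input would come from whatever tool extracted the module names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: distribution => { test program => [ modules it loads ] }.
# This mirrors the example of five .t programs in one distribution.
my %dists = (
    'Some-Dist' => {
        't/pod.t' => ['Test::Pod'],
        't/01.t'  => ['Test::More'],
        't/02.t'  => ['Test::More'],
        't/03.t'  => ['Test::More'],
        't/04.t'  => ['Test::More'],
    },
);

# Returns two hash references: counts per test program (usage)
# and counts per distribution (dependency).
sub count_usage {
    my ($dists) = @_;
    my ( %per_program, %per_dist );
    for my $dist ( keys %$dists ) {
        my %seen;    # modules already counted for this distribution
        for my $file ( keys %{ $dists->{$dist} } ) {
            for my $module ( @{ $dists->{$dist}{$file} } ) {
                $per_program{$module}++;
                $per_dist{$module}++ unless $seen{$module}++;
            }
        }
    }
    return ( \%per_program, \%per_dist );
}

my ( $per_program, $per_dist ) = count_usage( \%dists );
for my $module ( sort keys %$per_program ) {
    printf "%-12s programs: %d  distributions: %d\n",
        $module, $per_program->{$module}, $per_dist->{$module};
}
```

With this sample data, Test::More is counted four times per test program but once per distribution, while Test::Pod is counted once either way.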