Code Smells In Test Suites

Ovid on 2008-01-29T12:27:14

Test suites need to be treated carefully. There are three golden rules for them:

  1. They must be run constantly.
  2. Tests must never fail.
  3. They must only output test information.

Items #1 and #2 are generally taken as a given. Item #3, however, is equally important and frequently ignored.

When I first started at the BBC, our team had over 9,000 tests, coverage in the mid-80s (percent), and all tests passed. Rarely have I had the luxury of stepping into such an environment. However, the test suite was very noisy: lots of diagnostic output, Perl warnings, and user-generated warnings about some database tables being changed when they shouldn't be. These are code smells. Maybe nothing is really wrong, but how can we know without looking into things?

The diagnostic output was easy to fix. Much of it was in the form of "explanations" for programmers running the tests: by reading these comments in verbose test output, you can get a better sense of what the tests are doing. I argued that these comments should be pushed into the tests themselves, but jplindstrom pointed out that many of them weren't really appropriate there, as they explained the intent of subsequent tests. As a result, when we switched from Test::More to Test::Most, we started using the explain function. This is a special form of &diag which skips the call to &diag unless the tests are being run in verbose mode.
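In use, that looks something like this (the message and test names here are invented for illustration; the behaviour is as described above):

use Test::Most tests => 2;

# Printed only under a verbose run such as "prove -v"; silent otherwise.
explain 'The next two tests cover the trivial cases';

ok 1, 'a true value is true';
is 2 + 2, 4, 'arithmetic holds';

Run normally, the suite stays quiet; run verbosely, the explanation shows up as a TAP comment right where it's useful.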

Some people would argue that extra diagnostics aren't a code smell, but when you have too many of them, you get used to seeing non-test information and it becomes easier to overlook real problems, such as warnings.

I went through and fixed a lot of the warnings (most were harmless), but never had a chance to get to all of them. As a result, the "Use of uninitialized value in string" warnings thrown by our database migration tests were ignored.

Recently I wrote some database migration code and ran those tests directly. All tests passed, but I decided to fix the warning. The basic structure of the code looked like this:

my ($dbname, $host, $port) = ($1, $2, $3);

my $dump_cmd;

{
    no warnings 'uninitialized';
    $dump_cmd = "mysqldump -h $host -P $port -u $user -p$pass --no-data $dbname"
            . ' | grep -v Database | grep -v Server';
}

my $migration = DBIx::Migration->new({
    dsn      => $dsn,
    dir      => $dir,
    username => $user,
    password => $pass,
    connect_options => $db_conf->connect_options,
});

$dsn =~ m/dbi:mysql:dbname=(.*);host=(.*);port=(.*)/;

How the hell did the regex that populates those capture variables wind up after the assignments from $1, $2 and $3? As it turns out, this happened a long time ago. Our tests were passing because we were using mysqldump to generate diffs, but since the arguments were undef, mysqldump was returning an error message which the tests always successfully diffed against itself. Fixing this bug revealed that the tests should have been failing for many months. We now have some cruft in our database as a result. It's not too bad and doesn't affect data integrity, but it does mean that if we ever had to migrate the database back down, our code would break.
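For what it's worth, the fix is just a matter of ordering, plus actually checking that the match succeeded (the die is my addition, not the original code):

if ( $dsn =~ m/dbi:mysql:dbname=(.*);host=(.*);port=(.*)/ ) {
    my ($dbname, $host, $port) = ($1, $2, $3);
    my $dump_cmd = "mysqldump -h $host -P $port -u $user -p$pass --no-data $dbname"
                 . ' | grep -v Database | grep -v Server';
    # ... build the migration and diff the dumps as before ...
}
else {
    die "Could not parse DSN '$dsn'";
}

With that in place, a malformed DSN fails loudly instead of silently reusing stale capture variables, and the no warnings 'uninitialized' block can go away entirely.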

Today I'm going to examine the warnings about the database tables being changed when they shouldn't be. I have less confidence in this test suite than I did. There probably isn't a problem here, but when a test suite is noisy, that's a serious smell which should be examined.


stdout v stderr

jmason on 2008-01-29T14:27:07

I tend to use "classic" Test, which can still do the "silencing extra diagnostics" trick; just output the extra diagnostics on STDOUT, and keep STDERR for serious stuff (and perl warnings, of course).

when run from "make test" or "prove", the STDOUT output is hidden; when run from "make test TEST_VERBOSE=1" or "prove -v", or run directly from the commandline as "./testname.t", the STDOUT diagnostics become visible.
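a minimal sketch of the trick with old-school Test.pm (the diagnostic text is invented):

use Test;
BEGIN { plan tests => 1 }

# The harness hides STDOUT unless running verbosely; the leading '#'
# keeps the TAP parser from mistaking this for a test result.
print "# loaded 42 fixture rows\n";

# STDERR always reaches the terminal, so reserve it for real trouble.
print STDERR "# something looks seriously wrong\n";

ok(1);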

works great ;)

Re:stdout v stderr

Ovid on 2008-01-29T15:05:44

I tend to use "classic" Test ...

Do you mean the ancient Test.pm? If so, you're missing out on all of the improvements made in the testing world over the years. Test.pm should only be there for legacy support.

If you must use that interface, at least consider using Schwern's Test::Legacy module. This will allow you to keep that interface but still use other testing modules. Of course, that changes your diagnostic output and breaks what you've described. However, diagnostics are a small, small price to pay for using the rest of Perl's test modules :)

Re:stdout v stderr

jmason on 2008-01-29T15:31:54

yeah, I do mean that ancient module. and I know there are many goodies available if I was to catch up ;)

In SpamAssassin, we have a project philosophy of attempting to keep our CPAN dependencies at a minimum, and we support perl 5.6.1 as a platform. even Test::More didn't arrive until 5.6.2, unfortunately...

I should really use Test::More for my other modules, though.

'Of course, that changes your diagnostic output and breaks what you've described.'

really? are you saying the newer test modules remove this (very useful) behaviour? that seems a pity!

Re: Test::More and backcompat

markjugg on 2008-01-29T18:55:42

In SpamAssassin, we have a project philosophy of attempting to keep our CPAN dependencies at a minimum, and we support perl 5.6.1 as a platform. even Test::More didn't arrive until 5.6.2, unfortunately...
Assuming you have a compatible license, you can do what the darcs project does: ship a private copy of Test::More with the test suite. This eliminates problems between your test suite and different versions of Test::More as well.
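Something like this, assuming the bundled copy lives under t/lib (the path is just an example):

use lib 't/lib';   # the bundled copy lives at t/lib/Test/More.pm
use Test::More tests => 1;

diag "Test::More $Test::More::VERSION loaded from $INC{'Test/More.pm'}";
ok(1, 'bundled Test::More works');

The diag line is handy for confirming at a glance that the private copy, not an installed one, is what actually got loaded.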

Adding to the output though...

jjore on 2008-01-29T18:36:04

I've been foraging for good ideas in Squeak/Smalltalk lately. One nice thing is that test failures have a stack saved so I can pop up the relevant process in the debugger/IDE wherever and whenever things fail.

I've been considering how well this could map to Perl - whether a test failure should save the stack so tools could know exactly what was around. I'm not sure whether Perl is powerful enough to ever make use of this information. ATM, it's speculation.
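As a very rough sketch of what I mean (this monkey-patches Test::Builder::ok and is purely speculative, not production code):

use strict;
use warnings;
use Carp qw(longmess);
use Test::Builder;

my @failure_stacks;

{
    no warnings 'redefine';
    my $orig_ok = \&Test::Builder::ok;
    *Test::Builder::ok = sub {
        my ($self, $test, @rest) = @_;
        # Stash a stringified stack trace every time an assertion fails.
        push @failure_stacks, longmess('test failure') unless $test;
        return $self->$orig_ok($test, @rest);
    };
}

Of course, that only captures a string trace after the fact; nothing like the live, resumable process Smalltalk hands you.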

Re:Adding to the output though...

Ovid on 2008-01-30T09:47:32

You know, you might want to chat with Andy Armstrong. He's been toying with the idea of creating a reversible debugger for Perl. As I understand it, it's similar in concept to how video works: occasionally save the state (a "key frame") and save then a bunch of diffs until you need to save the state again. With that, you could just note the "time" of the test failure and persist its state. You could then reverse to a previously known state, change parameters, run things forward again, etc.