Narrative Testing

Ovid on 2008-06-30T10:22:20

One of the lies of agile development: tests are documentation. I keep wanting to create Acme::Tests::Are::Documentation and have the tests spit out a long POD rant about how tests usually (note that word) are not documentation.

Tests can be part of the API documentation, if done well, but they're often not. Consider the following verbose test output:

ok 50 - HTML::TokeParser::Simple::Token::Tag::Start->can('set_attr')
ok 51 - set_attr() works
ok 52 - HTML::TokeParser::Simple::Token::Tag::Start->can('get_attrseq')
ok 53 - works
ok 54 - elements
ok 55 - element order

That's awful, but I see it in test suites all the time. Now take a look at the actual test output I produce:

ok 50 - HTML::TokeParser::Simple::Token::Tag::Start->can('set_attr')
ok 51 - set_attr() should accept what get_attr() returns
ok 52 - HTML::TokeParser::Simple::Token::Tag::Start->can('get_attrseq')
ok 53 - ... and it should return an array reference
ok 54 - ... with the correct number of elements
ok 55 - ... in the correct order

I don't always have test output this clear, but I try. You can read my test output in a narrative style and hey, we're starting to get closer to that mythical "tests are documentation" grail.

What I want is test output that gives the developer the ability to read it and have a decent idea of what's going on. Sometimes the test names developers assign are so worthless that they may as well leave them off and leave us with just the numbers:

ok 50
ok 51
ok 52
ok 53
ok 54
ok 55

Remember when testing used to look like that? Argh! Admittedly, at least using the function name as a test name is better than nothing, but not by much. Give the poor maintenance programmer a clue and write your tests in narrative style so they can run the tests and get an overview of what's going on. At that point, they might at least know where to look for an issue and can start reading the test code itself with a greater degree of confidence.
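
For reference, here's a minimal sketch of the sort of .t code that could produce narrative output like the above (the sample HTML and the assertions are illustrative, not the module's actual test suite):

use strict;
use warnings;
use Test::More tests => 6;
use HTML::TokeParser::Simple;

# Illustrative input: a start tag with two attributes.
my $parser = HTML::TokeParser::Simple->new(
    string => '<p align="left" class="foo">'
);
my $token = $parser->get_token;

can_ok $token, 'set_attr';
$token->set_attr( align => $token->get_attr('align') );
is $token->get_attr('align'), 'left',
    'set_attr() should accept what get_attr() returns';

can_ok $token, 'get_attrseq';
my $attrseq = $token->get_attrseq;
is ref $attrseq, 'ARRAY', '... and it should return an array reference';
is scalar @$attrseq, 2,   '... with the correct number of elements';
is_deeply $attrseq, [qw( align class )], '... in the correct order';

Each assertion after the first reads as a continuation of the one before it, which is what makes the raw TAP output readable as prose.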


"tests are documentation"

fireartist on 2008-06-30T13:30:48

That's an interesting take on "tests are documentation" - you implicitly read it as "test output is documentation", whereas I read it as "test files are documentation" - meaning the contents of the .t files - they provide an example of usage, and expected results.

I try to avoid naming tests altogether - it's bad enough trying to keep the *real* documentation in sync with the code, never mind a second copy of the documentation spread through hundreds of test files.

Re:"tests are documentation"

Ovid on 2008-06-30T14:20:07

Note: it's been agreed at work that reproducing BBC code here is acceptable as the public pays for this code and we really don't have "business secrets" in the sense that other businesses do.

I understand what you're saying, but there's a critical difference here. Test names are embedded in the tests and tend not to get misplaced. They're also close enough to their code that they're easier to keep in synch.

Many times I find that the tests are opaque to the point of incomprehensibility and I'm forced to read comments or the test names to get some sense of what the tests are doing. For example, one of our tests at work has the following code:

my $xml = slurp( $tva_file );

ok !import_xml( $schema, $xml, 1 ), 'imported xml';

my $change_rs = $schema->resultset('ChangeEvent');

is $change_rs->count, 12, 'found 12 change events';

my $failed_change_event_count = $change_rs->search({
    status  =>  'failure',
})->count;

is $failed_change_event_count, 12, '12 failed change events';

Why do we have "12 failed change events"? Who knows? The code doesn't say and the test names can be deleted without loss of clarity. And the test file name itself? "episode_pid_shenanigans". Excuse me? Try this:

my $xml = slurp( $tva_file );

ok !import_xml( $schema, $xml, 1 ), 'Importing broadcasts without episodes should fail';

my $change_rs = $schema->resultset('ChangeEvent');

is $change_rs->count, 12, '... but there should be a change event for each broadcast';

my $failed_change_event_count = $change_rs->search({
    status  =>  'failure',
})->count;

is $failed_change_event_count, 12, '... and they should be failed change events';

Now the test names are clear, self-documenting and not reliant on comments that can get misplaced in the code. Perfect? No. Better? Hell yes! :)

We have other tests which verify that broadcasts can have duplicate start times. Why on earth should they be allowed to have duplicate start times? The test says nothing about this. So I thought "hey, maybe they're on different channels?", but when I dug through their fixtures, they were on the same channel. Eventually I had to drag our acceptance tester over and he guessed that subsequent broadcasts overwrite previous broadcasts. Turns out that this is true, but there was nothing in the tests which made it clear. Perhaps these are bad tests, but if the output of the tests were written in a narrative style explaining the intent, this might have been easier to track down.
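
A single narrative assertion would have documented that rule. Something like this (hypothetical code; the resultset and column names are made up for illustration):

# Hypothetical: names here are illustrative, not our actual schema.
my $broadcasts = $schema->resultset('Broadcast')->search({
    start_time => $start_time,
});

is $broadcasts->count, 1,
    'A later broadcast with a duplicate start time should overwrite the earlier one';

That one test name states the business rule our acceptance tester had to guess at.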

agreed

markjugg on 2008-06-30T18:56:38

I agree, and I like the proposed text output format. Thanks for sharing it!

But they don't really work in isolation

mauzo on 2008-07-01T03:41:45

I've been writing test names like this for a while now, but one thing that slightly worries me in the context of CPAN modules is that I get reports that look like

t/04tie.t.......ok
t/05cv.t........Use of uninitialized value in addition (+) at t/05cv.t line 18.

#   Failed test '...correctly'
#   at t/05cv.t line 24.
#          got: '3'
#     expected: '6'
Use of uninitialized value in addition (+) at t/05cv.t line 33.

#   Failed test '...correctly'
#   at t/05cv.t line 39.
#          got: '2'
#     expected: '3'
# Looks like you failed 2 tests of 35.
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/35 subtests

which is not exactly the most illuminating test name... Of course, in context it said something like

ok 6 - named CV cloned
ok 7 - ...not copied
not ok 8 - ...correctly
ok 9 - ...preserving name

but CPAN testers lost all that :(. It would be good, actually, if CPAN testers could report everything: having to go back and check out the last release and look up the line number in the test file to see which test failed is somewhat irritating.

Re:But they don't really work in isolation

Ovid on 2008-07-01T06:31:42

That's a good point. I hadn't thought about it from the CPAN testers' standpoint. The vast majority of my testing, though, is at work, either using my code or the code of a colleague. As a result, this style of testing is perfectly suited to that environment. For the CPAN, you are correct that you'd have to rerun the test to understand the context. However, I suspect this is mitigated by the fact that it's your module you're trying to understand and, as a result, it will be easier.

Now that I stop to think about this, though, do the terse test names in my first example work better? ("elements", for example :)

Re:But they don’t really work in isolation

Aristotle on 2008-07-01T12:05:48

I think for CPAN you want to tag your test names with a short subject, somewhat like this:

ok 6 - [cloning] named CV cloned
ok 7 - [cloning] ...not a copy
not ok 8 - [cloning] ...copied correctly
ok 9 - [cloning] ...preserves name

Even then, narrative test names won’t always contain all necessary hints to know which one failed, but they’ll still allow you to know at a glance what aspect of the test code is affected by the breakage.

Re:But they don’t really work in isolation

mauzo on 2008-07-01T15:10:52

Hmm, maybe I should write Test::Sections, to allow code like this

testing 'named CV';
ok     $cv,             'cloned';
isnt   $cv,      $orig, 'not a copy';
is     $cv->(3), 6,     'correctly';

to produce output like

ok 6 - [named CV] cloned
ok 7 - [named CV] not a copy
not ok 8 - [named CV] correctly

Test::Group looks similar, but it runs all the tests in a group as a single Test::More test.
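
For what it's worth, a minimal sketch of how such a module might work (hypothetical code; it wraps Test::Builder's ok() method, which every Test::More assertion ultimately funnels through):

package Test::Sections;    # hypothetical module
use strict;
use warnings;
use Test::Builder;
use Exporter 'import';

our @EXPORT = ('testing');

my $section = '';

# Set the prefix applied to all subsequent test names.
sub testing { $section = shift }

# Wrap Test::Builder::ok so every test name gets the section prefix.
my $orig_ok = \&Test::Builder::ok;
{
    no warnings 'redefine';
    *Test::Builder::ok = sub {
        my ( $self, $test, $name ) = @_;
        $name = "[$section] $name" if $section and defined $name;
        return $orig_ok->( $self, $test, $name );
    };
}

1;

With that loaded, the testing/ok/isnt/is example above would produce the bracketed output as shown.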

Tags

Ovid on 2008-07-01T15:31:40

In that case, what you're really looking for are tagged tests. This is something we hope to properly support with Test::Builder 2.0.