The Big Bucket of FAIL (or is it?)

Ovid on 2008-12-10T16:21:04

I lied to chromatic, I think. A long time ago, when we were talking about the new version of TAP, he was concerned that many of the new features we were adding would not be needed and therefore should not be added. I assured him that the new features would be optional and would not change the meaning of core TAP.

I find myself sitting here, staring at my feet, wondering if I'm a bald-faced liar ("bald-faced"?).

So what, precisely, is a test failure? Is the following a failure?

1..1
ok 1 - Whee!  We pass!

Of course not. You have one test. It passes.

Um, not so fast there, cowboy. Now look at the following snippet of code.

END {
    unlink $highly_sensitive_data
      or die "You are soooooo fired, dude.";
}

Yes, you can test that, but you probably didn't. If the code dies (from the TAP::Parser perspective, if it exits with a non-zero exit status), then we consider that the test failed.

Um, not so fast there, cowboy.

Apparently, some languages don't give us control over the exit status (I don't know which ones, but it's a bug report we received), so we've been forced to implement a $parser->ignore_exit method.
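Here's a rough sketch of what that switch does, assuming the ignore_exit and exec constructor options behave as documented in TAP::Parser. The throwaway script and its contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use TAP::Parser;

# A throwaway test script (hypothetical): its only test passes,
# then it exits with status 1.
my ( $fh, $script ) = tempfile( SUFFIX => '.t', UNLINK => 1 );
print {$fh} <<'END_SCRIPT';
print "1..1\nok 1 - Whee!\n";
exit 1;
END_SCRIPT
close $fh or die "Cannot close $script: $!";

# Parse the same script twice, once honouring and once ignoring the exit status.
my %problems;
for my $ignore ( 0, 1 ) {
    my $parser = TAP::Parser->new(
        { exec => [ $^X, $script ], ignore_exit => $ignore } );
    $parser->run;    # consume all of the TAP
    $problems{$ignore} = $parser->has_problems ? 1 : 0;
    printf "ignore_exit=%d problems=%s\n",
      $ignore, ( $problems{$ignore} ? 'yes' : 'no' );
}
```

The TAP is identical in both runs. Only the parser's willingness to look at the exit status changes the verdict.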

But there's also the wait status of the process. That can be non-zero, indicating a failure. We recently fixed a bug with that. I found it while testing Rakudo and Alex Vandiver fixed it (it's in the upcoming 3.15 release. No PI for you!). Failure is tricky.
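For anyone who hasn't had to care about the difference: in Perl the wait status is what lands in $?, and the exit status is only part of it. A minimal sketch of pulling it apart, assuming a POSIX-style status layout (decode_wait_status is a name I made up, not part of TAP::Parser):

```perl
use strict;
use warnings;

# Split a POSIX-style wait status (what Perl leaves in $?) into its parts.
# The function name and hash keys are illustrative only.
sub decode_wait_status {
    my ($wait) = @_;
    my %parts = (
        exit_code   => $wait >> 8,                # value passed to exit()
        signal      => $wait & 127,               # signal that ended the process
        core_dumped => ( $wait & 128 ) ? 1 : 0,
    );
    return \%parts;
}

# system() returns the wait status of the child it ran.
my $wait   = system( $^X, '-e', 'exit 3' );
my $status = decode_wait_status($wait);
printf "wait=%d exit=%d signal=%d\n",
  $wait, $status->{exit_code}, $status->{signal};
```

A process killed by a signal can have an exit code of 0 and still a non-zero wait status, which is why both need checking.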

So pure TAP doesn't quite indicate whether tests failed. Or rather, it does, but it doesn't indicate whether the whole test program should be considered a pass.

And just to make it more difficult, consider this:

1..4
ok 1
ok 
ok 2
ok 3

It's perfectly legal to omit the test number, but it's not legal to have gaps. That's a parse error and is considered to be a failure, even if all tests have passed. This is because we can't trust that output (what does it mean?). Same thing happens if you omit the plan.
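If you want to see that for yourself, something like the following should do it. It's a sketch that feeds the stream above to TAP::Parser through its tap option. I'm not asserting the exact error messages, only that there are some.

```perl
use strict;
use warnings;
use TAP::Parser;

# The misnumbered stream from above: every single line says "ok".
my $tap = join "\n", '1..4', 'ok 1', 'ok', 'ok 2', 'ok 3', '';

my $parser = TAP::Parser->new( { tap => $tap } );
$parser->run;    # consume the whole stream

my @failed = $parser->failed;          # numbers of tests that were "not ok"
my @errors = $parser->parse_errors;    # human-readable parse problems

printf "failed tests: %d, parse errors: %d\n",
  scalar @failed, scalar @errors;
print "  $_\n" for @errors;
```

No test failed, yet the parser still reports problems, because the numbering can't be trusted.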

As a result, a proper "test program failed" method should look something like this:

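sub test_program_failed {
    my $self = shift;

    # $self->failed is the parser's own list of failed tests, so this
    # method needs a different name or it would call itself forever.
    return
         $self->failed
      || $self->parse_errors
      || ( !$self->ignore_exit && ( $self->wait || $self->exit ) );
}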
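To make the precedence explicit, here is the same decision as a plain function over a hash, so it runs without a parser. The hash layout and the name program_failed are assumptions for illustration. Only the keys mirror the accessors above.

```perl
use strict;
use warnings;

# Decide whether one test program counts as failed.
# $run is a hypothetical record: { failed, parse_errors, ignore_exit, wait, exit }.
sub program_failed {
    my ($run) = @_;
    return 1 if $run->{failed};          # at least one "not ok"
    return 1 if $run->{parse_errors};    # the TAP itself can't be trusted
    return 0 if $run->{ignore_exit};     # caller asked us not to look
    return ( $run->{wait} || $run->{exit} ) ? 1 : 0;
}

my %clean_tap_bad_exit = ( failed => 0, parse_errors => 0, wait => 256, exit => 1 );
print program_failed( \%clean_tap_bad_exit ) ? "FAIL\n" : "PASS\n";
print program_failed( { %clean_tap_bad_exit, ignore_exit => 1 } )
  ? "FAIL\n"
  : "PASS\n";
```

Note that ignore_exit only suppresses the wait and exit checks. Failed tests and parse errors still count.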

Thus, pure TAP doesn't quite indicate whether a test program failed. It might if we added diagnostics, but that means I lied to chromatic. Damn.

As for my App::Prove::History code, this means you have to do this to see which test programs have failed:

SELECT r.suite_id, n.name, r.failed, r.exit, r.wait
FROM   test_result r, test_name n
WHERE  r.test_name_id = n.id
  AND  (
    r.failed > 0
    OR
    r.exit != 0
    OR
    r.wait != 0
);

A bit clumsy, no? And I don't even include the 'ignore_exit' bit, though I might have to later.

I'm thinking about a tiny denormalization here, but in reality, I'll probably slap a view over this and see how that works.

Pop quiz: is the following a failure? Why or why not? Is the existing behavior wrong?

1..3
ok 1 - Booting
ok 2 - Got dem boots!
ok 3 - We have foobar # TODO Waiting on foobar shipment

bald-faced liar

sigzero on 2008-12-10T20:28:37

Bald-faced means blatant, undisguised

Re:bald-faced liar

Ovid on 2008-12-11T00:14:48

I know what it means :) I just don't know why "bald-faced" is the term used.

Re:bald-faced liar

Mr. Muskrat on 2008-12-12T15:03:10

It's a variation of "barefaced" (unconcealed or showing a lack of scruples) or "bold-faced" (impudent).

The exit status case...

Alias on 2008-12-11T00:04:31

The obvious case for lack of exit status is PHP, where they are streaming TAP over HTTP and the HTTP stream will just stop at some point...

And anything else doing TAP over HTTP for that matter.

Re:The exit status case...

Ovid on 2008-12-11T00:13:46

Wouldn't that just result in a 0 exit status, though? If we ignore the exit status, we'd have to be doing that for a non-zero exit status, meaning that these cases don't matter.

Re:The exit status case...

Alias on 2008-12-11T08:30:22

Well, HTTP doesn't really HAVE an exit status to communicate, since it's SUPPOSED to be transactional... sort of.
The stream just stops, there's no return value as such.

BTW...

Alias on 2008-12-11T08:48:18

Pop quiz: is the following a failure? Why or why not? Is the existing behavior wrong?

1..3
ok 1 - Booting
ok 2 - Got dem boots!
ok 3 - We have foobar # TODO Waiting on foobar shipment

Asking if this is a fail is similar to asking if you run a test.

In the install context, this is merely a curiosity. It's a difference in expected behaviour, and as such for the author it's a fail(ure), but it's not a FAIL.

If you recall, the working definition of failure we've now used twice, for AUTOMATED_TESTING and RELEASE_TESTING, is "Is the failure significant enough to forbid the user from installing?"

An unused feature, documented as broken, that unexpectedly passes does not qualify as something that should prevent installation.

Re:BTW...

Ovid on 2008-12-11T09:43:06

That's pretty much spot on, though I think the existing behavior might be wrong for authors. If you're an author, I would say that the "unexpectedly succeeded" should be a fail. Otherwise, it should not. We don't want heisenfails. I want to know if code I am developing is susceptible to this, but it shouldn't cause pain for anyone else. However, just like running tests in xt/, this is something the tester must explicitly request.
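A sketch of what that explicit request could look like, using TAP::Parser's todo_passed to spot the unexpected success. The $strict flag and the tap_passes name are hypothetical, not an existing option.

```perl
use strict;
use warnings;
use TAP::Parser;

# Returns 1 when the TAP stream should count as a pass, 0 otherwise.
# $strict is a hypothetical "I am the author" switch, not a TAP::Parser option.
sub tap_passes {
    my ( $tap, $strict ) = @_;
    my $parser = TAP::Parser->new( { tap => $tap } );
    $parser->run;
    return 0 if $parser->has_problems;              # the usual definition
    return 0 if $strict && $parser->todo_passed;    # TODO tests that passed
    return 1;
}

my $tap = join "\n",
  '1..3',
  'ok 1 - Booting',
  'ok 2 - Got dem boots!',
  'ok 3 - We have foobar # TODO Waiting on foobar shipment',
  '';

printf "installer: %s\n", tap_passes( $tap, 0 ) ? 'pass' : 'fail';
printf "author:    %s\n", tap_passes( $tap, 1 ) ? 'pass' : 'fail';
```

Anyone who doesn't ask for strictness gets today's behaviour, so nobody else feels the pain.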