Time to Rename TAP

Ovid on 2008-04-18T12:34:17

On the TAP discussion list, there's a robust debate over the future of TAP diagnostics. To a large extent, it's been pretty much settled that TAP diagnostics will be in YAML format. Currently, there are no diagnostics. You might think there are, but what you see is unstructured information dumped to STDERR. By a happy accident, you can deterministically read the file and line number, but that's it. Any other diagnostic information is effectively lost and a human has to eyeball it. People have tried a number of times to work around this, but to no avail. Thus, we need structured diagnostics. Most of you won't need them, but just because you are happy with plain text doesn't mean that someone who requires RTF is going to be happy with Notepad.
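To make the "happy accident" concrete, here is a minimal Python sketch of pulling the file and line number out of unstructured diagnostic text. The sample text and the "at FILE line N." pattern are assumptions based on the familiar shape of failure output, not a format that TAP guarantees, and find_location is a made-up helper name.

```python
import re

# Assumed shape of a location line: "#   at t/user.t line 42."
LOCATION = re.compile(r"^#\s+at (?P<file>.+?) line (?P<line>\d+)\.?\s*$")

def find_location(stderr_text):
    """Return (file, line) from the first location line found, or None."""
    for raw in stderr_text.splitlines():
        match = LOCATION.match(raw)
        if match:
            return match.group("file"), int(match.group("line"))
    return None

# Illustrative sample only; everything except the location is lost to a parser.
sample = """#   Failed test 'username is valid'
#   at t/user.t line 42.
#          got: 'bob '
#     expected: 'bob'
"""
location = find_location(sample)
```

Note that the "got" and "expected" lines carry the interesting information, yet nothing here can reliably extract them. That is the gap structured diagnostics are meant to fill.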

For the most part, we aim to ensure that the casual user sees no difference. Got that? The casual user should see no difference! Under the hood, however, you should be able to tweak that engine to your heart's content. People like me need to get under the hood. People like Yahoo! need to get under the hood (and they've already shoehorned some of their needs on top of TAP). If we really want the 'A' in TAP to stand for 'Anything', then we need to provide for this. Thus, diagnostics need to allow user-supplied keys.

That's where the disagreement starts. Schwern argues that we should have them completely free-form. We can't predict what the user needs and therefore we shouldn't try.

The counter-argument from chromatic is that if two TAP streams supply a key named 'Time' and one means 'start time' and the other means 'duration', how can a client possibly disambiguate them? Requiring identifying prefixes on the keys eliminates the ambiguity.
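A small Python sketch of that collision, assuming two hypothetical producers called "wibble" and "wobble" and a dot as the prefix separator (both are illustrative choices, not anything TAP specifies):

```python
# Two hypothetical producers that both emit a 'Time' key.
stream_a = {"Time": "2008-04-18T12:34:17"}  # intended meaning: start time
stream_b = {"Time": 0.25}                   # intended meaning: duration in seconds

def merge_unprefixed(*diagnostics):
    """Naive merge: a later 'Time' silently replaces an earlier one."""
    merged = {}
    for diag in diagnostics:
        merged.update(diag)
    return merged

def merge_prefixed(**by_prefix):
    """Merge with an identifying prefix on every key, so nothing collides."""
    merged = {}
    for prefix, diag in by_prefix.items():
        for key, value in diag.items():
            merged[f"{prefix}.{key}"] = value
    return merged

clobbered = merge_unprefixed(stream_a, stream_b)
kept = merge_prefixed(wibble=stream_a, wobble=stream_b)
```

In the unprefixed merge the start time is gone and the client has no way to notice. With prefixes both values survive, although the client still has to learn what each prefix means.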

While I am very sympathetic to Schwern's point of view, the problem is the same one we have with blessed hashes in large applications. Key collision can be very painful and difficult to debug. People with serious test suite needs aren't going to be terribly impressed when you tell them to build rich clients on unreliable keys.

The problem with chromatic's point of view is that at the end of the day, we're actually trying to attach semantic meaning to these keys and the only unambiguous way to do that is to define authorities for any prefix and this goes way beyond the scope of TAP.

I am, however, sympathetic to chromatic's point of view. While I don't think he would argue for providing canonical authorities for prefixes, I think he would agree with requiring users either to supply arbitrary but meaningful prefixes for their keys or to group them under said prefix (in fact, I think this is what chromatic meant). This is admittedly ambiguous, but less so than just an unstructured mass of data. Clients can then selectively choose to use official TAP keys and known prefixes.

There was a claim that since we don't know what people will do with these, we're arguing from ignorance. I don't buy this because there is plenty of prior art in this area. Just look at TestNG, ant XML test output or any other well-used test-information structure. We can see plenty of what people need in the real world. Much of it doesn't apply across different problem domains, hence the necessity of allowing systems to disambiguate their keys from others' keys.

Failing to provide even basic namespace safety leads to the problems you see in COBOL or PHP where you have ridiculously long function names which embed a faux namespace. It's a nightmare to work with (I speak from plenty of experience here).

If we go down the typical Perl route of being Captain of the USS Make Shit Up, how on earth can we convince those with serious testing needs that TAP is a serious protocol? It's a stated goal of ours to bring TAP to a wider audience. If we don't do this, we need to rename TAP to TAPDANCE: Test Anything Protocol Doesn't Address Needed Commercial Exigencies.

Who is the audience for TAP?

dagolden on 2008-04-18T12:53:20

For the most part, we aim to ensure that the casual user sees no difference. Got that? The casual user should see no difference!

I'm a little confused here. Who is the target audience for TAP? And are you confusing that audience with the "casual user" that installs a module from CPAN and sees lots of junk scroll by on the terminal?

The fact that CPAN module installation shows a lot of diagnostics to "casual" users is a historical accident and one that distribution packagers, whether PPM or .deb or .rpm, have already addressed. (Leave aside for the moment whether or not that has other issues.)

I've argued (casually) for a while that TAP should be out-of-band with the terminal screen. At most, some high level summary should make it to screen and the rest should get saved into a file somewhere for someone to pick through if they need to. That person, almost by definition, is not a "casual" user.

For namespace disambiguation, all I should need to know is what package produced the diagnostics. If Test::Wibble uses "Time" to mean "start time" and Test::Wobble uses "Time" to mean "duration", then I should be able to look that up in the documentation of each module. If one of them flouts a common convention, then by reviews and bug reports and jawboning, confusing test modules will get censured and used less.
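As a rough Python sketch of that idea, assume each diagnostic records its producer and that someone has transcribed the key meanings from each module's documentation into a table. KEY_MEANINGS and meaning_of are hypothetical names for illustration only:

```python
# Hypothetical table, filled in by hand from each test module's documentation.
KEY_MEANINGS = {
    ("Test::Wibble", "Time"): "start time",
    ("Test::Wobble", "Time"): "duration",
}

def meaning_of(producer, key):
    """Resolve a key by the package that produced the diagnostic."""
    if producer == "main":
        # Set by hand in the .t file: only the source can tell you.
        return "unknown: read the test source"
    return KEY_MEANINGS.get((producer, key), "unknown: check the producer's documentation")

wibble_time = meaning_of("Test::Wibble", "Time")
wobble_time = meaning_of("Test::Wobble", "Time")
```

The same key name resolves differently per producer, so the producer acts as the namespace without any prefix on the key itself.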

Even if someone manually sets diagnostics from the *.t file, I'll see a producer of "main" and know that I'll have to look at the source to see what they're doing.

If you think the audience is automated tools downstream that need to know what "Time" means, then you'll need to define some standard diagnostics that people can use and have downstream tools only rely on those.

-- dagolden

Re:Who is the audience for TAP?

Ovid on 2008-04-18T15:42:27

use.perl seems buggy if I'm previewing several times. The reply below was for you.

Target Audience

Ovid on 2008-04-18T15:41:24

I can define multiple target audiences and potentially different needs they have:

Installers like CPANPLUS: Should ignore POD failures.
Smoke Testers: Should potentially alert authors to POD failures, but not fail the module because of this.
Developers (module authors): All tests must pass.
Power Users: Test timings, code coverage over time, tagged tests, aggregating multiple TAP streams from multiple projects, critical failure response and so on.
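
One way to picture the first three audiences is as different policies applied to the same failures. The Python sketch below assumes each failing test carries a set of tags such as "pod". Tagging is an assumed convention here, and the policy table and verdict function are hypothetical.

```python
# Hypothetical policy: what each audience does with a failing POD test.
POD_FAILURE_POLICY = {
    "installer": "ignore",
    "smoke_tester": "warn",
    "developer": "fail",
}

def verdict(audience, failures):
    """failures is a list of tag sets, one per failing test."""
    policy = POD_FAILURE_POLICY[audience]
    warnings = 0
    for tags in failures:
        if "pod" in tags and policy != "fail":
            if policy == "warn":
                warnings += 1
            continue  # a tolerated POD failure does not fail the run
        return "FAIL"
    return "PASS with warnings" if warnings else "PASS"

pod_only = [{"pod"}]
results = {who: verdict(who, pod_only) for who in POD_FAILURE_POLICY}
```

A client can only make this distinction if the stream tells it which failures are POD failures, which unstructured text on STDERR does not do reliably.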

All of the above issues have cropped up multiple times. I've mentioned the 2006 Google Test Automation Conference several times. Heavy test users there were constantly talking about the need for a robust way of sharing and storing test information. Existing systems were often inflexible or did not provide a means to handle their particular needs. Mobile phone application developers needed ways of handling test information from hundreds of different phones running numerous operating systems. Others just needed ways of striping their test information to pull out statistics for just the sets of tests they were interested in.

TAP can address these needs, but at its core, we need to keep the simple things as simple as we have them yet still find ways of serving the other needs. For namespaces, if you have an in-house 'Customer' module and someone else has an in-house 'Customer' module, they shouldn't share the same namespace if there's any risk of sharing the test results because this breaks the 'robustness' model.

As for out-of-band information and having the rest saved to a file, that's why I keep thinking about App::Prove::State::SQLite, a simple way of storing and managing test information over time to allow Perl developers to explore what meaningful information they can glean from this. For example, tracking test timings over time could possibly allow one to flag potential performance problems with a change to the software.
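
To illustrate the timing idea, here is a minimal Python sketch using the standard sqlite3 module. It is not App::Prove::State::SQLite. The table layout, the function names and the "twice the earlier average" threshold are all assumptions made up for this example.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a tiny store of per-file test timings."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS timing ("
        " run_id INTEGER, test_file TEXT, seconds REAL)"
    )
    return conn

def record(conn, run_id, test_file, seconds):
    conn.execute("INSERT INTO timing VALUES (?, ?, ?)", (run_id, test_file, seconds))
    conn.commit()

def slowdowns(conn, run_id, factor=2.0):
    """Test files whose time in run_id exceeds factor times their earlier average."""
    rows = conn.execute(
        "SELECT cur.test_file, cur.seconds, AVG(old.seconds)"
        " FROM timing AS cur JOIN timing AS old"
        "   ON old.test_file = cur.test_file AND old.run_id < cur.run_id"
        " WHERE cur.run_id = ?"
        " GROUP BY cur.test_file, cur.seconds"
        " HAVING cur.seconds > ? * AVG(old.seconds)"
        " ORDER BY cur.test_file",
        (run_id, factor),
    )
    return list(rows)

# Invented timings: t/customer.t suddenly triples in run 4.
conn = open_store()
for run_id, seconds in enumerate([1.0, 1.1, 0.9, 3.0], start=1):
    record(conn, run_id, "t/customer.t", seconds)
    record(conn, run_id, "t/order.t", 0.5)

flagged = [name for name, _, _ in slowdowns(conn, run_id=4)]
```

With these numbers only t/customer.t is flagged for run 4. A real tool would want a less naive threshold, but the point is that the history has to be stored somewhere before anyone can ask the question.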

Re:Target Audience

dagolden on 2008-04-18T19:43:31

My point was that you said "casual users" shouldn't see a difference. None of the examples you gave are casual users.

I think worrying about two different companies with internal "Customer" modules is over-engineering a solution that can be solved with culture instead of code. All you need to do is document that using a generic module name increases the odds of a namespace conflict. Companies that care can name their internal test modules Yoyodyne::Internal::Customer or whatever suits their fancy.

I think the TAP developers should follow a YAGNI approach whenever possible until there is a very specific, already-existing, real-world use-case to the contrary.

Don't let the lack of the perfect design stand in the way of getting something good enough out the door.

-- dagolden

Re:Target Audience

Ovid on 2008-04-21T10:12:44

Casual users: How would you define 'casual user' then? Someone who never runs tests? Then that's someone who will never see an impact anyway. I thought that the installers should be the 'casual users' because most of us use CPAN or CPANPLUS or make to install a new module. Those are the ones who shouldn't see a difference and those are the ones I view as casual users.

Namespaces: I'm rather unhappy with the 'lower-case reserved' suggestion as I think the embedded dot is a reasonable compromise, but Schwern's already made up his mind, so we're going to have to live with it. Don't get me wrong, I would love a perfect design, but I also realize we can't get one. Being willing to adopt a dot-delimited namespace is a compromise which at least avoids the Unicode issues and makes it trivial and unambiguous for parsers to know what is a user-defined key and what isn't.

Is this really that bad?

Alias on 2008-04-18T17:03:52

Can't we just steal the standard namespace trick of leeching off IANA?

---
line: 5
column: 6
tap.yahoo.com:
  mykey: whatever
expected: foo
got: bar

Re:Is this really that bad?

Ovid on 2008-04-18T17:10:08

I think this can work. Eric Wilhelm suggested the idea of excluding dots from standard keys and recommending them (I would say "requiring") for user keys. Users would then be recommended to define a namespace for their keys (but ".time" would still be allowed). It's not perfect because clashes are still possible, but it does minimize the problem considerably. As we see more examples of what people are doing with TAP, we can build on their work. Simply allowing "anything goes as long as it's not lower case" doesn't seem as workable given that not all encodings have the concept of "lower case", but I think a period is pretty universal.
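
A short Python sketch of how a client might apply that dot rule, using a diagnostic shaped like Alias's example plus a bare ".time" key. The value given for ".time" is invented, and split_keys is a hypothetical helper rather than part of any TAP library.

```python
def split_keys(diagnostic):
    """Keys without a dot are treated as standard; keys containing one are user keys."""
    standard, user = {}, {}
    for key, value in diagnostic.items():
        (user if "." in key else standard)[key] = value
    return standard, user

diagnostic = {
    "line": 5,
    "column": 6,
    "tap.yahoo.com": {"mykey": "whatever"},  # user keys grouped under a namespace
    ".time": "12:34:17",                     # un-namespaced user key, still allowed
    "expected": "foo",
    "got": "bar",
}
standard, user = split_keys(diagnostic)
```

The check is a single substring test with no case folding involved, which is why it sidesteps the "lower case" problem for scripts that have no such concept.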