Limits of automated testing

geoffrey on 2005-05-18T02:23:22

Recently I was doing debugging-via-email of some graphics code and it turned out that the problem became perfectly obvious with a single screenshot of the demo scene. I realized that I have a habit of designing test scenes for my graphics code that make rendering errors pop out to the human eye -- most any problem becomes obvious in a second or two. This is most definitely a Good Thing.

Of course, many readers will point out that I should have a strong automated test library to catch these sorts of issues; the user should just be able to run make test and get a nice test failure describing the problem. That's great advice, and I'd love to give the users a nice automated test suite . . . but how?

The problem here is that OpenGL is guaranteed to produce consistent output from run to run, but only for the same hardware, drivers, resolution, color depth, and so on. Change any of these, and the output is allowed to change quite drastically. Yes, there are some bounds to how far afield the output can stray -- but these bounds are, to put it lightly, pretty loose.

So how do you automate the testing, when OpenGL drivers are often buggy and sometimes vary widely from vendor to vendor and system to system? You can't keep canonical images in the testing library to compare against because the output from the developer's system is unlikely to match pixel-by-pixel on basically anyone else's system. You can't even save the output from the first test run, and compare against it in succeeding builds; if the user happens to upgrade their video drivers, or alter the desktop resolution or color depth, or make some other completely reasonable change to their system, the pixels are quite likely to be different.

I could spend a great deal of effort writing a fuzzy image matcher that determined whether two images matched within the available OpenGL guarantees -- and submit the result as a dissertation on computer vision. It's certainly an interesting subject for research, and I wouldn't be surprised to hear that SGI did exactly this to create their conformance testing library, though I've never seen it -- but it has little to do with the graphics work in front of me. I'd rather spend my time actually producing working rendering code.
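Just to make concrete what I mean, the crudest form of such a matcher would be something like the sketch below -- a per-channel tolerance plus a cap on how many pixels may differ at all. The image struct and the thresholds are invented purely for illustration, and this falls far short of encoding the actual OpenGL invariance rules; perfectly legitimate differences like dithering patterns or polygon edges shifted by a pixel will blow right past it.

    /* Naive "fuzzy" comparison: two RGBA8 images are declared a match if
     * no more than max_bad pixels have any channel differing by more than
     * tol.  Layout and thresholds are illustrative assumptions only. */
    #include <stdlib.h>

    struct image {
        int            width;
        int            height;
        unsigned char *rgba;    /* width * height * 4 bytes */
    };

    int images_roughly_match(const struct image *a, const struct image *b,
                             int tol, int max_bad)
    {
        if (a->width != b->width || a->height != b->height)
            return 0;

        size_t n   = (size_t)a->width * a->height * 4;
        int    bad = 0;

        for (size_t i = 0; i < n; i += 4) {
            for (int c = 0; c < 4; c++) {
                if (abs(a->rgba[i + c] - b->rgba[i + c]) > tol) {
                    bad++;       /* count each offending pixel once */
                    break;
                }
            }
        }
        return bad <= max_bad;
    }

Doing this properly -- deciding which differences a conforming implementation is actually allowed to produce -- is where the dissertation comes in.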

This brings me back to my original method -- design test scenes that can take advantage of the human visual system by making real rendering errors pop out of the noise of absolute pixel value differences. An incorrect pixel here or there would be difficult for the eye to detect (unless the color was WAY off), but I'm looking more for "Well, clearly something's wrong with texturing -- the big wooden box is rendered as a plain black cube."

Yes, a human needs to get involved. But with proper test design any problems should be obvious -- so obvious that the user may react even before being able to verbalize the issue, because reacting quickly to seeing something wrong is a survival trait. In many cases, this process can actually be faster than running a full automated test suite for a complex system. Granted, it won't help autobuilders with automated test runs on the latest source revision, but it does have its place.

None of this is specific to graphics code, of course; that's just the area I've played with the most. These testing issues will come up anywhere you need to test interactions with the real world, where APIs often don't guarantee perfect reproduction on all systems, in all environments. How do you know, for instance, that the user can actually hear the music you are trying to play?

In the last few years I've come across quite a few books and articles espousing automated testing, and going into great detail about how to write automated tests and test harnesses in every development environment and programming language under the sun. What seems to be missing is a decent treatise on producing good user-oriented tests. Maybe I'm just looking in the wrong places . . . .


Use Mesa as a reference

Aristotle on 2005-05-20T01:03:48

It’s always a good idea to try to structure your output into some form of regular pattern. The human eye and brain are mindblowingly capable pattern matchers; heck, we compulsively try to find patterns in noise that has none. Opting to take advantage of this is quite the opposite of a bad idea.

In any case, if you did want some automated testing here, enough to catch the low-hanging fruit among bugs if not all of them, you would probably need to link against a reference software renderer for OpenGL. (Mesa, maybe?) But that is not going to catch configuration-dependent bugs, naturally.

Re: Use Mesa as a reference

geoffrey on 2005-05-20T05:57:19

Hmmm, that's not a bad idea. With a recent Mesa, I should be able to use the off-screen renderer, which one would hope is independent of the settings of the desktop frame buffer (if not, this becomes a non-starter, I think). It does represent an extra dependency (and a non-trivial one at that), but it still may be worth it.
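To sketch what I have in mind (this assumes Mesa's OSMesa interface from GL/osmesa.h, and draw_test_scene() is just a placeholder for whatever the real test scenes end up being): the rendering lands in a plain memory buffer supplied by the caller, so the desktop frame buffer shouldn't enter into it at all.

    /* Render one test scene into a caller-supplied RGBA buffer using
     * Mesa's off-screen (OSMesa) interface.  Error handling is minimal;
     * draw_test_scene() is a stand-in for the real scene. */
    #include <stddef.h>
    #include <GL/osmesa.h>
    #include <GL/gl.h>

    extern void draw_test_scene(void);    /* hypothetical scene code */

    int render_offscreen(int width, int height, unsigned char *rgba_out)
    {
        OSMesaContext ctx = OSMesaCreateContext(OSMESA_RGBA, NULL);
        if (!ctx)
            return -1;

        /* Direct all GL output into rgba_out instead of any window. */
        if (!OSMesaMakeCurrent(ctx, rgba_out, GL_UNSIGNED_BYTE,
                               width, height)) {
            OSMesaDestroyContext(ctx);
            return -1;
        }

        glViewport(0, 0, width, height);
        draw_test_scene();
        glFinish();                       /* make sure the buffer is done */

        OSMesaDestroyContext(ctx);
        return 0;
    }

The caller just allocates width * height * 4 bytes and compares the result against a stored reference afterwards.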

I guess it depends on how stable the output of the off-screen renderer has been across Mesa versions. If the rendering of existing features has been solid (and seems likely to remain so), the worries about driver updates messing with the reference images are much reduced.
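The reference images themselves could then simply be dumps of that buffer -- for example written out as PPM, which needs no image library and can be checked in and compared with ordinary tools. Again, only a sketch; the reversed row order is there because OSMesa buffers default to a bottom-left origin.

    /* Write an RGBA buffer as a binary PPM (alpha dropped).  Rows are
     * written bottom-up so the image comes out right side up with
     * OSMesa's default origin.  Illustrative only. */
    #include <stdio.h>

    int write_ppm(const char *path, const unsigned char *rgba,
                  int width, int height)
    {
        FILE *f = fopen(path, "wb");
        if (!f)
            return -1;

        fprintf(f, "P6\n%d %d\n255\n", width, height);
        for (int y = height - 1; y >= 0; y--) {
            const unsigned char *row = rgba + (size_t)y * width * 4;
            for (int x = 0; x < width; x++)
                fwrite(row + x * 4, 1, 3, f);   /* R, G, B; skip A */
        }
        return fclose(f) == 0 ? 0 : -1;
    }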

Thanks for the suggestion!