Test driven development by Triangulation

ziggy on 2004-06-18T23:37:20

I'm working on a project at work that feels like oh so many other projects I've worked on over the years. It involves processing a data set, slowly shaping it until it meets the goal.

Time was, I would write a whole chain of programs and iteratively morph the data from what I have to what I need. But that is a total pain in the butt. My current technique is to write a program (a set of modules, a suite of tools, whatever) that does the whole process in one fell swoop. It could be morphing XML or upconverting plain text into something structured. This time, it's building a cross reference for an application written in Tcl.

I've been using my current development style for a good many years now. It struck me that it's not exactly test-driven development as it is commonly preached, but it is test-driven in some sense of the word. I'm reading an interesting book at the moment about geodesy and triangulation, and it struck me that triangulation is as reasonable a description of my process as any.

It starts like this. First, you write a small program that does some meaningful transformation on your data set. You can start with a simple little prototype program and focus on a small aspect of the data you need to convert. It doesn't need to be a perfect conversion. In fact, it's better to generate something with as little effort as possible than it is to try to make the first cut perfect in some way.

Next, commit everything into CVS, Subversion, or your repository of choice. Tag early and often, each time you make the subtlest change to the output you generate, or the subtlest change to how you generate that output.

Now you have a basic framework for making small steps towards your goal. At every stage, you will either change the output you generate, or change the way you generate your output. For every change you make, do a simple diff on the output from the last iteration. Either it should not change at all (you only changed how you did something, perhaps optimizing a hotspot), or all of your changes should be expected.
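As a rough illustration of that diff step, here is a minimal sketch in Python. It is not the tooling described in this post; the function name diff_outputs and the sample strings are made up for the example. It assumes the output of each run is available as text.

```python
import difflib


def diff_outputs(previous: str, current: str) -> list[str]:
    """Return unified-diff lines between two runs (empty list if identical)."""
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="last-known-good",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical output from the previous iteration and from the current one.
last_output = "proc main\nproc helper\n"
refactored_output = "proc main\nproc helper\n"
extended_output = "proc main\nproc helper\nproc cleanup\n"

# A pure refactoring should leave the output untouched.
no_change = diff_outputs(last_output, refactored_output)

# A deliberate change should show up as exactly the lines you expected.
expected_change = diff_outputs(last_output, extended_output)
added = [
    line[1:]
    for line in expected_change
    if line.startswith("+") and not line.startswith("+++")
]
```

If no_change is non-empty after a refactoring, or added contains anything other than the lines you meant to add, stop and look before tagging the next version.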

This is a casual form of test-driven development - it doesn't leave a whole lot of test artifacts lying around. The only thing that matters is slow and steady progress towards the goal. At each stage, you have your last-known-good state and your next state. Often, they will not differ, but when they do, all of the changes should be expected, and all of your expected changes should be present. Sometimes, the best recipe for checking the previous generation against the current one is a judicious use of diff, md5sum, wc -l and/or grep -c. Sometimes, the best artifacts to leave behind are the recipes for how you proved everything is OK, not a test case or a test class.
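One way to keep such a recipe around is to boil each run down to a few numbers. The sketch below is only an approximation of what md5sum, wc -l and grep -c report, written in Python; the summarize function and the sample Tcl-like lines are hypothetical.

```python
import hashlib
import re


def summarize(text: str, pattern: str) -> dict:
    """Summarize one run's output: checksum, line count, matching-line count."""
    return {
        # Same idea as md5sum: identical output gives an identical digest.
        "md5": hashlib.md5(text.encode("utf-8")).hexdigest(),
        # Same idea as wc -l: count newline characters.
        "lines": text.count("\n"),
        # Same idea as grep -c: count lines that match the pattern.
        "matches": sum(
            1 for line in text.splitlines() if re.search(pattern, line)
        ),
    }


# Hypothetical outputs from the previous and the current generation.
previous = summarize("proc a\nproc b\nset x 1\n", r"^proc ")
current = summarize("proc a\nproc b\nproc c\nset x 1\n", r"^proc ")

# If the only intended change was one extra proc entry, both counts
# should have grown by exactly one.
grew_by_one = (
    current["lines"] - previous["lines"] == 1
    and current["matches"] - previous["matches"] == 1
)
```

The checksum answers "did anything change at all?", and the counts answer "did it change by about as much as I expected?" before you bother reading a full diff.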

What about triangulation? In this style of test-driven development, you can only have one unknown. Just as you can measure a triangle by knowing two angles and the length of a side (or two sides and the included angle), you need to know that your input and output haven't changed to prove that your revised code is still at least as good as your last version. However, data does change, and that tends to throw off your comparison with a previous run.

No matter. Just re-run your new data through your last-known-good version, or a couple of versions back in the worst case. That will generate a new baseline output for you to compare your revised program against.
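Here is a minimal sketch of that re-baselining step, again in Python. The two transform functions are stand-ins invented for the example: in practice the last-known-good version would be whatever you check out from a tag in your repository, and the revised version would be your working copy.

```python
def last_known_good(lines):
    """Stand-in for the tagged version: unique entries, sorted."""
    return sorted(set(lines))


def revised(lines):
    """Stand-in for the working copy: same result, different method."""
    seen = {}
    for line in lines:
        seen[line] = True
    return sorted(seen)


def rebaseline_and_compare(old_version, new_version, new_data):
    """Run both versions on the same new data so the code is the only unknown."""
    baseline = old_version(new_data)    # fresh baseline from trusted code
    candidate = new_version(new_data)   # output of the code under test
    return baseline == candidate, baseline, candidate


# The input changed since the last run, so the old saved output is no
# longer a fair comparison. Generate a new baseline instead.
new_data = ["helper", "main", "helper", "cleanup"]
same, baseline, candidate = rebaseline_and_compare(
    last_known_good, revised, new_data
)
```

With the data held fixed across both runs, any difference between baseline and candidate can only come from the code change.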

Along the way, you will encounter uncertainties. Bugs in data formatting that break the spec. Unexpected behaviors in your input data that were poorly documented. Expect these wrinkles in the road, because they always happen. These are the kinds of things that are best codified in a more conventional test suite for regression testing. (Even though your iterations are driven by triangulation against a last-known-good state, that does not absolve you from the need to write good regression tests.)


Same metaphor, different practice

Adrian on 2004-06-19T15:27:31

Oddly enough, Beck uses the triangulation metaphor in his TDD book, but for the practice of waiting to generalise code until you have two or more examples.