The world's two worst variable names

petdance on 2004-03-07T06:32:46

I've been tech reviewing the second edition of Steve McConnell's landmark book Code Complete, due out in June. Bless his heart, he's got an entire chapter devoted to good variable naming practices. He touches on, but doesn't fully explore, two of the biggest sins in variable naming. Allow me to hop up on my soapbox.

Bad variables are all over the place. Usually it will be something like a short variable used for too long, like $n being used for the duration of an entire subroutine. The programmer might as well have been working in TRS-80 BASIC, where only the first two characters of variable names were significant, and we had to keep a handwritten lookup chart of names in a spiral notebook next to the keyboard.

Sometimes you'll find variables where all vowels have been removed as a shortening technique, instead of simple truncation, so you have $cstmr instead of $cust. I sure hope you don't have to distinguish the customers from costumers!

There have also been intentionally bad variable names, where the writer was more interested in being funny than useful. I've seen $crap as a loop variable, and a colleague tells of overhauling old code with a function called THE_LONE_RANGER_RIDES_AGAIN(). That's not the type of bad variable name I mean.

Variable naming conventions can often turn into a religious war, but I'm entirely confident when I declare The World's Worst Variable Name to be:

$data

Of course it's data! That's what variables contain! That's all they ever can contain. It's like you're packing up your belongings to move to a new house, and on the side of the box you write, in big black marker, "matter."

Even if it's a function pointer, it's data that tells the language what function to run. Even if it's undef or NULL, that the variable contains that value is significant in itself.

Variables should say what type of data they hold. Asking the question "what kind" is an easy way to enhance your variable naming. I once saw $data used when reading a record from a database table. The code was something like:

$data = read_record();
print "ID = ", $data["CUSTOMER_ID"]; 

Asking the question "what kind of $data" turns up immediate ideas for renaming. $record would be a good start. $customer_record would be better still.

I promised the two worst variable names, and I feel no fear of disagreement as I declare The World's Second Worst Variable Name to be:

$data2

More generally, any variable that relies on a numeral to distinguish it from a similar variable needs to be refactored, immediately. Usually, you'll see it like this:

$total = $price * $qty;
$total2 = $total - $discount;
$total2 += $total * $taxrate;

$total3 = $purchase_order_value + $available_credit;
if ( $total2 < $total3 ) {
    print "You can't afford this order.";
}

You can see this as an archaeological dig through the code. At one point, the code only figured out the total cost of the order, $total. If that's all the code does, then $total is a fine name. Unfortunately, someone came along later, added code for handling discounts and tax rate, and took the lazy way out by putting it in $total2. Finally, someone added some checking against the total that the user can pay and named it $total3.

The real killer in this chunk of code is that if statement:

if ( $total2 < $total3 ) {

You can't read that without going back to figure out how it was calculated. You have to look back up above to keep track of what's what.

If you're faced with naming something $total2, change the existing name to something more specific. Spend the five minutes to name the variables appropriately. This level of refactoring is one of the easiest, cheapest and safest forms of refactoring you can have, especially if the naming is confined to a single subroutine.

Let's do a simple search-and-replace on the coding horror above:

$order_total = $price * $qty;
$payable_total = $order_total - $discount;
$payable_total += $payable_total * $taxrate;

$available_funds = $purchase_order_value + $availble_credit;
if ( $payable_total < $available_funds ) {
    print "You can't afford this order.";
}

The only thing that changed was the variable names, and already it's much easier to read. Now there's no ambiguity as to what each of the _total variables means. And look what we found: The comparison in the if statement was reversed. Effective naming makes it obvious.

There is one exception to the rule that all variables ending with numerals are bad. If the entity itself is named with a number, then keep that as part of the name. A variable for the road running through town would be just fine as $route31. It would be silly to rename it as $route_thirty_one.

Finally, remember that all of these rules apply to subroutine and file naming as well. We often don't spend enough time considering file names, but that's a rant for another day.


Subroutine Names Are Comments

chromatic on 2004-03-07T07:04:09

Finally, remember that all of these rules apply to subroutine and file naming as well.

I'll say it again. "Subroutine names are comments."

Re:Subroutine Names Are Comments

zatoichi on 2004-03-08T15:22:34

Exactly! Code *should* comment itself so that the programmer has to do minimal comments.

Re:Subroutine Names Are Comments

petdance on 2004-03-08T15:26:02

Comments should explain the why, not the what.

Re:Subroutine Names Are Comments

zatoichi on 2004-03-08T16:02:43

Correct. I was talking more about sub names telling you some usefull information. For instance "sub one {" is bad but "sub update_table_one {" is better and gives you a better idea what the sub does. That makes is "like a comment".

Certainly I agree that the job of comments is to tell the "why" of the code. In the case of subs thought it does tell the "what".

How appropriate

mir on 2004-03-07T10:23:16

How appropriate to read this as I am painfully debugging as piece of code which purpose is to fill the variable... $data (but I swear it's just data, really ;--).

You know, I am starting to hate you Andy. First I was browsing through Phalanx where I found the link to the CPAN Coverage Report. It showed that XML::Twig's test coverage was quite pathetic. This lead to a couple of month of fieverishly adding tests. The subsequent release made me realize that writing portable tests is even harder than writing portable modules...

Now this:

$ grep -c '[$@%]data' Twig.pm
46

I now have to change 46 lines of code!

I hate you! Or maybe I should thank you. Adding those tests made maintenance infinitely easier. If they pass, I am quite sure that I haven't broken anything. The previous method involved passing the tests and praying, which I am not very good at.

So thank you I guess.

Even better

clscott on 2004-03-07T15:52:32

foreach my $mtval ( @hosts ){
   #30 line loop using $mtval
}

I have a coworker ( now my "team leader" ) with a fetish for naming the loop topic and various intermediate step variables $mtval.

Pronounce the first two letters individually and "val" as a word ... it's an empty val! It's not an empty value or you wouldn't be using it!

I see red whenever I read it this: If you're not going to use $_ as the loop topic then use a meaningful name!

Heh

chaoticset on 2004-03-07T16:26:12

I'm entirely confident when I declare The World's Worst Variable Name to be:

$data

Of course it's data! That's what variables contain! That's all they ever can contain. It's like you're packing up your belongings to move to a new house, and on the side of the box you write, in big black marker, "matter."

A slightly modified version of this is going into my quotes.

Re:Heh

petdance on 2004-03-07T17:14:34

Then it's not really a quote, is it?

Re:Heh

chaoticset on 2004-03-08T16:12:58

Well, I had do something with punctuation that you did with layout (due to the limitations of my quotes system):
...I'm entirely confident when I declare *The World's Worst Variable Name* to be: '$data' Of course it's data! That's what variables contain! That's all they ever can contain. It's like you're packing up your belongings to move to a new house, and on the side of the box you write, in big black marker, "matter."
While it's not exactly what was typed, I figured it would be a reasonable substitution.

If it's not, let me know, and I'll remove it.

Re:Heh

petdance on 2004-03-08T16:31:30

Oh, no, not complaining. Just thought it was amusing that you had to modify your quote to be a quote. :-)

Proof Built Right In!

krellis on 2004-03-07T19:59:36

You've built proof of your point right into your examples for the second worst variable name ever. Your renamed code is not functionally identical to the original. You've replaced $total with $payable_total when it should have been replaced by $order_total. Specifically, this:

$total2 += $total * $taxrate;

should translate to:

$payable_total += $order_total * $taxrate;

rather than

$payable_total += $payable_total * $taxrate;

Hence proving why numbered variables are EVIL!

Re:Proof Built Right In!

echo on 2004-03-08T15:01:07

It only proves that modifying code just because you think the variable names aren't right, is wrong! If it ain't broken...

PS: not that I think that $total2 is a good variable name!

Re:Proof Built Right In!

pdcawley on 2004-03-10T00:10:04

Actually, it proves that you should ideally change the variable names using a tool that will change all uses of a given variable correctly. It further demonstrates that you should have a test suite which you run each time you change a variable, and roll back the changes if the test suddenly starts failing.

Too simplistic

ethan on 2004-03-07T23:55:14

Variable naming conventions can often turn into a religious war, but I'm entirely confident when I declare The World's Worst Variable Name to be:

        $data


That totally neglects the fact that a variable doesn't just appear in the big nothing. Surely, there is context around a variable that often makes data to be a very reasonable choice.

Quite coincidentally, I have some code soon to be released that uses data all over the place:


void
raw2yellow1 (CLASS, data)
                char *CLASS;
                char *data;


and the same for similarily named functions such as 'raw2yellow2' or 'raw2green1'.

These functions convert data that comes from a CD read in raw mode into the various formats such as Red Book or so. Now it's quite clear that the stuff coming in is nothing else than data...it's a stream of bytes. It could have equally well be named 'bytes' or 'raw', but what's the point?

Of course it's data! That's what variables contain!

On this I disagree. A variable holds a value (or several of them). I reserve data for something else, namely for pieces of information encoded in some way that are meant to be modified or further processed. When using this definition you suddenly have very few variables left to which this applies. Many variables hold a state, a flag or an index or whatever. Nothing of which I would ever consider data.

Maybe you have different associations with the word 'data'. But that is not a very strong reason for forbidding this name. To many this name makes a lot of sense and is not in the least ambiguous.

Other than that the naming of variables is a real problem. I often struggle with it, especially in complicated code where I need to have quite a few around of which some refer to conceptually very similar things. All my attempts to come up with a clever naming scheme had to fail, because my mistake was to make the names too long. Ten lines after the declaration of a variable I did no longer know how to spell it. I haven't yet found a convincing solution for it (and I am quite sure that no one has). In any way, my experience has been that short names are very often bettern than long ones. So I am really all in favour of what the Linux kernel's styleguide has to say on this matter (AFAIR it suggests a maximum length of seven or eight characters per variable, or even less).

Re:Too simplistic

ask on 2004-03-08T03:36:44

That totally neglects the fact that a variable doesn't just appear in the big nothing. Surely, there is context around a variable that often makes data to be a very reasonable choice.

hear, hear!

maybe with $data =~ s/often/sometimes/ ... :-)

  - ask

So simple, its simplistic!

schwern on 2004-03-17T22:49:45

These functions convert data that comes from a CD read in raw mode into the various formats such as Red Book or so. Now it's quite clear that the stuff coming in is nothing else than data...it's a stream of bytes. It could have equally well be named 'bytes' or 'raw', but what's the point?

Given that the subroutine is called "raw2yellow" one expects raw data coming in and "yellow" data (umm... whatever that is) going out. Data going in, data coming out. Which is $data?

Better names are easily found without much nashing of teeth. When in doubt, look at the subroutine name. Its called raw2yellow. Thus: $raw and $yellow. Or $cdin and $yellow. Or even just $in and $out. All of these convey the variable's purpose better than $data, don't require much thought and, the best part, they're short because they make proper use of context to fill in the gaps.

Its so easy to go one better than $data, why even flurt with ambiguity?

Re:So simple, its simplistic!

ethan on 2004-03-18T07:26:29

There simply is no chance of any ambiguity. The data that comes out (well, it's not yellow) is a list. The whole XSUB reads:


void
raw2yellow1 (CLASS, data)
                char *CLASS;
                char *data;
        CODE:
        {
                EXTEND(SP, 6);
                ST(0) = sv_2mortal(newSVpvn(data, 12)); /* sync */
                ST(1) = sv_2mortal(newSVpvn(data+12, 4)); /* head */
                ST(2) = sv_2mortal(newSVpvn(data+16, 2048)); /* data */
                ST(3) = sv_2mortal(newSVpvn(data+2064, 4)); /* EDC */
                ST(4) = sv_2mortal(newSVpvn(data+2068, 8)); /* zero */
                ST(5) = sv_2mortal(newSVpvn(data+2076, 276)); /* ECC */
                XSRETURN(6);
        }


The fact that I use 'data' for this and analogous functions allowed me to use copy-and-paste. Alright, I could have used 'raw' everywhere with the same benefit. But the necessity for that escapes me.

Although, I just note that there is the /* data */ comment. Here the raw stream of bytes is simply broken down into smaller pieces and one of those pieces is in fact the actual data, whereas the rest is some administrative overhead to be found on a CD. So now I suddenly have 'data' for input and also showing up in the output list. This still isn't very ambiguous, but I admit it's not ideal.

Aarghuments

rafael on 2004-03-09T08:30:29

I just found in some javascript code a function that takes two parameters id and id2. And guess what, at some point in the spaghetti, it's called with only one argument.