Hungarian Notation Rocks!

Ovid on 2005-06-16T17:26:08

Yeah, I'll bet that's a journal title you weren't expecting from me. Most Perl programmers object to Hungarian notation for a very simple reason: if we wanted a stronger type system, we'd use a language that provides it. We, for some strange reason, demand the ability to add apples to IP addresses, never mind that we wouldn't actually do such a thing. Thus, you're not going to see this anytime soon:

$strFoo .= $strBar;

Of course, the counter-argument by Hungarian Notation proponents is this:

// this looks fine
result = bar + foo

// this does not
intResult = intBar + strFoo

That just looks wrong and it probably is. Hungarian Notation, therefore, can potentially allow one to spot bugs without having to go back to the type declarations, or worse, scan through every assignment. There are, of course, a variety of problems. If you have to change the type of a variable, finding all instances of it can be miserable. Not only do you have to find everywhere it's used and ensure that it still does what you mean, you also have to change the variable name everywhere. Further, adding all of those prefixes was just makework that really didn't give you a good sense of what the variables were for. Just because two numbers were floats didn't mean you could add one to another.

I want a stronger type system, but not based around data types. I want one based around the domain of values that a variable can have and how it is to be used. To a certain extent, I think many people appreciate the value of this. If a number must be prime, we write &is_prime and throw the number at it. If it must be negative, we write if ($num < 0) { ... }. Of course, this can be very error-prone. We're testing the data after the fact. While this is sometimes necessary, if a particular data domain is necessary and we forget the test, bugs tend to be the result. Objects, being custom data types, allow us to have a bit more control.

# perl 6
my Prime    $prime .= new(5);
my Negative $neg   .= new(-2);

# fails if Prime is correct
try { $prime.val = 57; }

# later
multi method quux(Prime $val)    {...}
multi method quux(Negative $val) {...}

And with that, we can establish a proper domain for a variable and have somewhat less worry about using it incorrectly. It still doesn't tell us how the variable should be used, but it's still a nice, generic way of specifying a useful data type. Which brings us back to Hungarian notation. I can't name every variable $prime or $neg. However, what would happen if we saw the following?

$prmResult = $pntFoo + $prmBar;

Assuming that we understood that "pnt" represented a Point and "prm" represented a Prime, we might get the idea that adding them might be useless. We get a hint that something is wrong by looking at a single line of code. In this case, the notation is not telling us the type, but giving us a clue as to the domain and, possibly, the usage. In fact, the variable $prmBar is probably poorly named (for the "prm", not the typical "Bar"). "prm" doesn't tell me how that variable is to be used, but "pnt" is more useful (ignoring that we don't know how many dimensions there are.) Further, if we decide to later change the data type of a variable but not its usage, the prefix could still be meaningful and provide useful information about how it should be used. (Heck, I'm constantly changing data structures and types without changing the names.) And, according to Joel Spolsky, this was the original intention of Hungarian Notation.
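
As a rough illustration of the "domain of values" idea above, here is a minimal Perl 5 sketch of an object that refuses to hold anything outside its domain. The class name Prime and the val accessor echo the Perl 6 snippet earlier, but the implementation (including the _is_prime helper) is an assumption made for this example, not code from any real module.

```perl
use strict;
use warnings;

# Hypothetical class: holds a value that must always be prime.
package Prime;

# Trial division; fine for a sketch, not for large numbers.
sub _is_prime {
    my ($n) = @_;
    return 0 unless defined $n && $n =~ /^\d+$/ && $n >= 2;
    for (my $i = 2; $i * $i <= $n; $i++) {
        return 0 if $n % $i == 0;
    }
    return 1;
}

sub new {
    my ($class, $n) = @_;
    my $self = bless {}, $class;
    $self->val($n);    # the constructor goes through the same check
    return $self;
}

# Combined accessor/mutator: dies if the new value is outside the domain.
sub val {
    my $self = shift;
    if (@_) {
        my $n = shift;
        die "Not a prime: " . (defined $n ? $n : 'undef') . "\n"
            unless _is_prime($n);
        $self->{val} = $n;
    }
    return $self->{val};
}

package main;

my $prime = Prime->new(5);
my $ok    = eval { $prime->val(57); 1 };    # 57 = 3 * 19, so this dies
print "rejected 57, still ", $prime->val, "\n" unless $ok;
```

The check lives in one place, so forgetting to test the value "after the fact" is no longer possible. What the class still cannot tell you is how the value is meant to be used, which is where a naming convention might help.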

--

A smattering of extra thoughts that should probably be footnotes if I weren't lazy and really wrote this up properly instead of just tossing it off quickly with long, run-on sentences that annoy the heck out of you:

It's interesting that, years ago, there was a foreshadowing of the power of dynamic typing coming from, of all places, Microsoft.
This is also why I object to some OO purists who insist that accessors and mutators are always evil and must be avoided. I often want objects that are nothing more than glorified data types. And yes, I'll cheerfully set and get values on them, thank you.
I also don't want to suggest that Hungarian Notation should be used everywhere. Certainly not! But the wider the scope of a variable, the more a programmer will appreciate a sane name that gives a good indication of what it's used for.
And if you object, read the Spolsky article, first :)

HN is a kludge

jjohn on 2005-06-16T18:18:14

It's for users of primitive languages without sigils on their variables. Poor sods. They never know what a variable is without hunting down the declaration.

Sigils don't tell you much

runrig on 2005-06-16T19:44:43
Did you read the article? It talks about two types of Hungarian notation, "Apps" (what the author originally intended and the more useful type) and "Systems" (what others misinterpreted the original type as and more like what you are referring to, but having equally useless information like "integer", "long", "char"). Sigils don't tell you if it's an index to an array, a 3-D point object, or the width of a bitmap.
Re:HN is a kludge

djberg96 on 2005-06-16T21:50:57
Knowing that a variable is a scalar hardly tells you what it is, other than a scalar. Is it a string? A hash reference? A quadruply nested array of hashes of arrays of hashes?

I guess you'll have to hunt down the declaration anyway, even in Perl.
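
To make the point concrete, here is a small sketch (the sample values are invented for illustration) showing that variables with the same $ sigil can hold very different things, and that you only find out which by asking at runtime with the built-in ref function:

```perl
use strict;
use warnings;

# Every element below is stored in a scalar, yet each is a different kind of thing.
my @things = ("a string", 42, [1, 2], { k => 1 }, sub { 1 });

for my $thing (@things) {
    # ref returns '' for non-references, so fall back to a label.
    my $kind = ref($thing) || 'plain scalar';
    print "$kind\n";
}
```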

Notation Hungarian Reverse Perl and

ziggy on 2005-06-16T18:28:32

One reason why RHN-the-lesser raises the hackles of many a Perl hacker is because we already have RHN built into the language.

At the heart of it all, Perl has only five types that matter: scalars, arrays, hashes, filehandles and coderefs. (Yes, I skipped two. If you know what they are, you know why I skipped them. ;-) Yes, there are tied variables, but what matters about them is not how their internals differ, but how they automagically hide those internals. And objects are another kettle of fish, but at the heart of it all, an object is a single piece of saved state that you sling about in your code, so it makes sense that it's a scalar value.

It's sad that filehandles don't have a sigil, but them's the breaks. At least now we can do away with filehandles as a magic kind of variable and use file handle references instead.

So when you see code like this:

$a = $b + $c;
$d += $e;
$f = "$g$h$i" . $j;

you know everything is right in the world. When you see something like this, on the other hand,

$a = @b;
@c = $d;

you know something is amiss. Until you hit the error trap just below your conscious mind that reminds you what is special about assigning an array to a scalar, or a scalar to an array. Or a hash to a list, or a list to a hash. And so on.
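
For anyone whose "error trap" needs a refresher, this short sketch (variable names and values are made up for the example) shows what those mixed-sigil assignments do in Perl 5:

```perl
use strict;
use warnings;

my @b = (10, 20, 30);

# Array in scalar context: yields the number of elements, not the elements.
my $count = @b;

# Scalar assigned to an array: yields a one-element list.
my $d = 'x';
my @c = $d;

# Hash assigned to a list: flattens into key/value pairs (order unspecified).
my %h    = (one => 1, two => 2);
my @flat = %h;

print "count=$count, c has ", scalar(@c), " element(s), flat has ", scalar(@flat), " elements\n";
# prints "count=3, c has 1 element(s), flat has 4 elements"
```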

I haven't hacked in Perl6 yet, so I can't say for sure whether changing the syntax like so:

$a = @b[5]; ## was $b[5]
$c = %d{8}; ## was $d{8}

will ultimately have a positive or negative impact. We all think we'll be better off, and the early reports confirm that, but it'll be a couple of years before we know for sure. My hunch is that Larry is right, and changing the syntax is the right thing to do, because it meshes well with how we think about our Perl code in the 21st Century, not in 1988.

But it doesn't break the fundamental RHN-ness of sigils.

Re:Notation Hungarian Reverse Perl and

Ovid on 2005-06-16T19:24:19

The sigil is only half the battle. Joel's article has a great example of pulling in unsafe data from a Web form. Perl's taint mode aside, we can't always know when it's safe to use a variable. When I was coding CGI apps, I did something like this:
my $_name = $cgi->param('name');
my ($name) = $_name =~ /($untainting_regex)/;
In this case, the sigil tells me nothing about whether or not it's safe to use that variable. If everything untaints, I can shove that data in the database. If it doesn't, I have the original data to encode and throw back onto the Web form. That convention of the leading underscore made it trivial for me to see that $sth->execute($_name); was an error. The sigil won't.
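
A self-contained sketch of that convention might look like the following. The whitelist pattern and the clean_name helper are assumptions made for this example (the original $untainting_regex isn't shown), and the script does not run under taint mode; under -T, it is the regex capture that would untaint the value.

```perl
use strict;
use warnings;

# Hypothetical whitelist: a letter, then up to 49 letters, spaces,
# apostrophes or hyphens.
my $untainting_regex = qr/[A-Za-z][A-Za-z '-]{0,49}/;

# Returns the validated name, or undef if the raw input does not validate.
sub clean_name {
    my ($_name) = @_;    # leading underscore: raw, unchecked input
    return unless defined $_name;
    my ($name) = $_name =~ /^($untainting_regex)\z/;
    return $name;
}

for my $_name ("Ovid", "Robert'); DROP TABLE students;--") {
    my $name = clean_name($_name);
    print defined $name ? "safe: $name\n" : "rejected raw input\n";
}
```

Only $name ever gets near the database. Any line that passes a $_-prefixed variable to execute stands out on sight, which is the whole point of the prefix.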

Re:Notation Hungarian Reverse Perl and

Ovid on 2005-06-16T19:27:00

Crud, I hit 'Submit' instead of 'Preview'. Damn. Someone needs to read The Design of Everyday Things. (Including me, to be fair.)

Re:Notation Hungarian Reverse Perl and

ziggy on 2005-06-16T19:41:22

In this case, the sigil tells me nothing about whether or not it's safe to use that variable.

Right. Because Perl embeds RHN-the-lesser (the kind of stuff in Petzold's book and the Windows API). The entire point of RHN-as-intended is to use common prefixes and principles of composition to describe the meaning of a variable: taintedness, Primality, object behavior, worksafe content, etc.
The one huge problem with "Reverse Hungarian Notation", as Joel says, is that virtually everyone thinks RHN is RHN-the-lesser, when the true meaning (and value) comes from RHN-as-intended.

That's what type systems are for ;-)

Adrian on 2005-06-16T21:09:50

Assuming that we understood that "pnt" represented a Point and "prm" represented a Prime, we might get the idea that adding them might be useless.

I'd much rather use a language where I can sensibly declare points and primes as different types and get a nice compile time or runtime error if I try and add them.

Getting humans to do things that compilers can do seems counterproductive now we have the chance to use vaguely decent languages ;-)

Re:That's what type systems are for ;-)

lachoy on 2005-06-17T01:59:11

Getting humans to do things that compilers can do seems counterproductive now we have the chance to use vaguely decent languages ;-)

+1

Re:That's what type systems are for ;-)

ziggy on 2005-06-17T14:01:17

I'd much rather use a language where I can sensibly declare points and primes as different types and get a nice compile time or runtime error if I try and add them.

I'd much rather use a language that handles these kinds of strict typing issues for me and provides sensible error messages when a type error is found. ;-)
(Yes, I'm using Haskell these days. And whenever I haven't seen a type error in a couple of days, it takes a while to figure out what ghc is trying to tell me.)