The modern view on commenting

schwern on 2007-03-16T22:02:28

I keep writing these off-the-cuff essays to people in response to questions and I really should archive them somewhere and turn them into something. For now, I'll stick them here.

This one is for a friend of mine who is learning how to program on the job. She's a biologist, which explains all the gene-centric examples I use (and I apologize for my mangling of the entire field of genetics in the process). She's figuring out how and when to comment and write documentation. I answered the commenting part and largely ignored the documentation part because, really, I was sick of writing.

Antonia Mayer wrote:
> Would you have a perl script with exemplary documentation for me? I still don't have a good sense of what is too much and what too little comments.
>
> That is, what I've heard here on the question is my direct supervisor saying: I suggest you learn and use POD.
>
> And another co-worker saying: and there is no "too little comments". Comment every line and every regex if you will. We don't know whether whoever is maintaining this code after you will have had any experience with Perl. Any decrease of legibility of the code by having all these comments will be balanced by the time spent trying to figure out little things and changes.
>
> So, d'you have an example of exemplary documentation at hand, or any other advice or comments?

First off, comments and documentation are two different beasties. Or rather, there's internal documentation, usually done with comments, and external documentation, usually done with POD. They have different audiences, and it's important to know your audience.

Internal documentation is for people who are going to read and modify the code after you. It might be the next maintainer, it might be you a few days from now after having forgotten how it works.

External documentation is for the user of the program. It assumes the user has no knowledge of how the program is written or of programming at all. The program is a black box. The docs explain why you want to use the program, what it does and does not do, quirks and bugs, frequently asked questions, common mistakes and how to fix/avoid them, what you pass in, how to pass it in, what gets done to it and what comes out.
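To make that concrete, here's roughly what a small chunk of POD might look like. This is only a sketch: the count_bases program, its behavior and the wording of its docs are all invented for illustration, and a real program would want more sections (options, bugs, common mistakes).

```perl
use strict;
use warnings;

=head1 NAME

count_bases - count how often each base occurs in a DNA sequence

=head1 SYNOPSIS

    count_bases GATTACA

=head1 DESCRIPTION

Give it a DNA sequence and it prints one line per base (A, C, G, T)
with the number of times that base occurs. Upper and lower case are
treated the same. Characters that are not one of those four bases are
skipped rather than reported as errors.

=cut

# Hypothetical helper: returns a hash reference of base => count.
sub count_bases {
    my $sequence = uc(shift);

    my %count_for = map { $_ => 0 } qw(A C G T);
    for my $base (split //, $sequence) {
        $count_for{$base}++ if exists $count_for{$base};
    }

    return \%count_for;
}

# Only act as a program when a sequence was given on the command line.
if (@ARGV) {
    my $count_for = count_bases($ARGV[0]);
    print "$_: $count_for->{$_}\n" for sort keys %$count_for;
}
```

Notice the POD never mentions hashes or subroutines. It only says what goes in and what comes out, which is all the user of a black box needs.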

Internal documentation is normally written with the assumption that the audience knows Perl. If they don't, it's not your job to teach them, and there are plenty of better ways to learn. Unfortunately it sounds like you can't make that assumption at your workplace. However, that doesn't mean you should comment what every line is doing. It would be silly to write:

my $thing = 4; # set the lexical variable $thing to 4

That's partly because it's clutter, but the main reason too much commenting is frowned upon is that if you change the line you have to remember to change the comment too. And you won't remember. And then the comment will lie.

Comments should not say *what* is happening or *how* it is happening but rather *why* it is happening. The what and how can be figured out from the code itself, and if it's written properly it should be clear. But the why is not in the code, so you have to add it in with annotations. Let me show the difference.

my $thing = 4; # the length of a gene sequence

That tells you *why* you're assigning 4 to $thing, which is the important thing. Now it is even possible to eliminate the need for annotative comments: you can make the why implicit in the code. This is done with good use of names.

my $gene_sequence_length = 4;

That needs little or no explanation (except, maybe, why you set it to 4).

Good variable and subroutine names largely eliminate the need for comments and help you to structure your code better. Let's say you have a dense series of regexes which transforms a sequence from type A to type B (I'm making shit up).

# Transform from type A to type B
$sequence =~ s/A+/B+/g;
$sequence =~ s/gafoooty/feribble/;
$sequence .= " this is the end my friends.";

You can eliminate the need for the comment AND structure the code better by moving that block into a subroutine.

sub transform_from_type_A_to_type_B {
    my $sequence = shift;

    $sequence =~ s/A+/B+/g;
    $sequence =~ s/gafoooty/feribble/;
    $sequence .= " this is the end my friends.";

    return $sequence;
}

$sequence = transform_from_type_A_to_type_B($sequence);

That might seem like more code, but when reading a big wad of code it's nice to be able to just look and say "oh, they're transforming from type A to type B" instead of reading 4 lines about how you're doing that. A literary analogy would be:

He drew a square. [1]

[1] See appendix 5 for details.

vs

He decided to draw a square. He grasped his pencil firmly about the middle and pressed the sharp, black end against the paper with enough force to leave a mark but not to tear tender white skin. His arm lashed upwards leaving a straight black line! Then a turn to the right and another! Another turn, another line! Finally, his masterpiece nearly complete, he dragged the pencil back to its starting point closing the cycle like an old man after a full life returning to the womb of death.

Even regular expressions can be named.

my $non_amino_acids = qr{[^AGCT]+}i;

$sequence =~ s/$non_amino_acids//g;
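Named regexes can also be built out of smaller named regexes, so the final pattern reads almost like a sentence. Here's a sketch of the idea. The names and the rule being checked (a start codon followed by whole codons) are made up for the example, in keeping with the rest of the genetics here.

```perl
use strict;
use warnings;

# Hypothetical building blocks, each named for what it matches.
my $base        = qr/[ACGT]/i;
my $codon       = qr/(?:$base){3}/;
my $start_codon = qr/ATG/i;

# True if the sequence is a start codon followed by zero or more
# complete codons and nothing else.
sub looks_like_coding_sequence {
    my $sequence = shift;
    return $sequence =~ /\A$start_codon(?:$codon)*\z/ ? 1 : 0;
}
```

Each qr// object keeps its own flags when it is interpolated into a bigger pattern, so the /i on $base still applies inside $codon.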

Finally, sometimes you write code in a non-obvious way and you need to write down why you did it that way and what it's doing. Usually it's because 1) you're working around a bug, 2) it's faster to do it the non-obvious way, or 3) you're making use of some little-known feature.
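Here's the sort of thing that earns a comment, covering reasons 2 and 3 at once. It's a sketch: count_known_bases is an invented name, and whether the speed difference matters for you is something to measure rather than assume.

```perl
use strict;
use warnings;

sub count_known_bases {
    my $sequence = shift;

    # Why tr/// and not a regex: with an empty replacement list and no
    # /d flag, tr/// leaves the string alone and just returns how many
    # characters matched. That avoids building a list of matches only
    # to count it, which can matter on long sequences.
    my $known_base_count = ($sequence =~ tr/ACGTacgt//);

    return $known_base_count;
}
```

Without the comment, the next reader sees a transliteration that apparently transliterates nothing and "fixes" it.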

This is all intermediate-level stuff that you usually work out after you've got a decent grasp of the language, but it's nice to keep in mind for now. At the point you're at, work on getting your names right and err on the side of too many comments.

As for good examples of external documentation... prove isn't bad. http://search.cpan.org/user/petdance/Test-Harness-2.64/bin/prove

dbiprof is another http://search.cpan.org/user/timb/DBI-1.54/dbiprof.PL

They're both a little scanty and assume the reader already knows about testing and profiling, respectively. They probably assume too much.

Test::Simple's documentation is pretty good, even though it's a library and not a program. http://search.cpan.org/user/mschwern/Test-Simple-0.70/lib/Test/Simple.pm




When you don't need comments

Ovid on 2007-03-16T22:14:31

A perfect example to illustrate one of your points: I was working on some code when I ran across several instances of a variable named $number. Since this was one of our ubiquitous several hundred line subroutines that we are cleaning up, it was very, very frustrating to see such a useless variable name. After a bit of hunting through the code, I finally found the following:

# $number is invoice number
if ( defined $number ) {
    my @results = $dbh->selectrow_array(...);
    ...

Had they just had the sense to name that $invoice_number, much of the code in the subroutine would have been dead obvious. Of course, the fact that something which is clearly immutable (either you have an invoice number or not) showing up repeatedly in code in non-obvious ways is another sign that there are serious confusions in thinking about how to program.

That being said, nice essay. More people should read it.

Commenting example

pemungkah on 2007-03-16T23:07:43

I'd like to recommend the debugger's comments and documentation (even if I did write them - http://ibiblio.org/mcmahon/perl5db.html for the formatted POD and http://ibiblio.org/mcmahon/perl5db.pl for the raw source). I worked very hard to make them work as an integrated whole (the POD summarizes, the comments detail).

Yes.

Skud on 2007-03-16T23:55:58

> I keep writing these off-the-cuff essays to people in response to questions and I really should archive them somewhere and turn them into something.

Yes, you should.

Re:Yes.

Aristotle on 2007-03-17T01:25:46

I think that’s called a blog, or something… ;-)

Language as a guide towards intuition

Aristotle on 2007-03-17T00:25:32

Nice writeup, and well put.

Another point that I would have made is: often, novices structure code in weird ways, with procedures that aren’t very useful outside the context in which they were conceived because they are big and do too many things. They use variables that get recycled over too many intermediate calculations, or conversely fail to break calculations into intermediate steps saved in variables.

Naming is a good detector for such problems. If you can’t think of a good precise name for the variable you just declared, it’s probably because it spans several computations that should be stored in separate variables. If you have trouble giving a function a good precise and concise name, it’s probably because you made a mistake in your separation of concerns.
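A small sketch of the recycled-variable case, with hypothetical names. The commented-out version has a variable that can't be named well because it means two different things; the rewrite gives each idea its own variable.

```perl
use strict;
use warnings;

# Hard to name: $n is first a length, then a fraction.
#   my $n = length $sequence;
#   $n = ($sequence =~ tr/GCgc//) / $n;
#
# One variable per idea, and each name comes easily.
sub gc_fraction {
    my $sequence = shift;

    my $sequence_length = length $sequence;
    return 0 if $sequence_length == 0;    # avoid dividing by zero

    my $gc_count = ($sequence =~ tr/GCgc//);

    return $gc_count / $sequence_length;
}
```

The split also made the empty-sequence case visible, which the one-variable version hid.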

This is a great general principle. “If you can’t give it a name, it’s probably because it’s the wrong abstraction.”

With experience, you get a better sense for which operations to break out, and in how many distinct parts to break them out. If you have a strong sense for program structure and naming, comments become almost completely unnecessary. Nowadays, nearly all comments in my code are related to bugs or shortcomings – explanations of why the code has to do this peculiar thing to avoid such-and-such problem, FIXMEs, the works.

naming arrays after their members

mr_bean on 2007-03-17T06:19:57

When I name an array after the objects it contains, I don't know whether to use the plural or the singular, i.e. whether to call it objects or object.

@persons means I start saying $persons[0], $persons[2]. @person reads nicer when referring to the elements, $person[1], $person[2].

But the second might confuse a reader of the code?

Re:naming arrays after their members

grinder on 2007-03-17T11:05:56

No, don't name it after the members. Name it after the collection. Thus, the array becomes @person. When read out loud, it becomes "the person array".

This avoids ever having to ask the "is it singular, is it plural?" question. Name all your variables in the singular, even if in some corner cases it comes out sounding a little weird.

If everything is always singular, no exceptions, you never have to stop and pause to think about its name. This optimisation pays off big time in the long run.

Re:naming arrays after their members

Aristotle on 2007-03-17T13:33:40

My array and hash names are always singular, but not my scalars. When a scalar is intended to store an arrayref, I use the plural, and when it’s intended to store a hashref, its name ends in _for.

I agree otherwise, though.

Re:naming arrays after their members

grinder on 2007-03-17T13:59:07

Can you sketch an example?

I can't see how it can be a win over a blanket "no plurals" regardless of type and/or content. But you're no dummy, so I'd like to see if I can be convinced :)

Re:naming arrays after their members

Aristotle on 2007-03-17T14:46:26

It’s basically the same as people who like to append _ref or _r to the names of their reference-storing scalars. The only difference is that with this scheme, the code reads more like plain English.
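A sketch of what that convention might look like in practice. The data and names below are invented for illustration and are not taken from anyone's real code.

```perl
use strict;
use warnings;

# Arrays and hashes: singular, since the sigil already says "many".
my @person      = ('alice', 'bob');
my %person_name = (alice => 'Alice Example', bob => 'Bob Example');

# Scalars holding references: plural for an arrayref,
# a _for suffix for a hashref.
my $persons  = \@person;
my $name_for = \%person_name;

# Reads close to English: "the name for this login".
sub greeting_for {
    my $login = shift;
    return "Hello, $name_for->{$login}";
}

# Reads as "the second of the persons".
sub second_person {
    return $persons->[1];
}
```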

Re:naming arrays after their members

Aristotle on 2007-03-17T13:37:53

For arrays and hashes, I always use the singular form. The plurality is already implied by the sigil – putting it in the name as well would be repetition. Note also that there are many cases where you use the collection as a whole, which read nicer with a singular name, such as push @person, $last_record (OK, I suppose that can be argued either way) or keys %person_name.

foreign language variable names

mr_bean on 2007-03-17T08:00:55

I read in a software engineering textbook about a South African software company that went broke, because its code had been written by Portuguese speakers, who used Portuguese for their variable names. The Portuguese programmers left the company, and were replaced by Afrikaans-speaking ones, who couldn't maintain the code, because they couldn't understand Portuguese.

I found this hard to believe, but I've never had the experience.
The textbook writer, from South Africa, said names must be in English.

I was thinking about the problems of non-English speaking programmers in connection with jonasbn's journal

      Linkname: Presentations, Papers Articles and Jokes
      URL: http://use.perl.org/user/jonasbn/journal/32678

I have problems remembering the differences between 'split', 'splice', and 'slice', because the names are so similar. Do people who don't know those words have fewer problems, or more?

Having a Bulwer-Lytton perl documentation contest

hfb on 2007-03-17T03:49:13

would be an awful lot of fun. :) "It was a dark and stormy night in front of the green glow of the monitor as I snacked on pop tarts and contemplated what I am about to write to you....". One of my English teachers in high school was Bulwer-Lytton's great-great-granddaughter, which made for a lot of fun at times. :)

Thanks!

Shlomi Fish on 2007-03-18T17:37:08

Hi Schwern!

Thanks for taking the time to write this and for sharing it with us. I couldn't have said it better myself<tm>.

yes, please.

gabrielle on 2007-03-20T18:16:29

"I keep writing these off-the-cuff essays to people in response to questions and I really should archive them somewhere and turn them into something."

What can I do to help make "Dear Schwern:" a reality?