YAPC day 3: Red Flags

jdavidb on 2002-06-28T17:26:23

Mark counts program lines. Some people think that's crazy. People I respect (like Watts Humphrey of SEI.CMU) think it works fine. I'm sure Mark and Humphrey would probably be at odds over what to do with the linecounts, though, because Mark's goal in this talk is to reduce linecount.

Red Flag 1: use an array instead of $X1, $X2, $X3. Hands up; how many people learned that in BASIC? :) Yet some people still don't get it in Perl. Trust me, I've seen it. I still like my predecessors, but they have caused me some amazing problems with this simple mistake alone.

my($one, $two, $three, $four ...) = ... Similar problem, only worse. Mark calls this problem "families of variables," in general.

Better to have an array with two elements (which seems silly to some people, I guess) than to repeat all the code twice!
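A minimal sketch of the "families of variables" point. The variable names and numbers here are made up for illustration; they are not from Mark's slides.

```perl
use strict;
use warnings;

# Family of variables: every operation has to be spelled out per member.
my ($total1, $total2, $total3) = (10, 20, 30);
my $sum_family = $total1 + $total2 + $total3;

# Same data as an array: one loop handles any number of members.
my @totals = (10, 20, 30);
my $sum_array = 0;
$sum_array += $_ for @totals;

print "$sum_family $sum_array\n";
```

Adding a fourth total to the first version means touching every line that mentions the family; in the second version only the list changes.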

Mark loves talking at YAPC because we already know what map does.

Mark's final solution to a problem was the first one I thought of. Weird. He says, "Don't go with the solution you think of first." But he says if you thought of this first, you should go think of more solutions and pick the best one. Mark says Larry says the problem most people have is they have a problem, they think of a solution, and they implement it. They don't think of several solutions, analyze them, and pick the best one, which is engineering.

That's exactly what's wrong with most of the systems I maintain at work. Somebody said, "I need this data," then somebody said, "You can get it from here if you jump through twelve hoops." So we jumped through twelve hoops in code, and after a few years you're jumping through fifteen hoops because of new requirements. Meanwhile, the docs actually told us the real way to do it, which only required one or two hoops. My predecessors were smart, but our organization systematically encouraged this.

One way is not as good as another.

Interpolation with double quotes is almost always better than the dot operator. I believe this from my own experience. He also says don't put trailing slashes in scalars that are directory names, because then you can write "$dir/$file" and it's perfect. My first boss taught me that. They had learned that from experience.
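For comparison, here is roughly what the two styles look like side by side. The directory and file name are hypothetical.

```perl
use strict;
use warnings;

my $dir  = '/var/log';   # hypothetical directory, no trailing slash
my $file = 'app.log';    # hypothetical file name

# Dot operator: the path is chopped into pieces and glue.
my $with_dots = $dir . '/' . $file;

# Interpolation: the string looks like the path it produces.
my $with_interp = "$dir/$file";

print "$with_interp\n";
```

Both build the same string; the interpolated one is just easier to read at a glance.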

It's ironic that a minute ago I was complaining about code my predecessors left me, and then now I'm expressing gratitude for the useful things they taught me. Know that the overall feeling is overwhelmingly gratitude, despite the complaints I had earlier.

Mark's class is built so every piece of advice he gives has a counterexample where the wrong way from the first situation is the right way in the second situation. The idea being that you should use judgment rather than saying "MJD told me to code it like this."

People who backslash things that don't need to be backslashed indicate they are superstitious. They could go try the code and see how it works to decide if they need to do it or not. I might mention that indicates they are doing things without knowing what they are doing. As MJD said in gnat's lightning talk yesterday, "You can't just make stuff up and expect the compiler to do what you want ... why are you putting that in your program if you don't know what it does?"

Precedence superstitions. I probably suffer from some of this, although I go by the Practical Programming philosophy of multiply comes before add, put parentheses around everything else. Nevertheless, the parentheses in $scalar =~ (m/pattern/) are clearly extraneous. perl -MO=Deparse -e is a useful tool for curing precedence superstitions.
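A small sketch of asking perl itself how it groups an expression, using the B::Deparse module from inside a script rather than from the command line. The expression and variable names are just an example I picked.

```perl
use strict;
use warnings;
use B::Deparse;

my ($x, $y, $z) = (1, 2, 3);

# The -p option asks B::Deparse to add parentheses showing the grouping
# that perl actually parsed.
my $deparse = B::Deparse->new('-p');
my $text    = $deparse->coderef2text(sub { $x + $y * $z });

print "$text\n";
```

The command-line form of the same option is perl -MO=Deparse,-p -e '...'. The exact layout of the output can vary between perl versions, so read it for the grouping rather than the formatting.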

Mark and Uri like unless. Non-native speakers of English apparently don't.

Global variables. Bad. You knew that, right? You cannot understand the function in isolation, nor can you reuse the function in another program.
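A hedged illustration of the difference, with invented function and variable names:

```perl
use strict;
use warnings;

# Leans on a global: to understand or reuse this sub you also need
# to know where $rate is set and who else changes it.
our $rate = 0.5;
sub price_with_global {
    my ($price) = @_;
    return $price * (1 + $rate);
}

# Takes everything it needs as parameters: readable in isolation and
# easy to copy into another program.
sub price_with_param {
    my ($price, $rate) = @_;
    return $price * (1 + $rate);
}

print price_with_global(100), " ", price_with_param(100, 0.5), "\n";
```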

I understand there are 3-hour and 6-hour versions of this class, in addition to the 1 1/2-hour one I'm seeing. I'll have to arrange to see the 3-hour version some day.

The problem with some parameters is that they don't matter. That is, they don't mean anything. I think that's why I shy away from array subscripts so much. Most of my arrays aren't used like C arrays; they are collections. Sometimes ordered, sometimes not, but the fifth one doesn't mean anything different from the fourth one. [Pseudo-hashes; same problem, I think.] I foreach instead of $array[$i]. $i has no meaning. I think this makes me feel better about my use of subscripts in my recent Stratego program, because there the subscripts mean something.
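Something like this is what I mean, with a made-up list of names:

```perl
use strict;
use warnings;

my @names = qw(alice bob carol);

# Subscript version: $i is only there to get at the elements.
my @shouted_by_index;
for (my $i = 0; $i < @names; $i++) {
    push @shouted_by_index, uc $names[$i];
}

# foreach version: no index, just "each name".
my @shouted;
for my $name (@names) {
    push @shouted, uc $name;
}

print "@shouted\n";
```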

Mark says "say what you mean" a lot. I say that to people a lot, too. Say what you mean especially in a regex. Write your code to say what you mean. Name your variables to say what you mean. And so on.

Try it. It's easier to see than to think.

Prepending 0's to a number with extreme code repetition. I've done this. I hate myself. Fortunately I haven't done it since 1998. I met a friend called sprintf.

Repeated code is boring to write. You can't keep your brain engaged and so you aren't thinking.

Mark pointed out that even if you didn't know sprintf you STILL shouldn't repeat. You can use length and the x operator, or if you didn't know the x operator you could use a while loop. So, yes, I confess my iniquities. Fortunately I've learned so much since I started programming Perl I have trouble feeling guilty for mistakes I used to make four years ago.
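Here is a sketch of the three non-repeating ways to zero-pad, in the order mentioned above. The width of 5 and the number 42 are arbitrary choices of mine.

```perl
use strict;
use warnings;

my $n     = 42;
my $width = 5;

# 1. sprintf: the format says "decimal, at least 5 wide, zero padded".
my $padded_sprintf = sprintf '%05d', $n;

# 2. length plus the x (repetition) operator.
my $padded_x = ('0' x ($width - length($n))) . $n;

# 3. A plain while loop, if you knew neither of the above.
my $padded_loop = $n;
$padded_loop = "0$padded_loop" while length($padded_loop) < $width;

print "$padded_sprintf $padded_x $padded_loop\n";
```

None of these needs a separate branch for one-digit, two-digit, and three-digit numbers, which is where the repetition came from.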

Backslash double quote is a red flag. I should use qq instead. Oddly, one in ten characters of HTML strings may turn out to be a backslash! It did in Mark's example. It looks like it's a lot less than that, but it's really that many.
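A quick before-and-after, using a made-up link rather than Mark's actual example:

```perl
use strict;
use warnings;

my $url = 'http://example.com/';   # hypothetical URL

# Escaped version: every quote inside the string costs a backslash.
my $escaped = "<a href=\"$url\" class=\"link\">home</a>";

# qq version: pick a delimiter that isn't in the string, and the
# double quotes need no escaping. Variables still interpolate.
my $with_qq = qq{<a href="$url" class="link">home</a>};

print "$with_qq\n";
```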

Repressed subconscious urges will always surface: making almost global variables with huge lists of my variables to get around use strict. "Smoking in bed is bad because it sets off the fire alarm." "That's why I always hang a sheet over the bed when I smoke in bed."

Exercise: which of the strict effects is most valuable? Which is least valuable?


almost perfect

jmm on 2002-06-28T19:38:07

He also says don't put trailing slashes in scalars that are directory names, because then you can "$dir/$file" and it's perfect.

The one problem with this is that the root directory needs a trailing slash to be correct. The above approach results in $dir = '/', $file = 'foo', "$dir/$file" = '//foo' - it gets a double slash for an important special case. If you always leave the trailing slash, you can use "$dir$file" instead and $dir is correct. Using '/foo/bar/' as a directory name is perhaps a bit ugly but correct. (A doubled slash is also ugly but correct, so you choose whichever ugliness grates less.)
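For illustration, here is the root-directory case next to one possible alternative that nobody in this thread proposed: letting File::Spec join the pieces. File::Spec::Unix is named explicitly so the sketch behaves the same on any platform.

```perl
use strict;
use warnings;
use File::Spec::Unix;

my ($dir, $file) = ('/', 'foo');

# Plain interpolation doubles the slash when $dir is the root.
my $interpolated = "$dir/$file";

# catfile joins path components and cleans up the doubled slash.
my $joined      = File::Spec::Unix->catfile($dir, $file);
my $joined_tmp  = File::Spec::Unix->catfile('/tmp', $file);

print "$interpolated $joined $joined_tmp\n";
```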

Re:almost perfect

jdavidb on 2002-06-29T18:08:11

Double slash makes it ugly to the OS; not having a slash between filename components makes it ugly to the programmer. I choose making it nice for myself.

Plus, how often do you need to represent the root directory in a variable? Do you write many temporary files there or anything? Or permanent files? :)

unless

koschei on 2002-06-29T15:54:20

Intriguingly I am a native English speaker and I love unless. Maybe I'm the counter-example =)

Re:unless

jdavidb on 2002-06-29T18:09:26

Whoops. Make that non-native speakers of English.

Re:unless

koschei on 2002-06-30T14:11:39

=) That's a bit more believable. Were statistics and studies quoted? =)

Re:unless

jdavidb on 2002-07-01T14:00:26

No, just a quick survey at one of MJD's training sessions.