Fine-grained vs. coarse-grained Perl code

ziggy on 2001-11-07T21:50:28

Here's another example on how and why it's better to express your intent idiomatically instead of using a large amount of small statements to reach your goal. Consider the following snippet:

my @list = split("\t", $line);
my $name = $list[0];
my $address = $list[1];
my $serial_number = $list[2];
my $phone_number = $list[3];
my $userid = $list[4];
...

Now, it's good to see that we're using my variables here. But that's about the only good element of style here. Such kinds of operations are quite common. Unfortnately, this technique not only embeds a knowledge of the format of the data contained in $line, and does so in a brittle manner. By explicitly assigning a series of variables to specific indices of the temporary variable @list. This opens up the possibility of introducing bugs as the number of elements grows over time, or as the code is copied and pasted around (this happens more often than we care to admit).

One simple way to improve the code is to replace the assignment from explicit indicies of @list:

my @list = split("\t", $line);
my $name = shift(@list);
my $address = shift(@list);
...

There is a slight advantage here: now, we don't need to maintain (and debug) which element of @list corresponds to a particular named scalar variable; each variable is created and it's value is the next piece of data found in @list.

But we can do much better. If there are 27 variables, there must be 27 explicit assignment statements, whether we use the first style or second style here. Now is the time to note that the my builtin can create a single lexical variable as well as a series of lexical variables. It can also assign values to each of the lexical variables it creates in a series:

my ($name, $address, $serial_number,
    $phone_number, $user_id, ...) = @list;

Now, we're creating a series of variables, tied to the order of data in a list, and assigning one variable to each data element in that list. Except now, we've done it in a single assignment, and greatly clarified the intent. The data elements are coming into the program in one group, and we are treating them as a single group. Our program is now closer to matching our view of the world.

But that's not the only improvement we can make. Remember that @list variable? Why did it exist in the first place? It actually doesn't have a role in our program, but acts as a temporary variable needed to ferry values around for 27 assignments. Here's how we can get rid of it:

my ($name, $address, $serial_number,
    $phone_number, $user_id, ...) = split(/\t/, $line);

Now the program matches our intent:

  1. Convert a line of input into a series of values
  2. Assign each of those values to a specific scalar variable


Code reviews have to be good for something, eh?

vsergu on 2001-11-08T01:44:04

Are you sending copies of these to the author of the inspirational code?  I actually explained this particular code improvement to him on our walk to the Metro, and he seemed to understand what I was taking about.  It's not so easy to absorb these things from a purely spoken explanation, though.

I was thinking about introducing the idea of having a hash rather than a bunch of separate scalars, but whether it's worthwhile really depends on how many you've got.  With the 27 that you mention, though, I'd definitely go with something like

    # Before the loop
    my @fields = qw( name address serial_number phone_number ... );
    # or maybe read the field names out of the first line of the file,
    # which would make adding fields easier
    my %rec; # to hold the values for one line

    # In the loop
    @rec{@fields} = split /\t/, $line;
    # or more likely no $line, since I'd use $_

--Keith

Re:Code reviews have to be good for something, eh?

ziggy on 2001-11-08T14:35:46

Good point.

time passes

...announced to the dc.pm list. Julian said he's reading my journal, so he should spot these discussions eventually. Thanks for the hint.