Automating Your Source Code Formatting

Ovid on 2008-01-10T11:04:33

In a comment on jplindstrom's journal, I mentioned that I've worked for several companies where it was agreed that automatically formatting source code would be a good idea. I don't care what you come up with. Automate and I'll do it. I don't want to sit around and get into childish formatting arguments. Automate that shit and grow up. If it ain't perfect, suck it up and move on to something important. yeah, three character indents would suck, but frankly, I tell people that if this is what is required, I'll do it. I'm not going to be a damned prima donna about it. And truth be told, if you're working with even moderately competent programmers, it's unlikely they'll come up with something too insane.

That being said, we've agreed to automate our formatting at work. We're using perltidy with a little bit PPI for some rough cases.

After seeing how many rough cases there are, it looks like this is the fourth company I've been at where we've refused to automate formatting, despite agreeing that it's a good idea. I'm going to scream. (This isn't jplindstrom's fault. He's put a lot of time in this and the tools just don't allow fine-grained control)

As a counter example, take a look at some source code from the Test::Harness distribution. One of the very first things I did was create a .perltidyrc. I messed with it quite a bit and was never happy with it, but it was good enough. Honestly, while that code may be a bit complex to read at times, it's not the formatting. Eventually there was a whole team of programmers using the same automated formatting and it was one thing we didn't have to worry about.

Seriously, people. There are important things to worry about. Two space versus four space indents isn't one of them.

Update: I should explain why this is important. If you have four developers and each has a subtly different formatting style, your source control history gets cluttered and you see this:

4317   jpublic3  # if there is children then there might be series
4317   jpublic3  if ($data->has_children('Series')) {
4317   jpublic3      $links->add_link(Our::Link->new({
4317   jpublic3          rel  => 'our:series-resultset',
4317   jpublic3          href => $uri->as_string .  'children/series/',
4317   jpublic3      }));
4317   jpublic3  }

So you walk over to John Public and you start asking him about the code:

You:
I don't understand this.
Him:
OK.
You:
Well, can you explain it to me?
Him:
No. Why should I be able to?
You:
You wrote it!
Him:
No I didn't!
You:
I ran "svn blame". Your name is all over it.
Him:
Seriously, dude, I didn't write that. I don't know why my name's there.

And after a lot of bickering, you finally realize that his way of formatting code is subtly different from your way of formatting code and running "svn blame" is practically useless. If everyone uses the same formatting (automated!), it's much easier to get into digital archaeology. If you do it at the start of the project, it's a blessing. If you do it in the middle of a project, after a few days of working out last minute kinks in your formatting, just apply it to the whole code base and take the hit. With enough refactoring, eventually you'll get beyond that. And put it in a source control commit hook, if possible.


Not done yet

jplindstrom on 2008-01-10T13:57:08

Well, I will take a look at it again at the end of the sprint when we're done implementing features...

I'm sure we'll come up with something that works.

Re:Not done yet

Ovid on 2008-01-10T14:12:29

Cool. I wrote that post earlier when it sounded like there were so many tiny issues that we'd have to table the thing entirely.

X-ray vision

Aragorn on 2008-01-11T09:43:02

VCS'es should be able to ignore whitespace changes during annotate. Of course, this won't solve all the problems with the "code ownership" in the annotate, but at the very least, changes in indentation won't cause lies about who wrote the code.

Re:X-ray vision

Ovid on 2008-01-11T11:25:46

That means they would need to know which whitespace is insignificatn and which isn't (Python should be fun). For example:

- print <<    END;
+ print <<END;

You're allowed to have leading space before a heredoc identifier. However, without quotes, Perl get's really confused. There are plenty of other cases where whitespace is important when it seems like it shouldn't be. Plus, what do you do if changes to a file are whitespace only? Pretend there was no commit?

I sympathize with your idea, though. Aren't there switches you can pass to most version control systems that will give you some control over whitespace significance?

Re:X-ray vision

Aragorn on 2008-01-11T13:17:34

There is, of course, no silver bullet :-) Of the systems I've used, only Git currently supports ignoring whitespace changes when doing annotation. Most of the systems I know can ignore whitespace changes when doing an explicit diff between files or commits (or have the ability to call an external diff program), but not when annotating.

But even this limited support could amend the scenario you sketched above with:

You:
Oh, hm. Let's see if there were only whitespace changes which obscured the original author (or someone who actually changed the code).
Which makes the "$vcs blame" a lot less useless.