Scary Perl Refactoring

chromatic on 2002-02-08T04:59:52

I first formally encountered the idea of refactoring while studying XP. I'd been doing it already (especially when giving advice to new programmers), but didn't have a vocabulary for it. As my study progressed, I learned that Smalltalk (among other languages) has a Refactoring Browser -- since refactoring can be considered mechanical transformations, why wouldn't a machine be able to do them?

At Schwern's Refactoring talk at TPC 5.0, he demonstrated the beginnings of a Perl parser that warned about dubious constructs. Parsing Perl with anything but perl obviously isn't a simple task -- Damian hasn't released Parse::Perl, though I'm impressed with perltidy.

Looking at Smalltalk in more detail made me think that operating on source code is the hard way. Working on bytecode would be much easier -- except for associating lines of code with opcodes. (I worked up a source filter that would insert a target with comments and code on appropriate occasions. This data can then be extracted as necessary.) Talking to Ned Konz and Simon Cozens, they both thought that bytecode was the right track.

Another piece of the puzzle came in writing an article about the Linux Kernel Janitors. The idea behind the Stanford Checker really stuck out. If you can demonstrate an error pattern, the compiler can look for it in the code. Obviously, I can take a bit of bad Perl, compile it to bytecode, and have a tree that marks a bad pattern. As Simon pointed out, though, searching a tree for a tree is a difficult problem.

I didn't entirely agree. Though it's usually stupid to disagree with someone that smart, sometimes it can lead to a good idea. For some reason, I thought it was doable. Walking near a koi pond one afternoon, it hit me.

The XML guys have, more or less, solved this.

Okay, the LISP guys may be able make a better case, but the important thing is that it's solvable. I thought about sending Matt Sergeant an e-mail, asking him how to take some of the rules of XPath and apply them to a non-XML tree. For a few months, the whole idea was on the back burner.

Nearly all of the pieces are in place already. There's a bytecode decompiler in B::Terse (and I knew a bit about it, having written tests for it). There's a bytecode generator in B::Generate (thanks to Simon). There's a bytecode to Perl converter in B::Deparse (thanks to a lot of people, especially Rafael Garcia-Suarez, lately). I just needed to find some way to apply something like XSLT to Perl bytecode.

This morning, it hit me. Maybe it was the Perl XML fans talking about SAX being important for more than XML, but I realized that if I could write a backend module to turn bytecode into XML, the tree matching and conversions would be solved. The only tricky part that's left is generating XSLT or XPathScript or whatever syntax to refactor an error pattern. The same XML guys who provided the final nudge can probably help out in that respect.

So now I have B::ToXML that can XMLize a code reference, and it works pretty well. If I knew the internals better, I'd be able to tell what kind of information is important. I don't yet have any way to say "go from this to this, keeping this but changing this", but I'm one step closer.

I could still be on the wrong track, but I really think I'm on to something here. Drop me a line if you have a strong opinion either way.

Bytecode scariness

pdcawley on 2002-02-08T09:50:54

The problem I had with working at a bytecode level when I was writing my refactoring engine (that I must get back to working on) was that you lose comments.

Now, I'm generally of the opinion that if your code needs comments then you have a problem right there, but some people like them...

The approach I took (and will probably continue to take) is a bastard hybrid that uses bytecode where necessary, but which still goes and mucks about with the source as strings and is generally quite scary.

But I do have the core of the extract_method refactoring working. Now I just need to fence it about with pre and post conditions above and beyond 'Make sure all your tests are still passing... you do have tests don't you?', and I've not had to resort to even using the debugger guts (but I can see that I'm going to have to for some stuff), though I have done a small amount of B::* magic.

And hey, I have code that can 'see' package lexicals. Which is nice.

Life will be *so* much easier when/if someone releases Parse::Perl. Even a tokenizer would be good.