Copy & Paste Detect

TeeJay on 2003-03-13T12:07:24

Once again I found a nice article on ONJava describing something I really want in Perl: the Copy/Paste Detector (CPD).

This would be *so* useful in Perl. We have a project with a lot of heavily repeated SQL and Perl that could do with refactoring, but it currently works and is not worth refactoring by hand.

CPD would help greatly, especially if it could also flag things like map in void context, undocumented or uncommented functions, functions without specified prototypes, etc.

Now I just need a tokenizer for Perl (or something similar) and a module providing a nice Greedy String Tiling or similar algorithm, and we will be away.
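To make the two missing pieces concrete, here is a rough sketch in Python (not Perl) of how they might fit together. The `tokenize` helper is a deliberately crude regex stand-in and not a real Perl tokenizer, and `greedy_string_tiling` is a naive cubic-time rendering of the Greedy String Tiling idea: repeatedly find the longest runs of unmarked tokens shared by both inputs and mark them as tiles. Both names, and the `min_match` parameter, are made up for illustration; this is not how CPD itself is implemented.

```python
import re


def tokenize(text):
    """Crude stand-in tokenizer: words and single punctuation characters.

    Assumption for illustration only; real Perl needs a far smarter tokenizer.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def greedy_string_tiling(a, b, min_match=3):
    """Return (start_a, start_b, length) tiles shared by token lists a and b."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_len = min_match
        matches = []
        # Find the longest runs of equal, still-unmarked tokens.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    matches = [(i, j, k)]
                    max_len = k
                elif k == max_len:
                    matches.append((i, j, k))
        placed = False
        for i, j, k in matches:
            # Skip candidates that overlap a tile placed earlier in this round.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for off in range(k):
                marked_a[i + off] = True
                marked_b[j + off] = True
            tiles.append((i, j, k))
            placed = True
        if not placed:
            return tiles


if __name__ == "__main__":
    first = tokenize("my $x = foo($y); print $x; bar();")
    second = tokenize("baz(); my $x = foo($y); print $x;")
    for start_a, start_b, length in greedy_string_tiling(first, second, 4):
        print(start_a, start_b, " ".join(first[start_a:start_a + length]))
```

Each tile says "this many tokens starting here in the first file also appear starting there in the second", which is the raw material a duplicate-code report would be built from. Raising `min_match` filters out short coincidental matches such as `( ) ;`.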

PMD

chromatic on 2003-03-13T17:46:23

I was interested to run across PMD, the software on which the Copy/Paste Detector runs. Though it's not possible to do that sort of static analysis in Perl, there are ways to do something very similar. I've resurrected my long-dormant bytecode-to-XML project.

PMD uses JavaCC (with JJTree) to build an AST, then uses the Visitor pattern to fire a callback for each node in the tree. From there, each rule looks for specific patterns of nodes. It's a little tricky, but you can build up a sort of state machine.
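As an analogy only, the same visitor-per-node idea can be sketched with Python's standard `ast` module standing in for PMD's Java AST. The rule below is a made-up example (the names `EmptyIfRule` and `check` are hypothetical) that flags `if` statements whose body is nothing but `pass`; it is not PMD's code.

```python
import ast


class EmptyIfRule(ast.NodeVisitor):
    """Hypothetical rule: report `if` statements whose body is only `pass`."""

    def __init__(self):
        self.violations = []  # line numbers of offending `if` statements

    def visit_If(self, node):
        # NodeVisitor dispatches here for every `if` node in the tree.
        if all(isinstance(stmt, ast.Pass) for stmt in node.body):
            self.violations.append(node.lineno)
        # Keep walking so nested statements are checked too.
        self.generic_visit(node)


def check(source):
    """Parse the source text and return the line numbers the rule flags."""
    rule = EmptyIfRule()
    rule.visit(ast.parse(source))
    return rule.violations


if __name__ == "__main__":
    print(check("x = 1\nif x:\n    pass\nif x:\n    print(x)\n"))
```

A rule that needs more context than a single node (for example "an assignment followed by an unused variable") would keep extra state on the visitor between callbacks, which is where the state-machine flavour comes in.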

My code is B::SAX (no, it's not ready yet), which works like an inside-out Visitor: it fires an event for each node in the tree. I just have to write a SAX handler that can check for patterns, a handler to transform bad patterns into good patterns, and a handler that mangles the output into something B::Deparse can understand.
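To give a feel for the pattern-checking handler, here is a hedged sketch using Python's standard `xml.sax` module. The XML layout (an `op` element with `name`, `context` and `line` attributes) is invented for this example and is not the actual B::SAX output, which is not shown here; `VoidMapHandler` and `find_void_maps` are likewise hypothetical names. The handler keeps a stack of open ops as its state and flags a map op seen in void context.

```python
import xml.sax

# Invented sample of what an op tree serialised as XML might look like.
SAMPLE = b"""<optree>
  <op name="leave" context="void">
    <op name="nextstate" context="void" line="1"/>
    <op name="mapwhile" context="void" line="1">
      <op name="mapstart" context="list"/>
    </op>
    <op name="sassign" context="void" line="2">
      <op name="mapwhile" context="scalar" line="2"/>
    </op>
  </op>
</optree>"""


class VoidMapHandler(xml.sax.ContentHandler):
    """Collect map ops that appear in void context."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of the ops currently open
        self.findings = []  # (path of enclosing ops, line) per hit

    def startElement(self, name, attrs):
        if name != "op":
            return
        op_name = attrs.get("name", "?")
        self.stack.append(op_name)
        if op_name == "mapwhile" and attrs.get("context") == "void":
            self.findings.append(("/".join(self.stack), attrs.get("line")))

    def endElement(self, name):
        if name == "op":
            self.stack.pop()


def find_void_maps(xml_bytes):
    """Run the handler over the XML and return its findings."""
    handler = VoidMapHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.findings


if __name__ == "__main__":
    print(find_void_maps(SAMPLE))
```

The transforming and output-mangling handlers would presumably sit further down the same event pipeline, each receiving events and passing on a modified stream.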

Yeah, there's a lot of hand-waving there, but the really hard part -- tokenizing Perl -- is done.