As part of a code cleanup, we are going to eliminate unused files in a standard Perl/CGI app which has been around for 7 to 10 years. With thousands of files, my first thought was something like this hack:
#!/usr/bin/perl

use strict;
use warnings;

my @files = map { s{^\./}{}; $_ } `find . -type f | grep -v CVS`;
chomp(@files);

my $count = @files;
my $curr  = 1;

foreach my $file (@files) {
    print "Processing $file. $curr out of $count.\n";
    $curr++;

    my $no_ext = $file;
    $no_ext =~ s{^.*?([^./]+)(?:\.\w+)?$}{$1};

    # Find all files | Exclude this file from list | find files which
    # reference it
    my $command = "find . -type f | grep -Ev '$file|text_r5_c2' | xargs grep -l '$no_ext'";
    unless ( `$command` ) {
        warn $file, $/;
    }
}
It's pretty ugly and *nix specific, but the basic idea is this: for every file, strip the path and extension, then grep every other file in the codebase for that basename. Any file which nothing else references gets warned about as a candidate for deletion.
It seemed reasonable, but even ignoring the fact that it's very slow, the obvious problem kicked in: if an entire section of code is no longer being used, its files can still reference one another, so none of them show up on this list. This app doesn't have a robust enough test suite to figure this out. Time for another strategy.
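One way around the self-referential problem would be to treat the codebase as a graph and only keep what's reachable from known entry points, rather than asking whether anything at all references a file. Here's a rough sketch of that idea; the cgi-bin/*.cgi glob is just a placeholder for wherever the app's real entry points live, and the basename matching is as crude as in the hack above:

#!/usr/bin/perl

use strict;
use warnings;
use File::Find;
use File::Basename;

# Hypothetical entry points: the scripts the web server actually invokes.
my @entry_points = glob 'cgi-bin/*.cgi';

# Collect every file, skipping CVS directories, with paths relative to '.'.
my @files;
find( sub {
    return unless -f;
    ( my $path = $File::Find::name ) =~ s{^\./}{};
    push @files, $path unless $path =~ m{(^|/)CVS/};
}, '.' );

# Map each basename (minus extension) to the files carrying that name.
my %by_base;
for my $file (@files) {
    ( my $base = basename($file) ) =~ s/\.\w+\z//;
    next unless length $base;    # skip dotfiles like .htaccess
    push @{ $by_base{$base} }, $file;
}

# Breadth-first walk: a file is live if an entry point, or any other
# live file, mentions its basename.
my %live;
my @queue = @entry_points;
while ( my $file = shift @queue ) {
    next if $live{$file}++;
    open my $fh, '<', $file or next;
    my $text = do { local $/; <$fh> };
    close $fh;
    for my $base ( keys %by_base ) {
        next if index( $text, $base ) < 0;
        push @queue, grep { !$live{$_} } @{ $by_base{$base} };
    }
}

# Anything never reached from an entry point is a candidate for deletion.
print "$_\n" for grep { !$live{$_} } @files;

It still can't prove anything is dead, but dead sections that only reference themselves no longer hide, since nothing live ever reaches them.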
A coworker suggested grepping the access logs. Now I feel really stupid since this is so obvious. If a file shows up in there, we know we probably want to keep it. If it doesn't, it merits further investigation.
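Something like this rough sketch could do the cross-check, assuming an Apache-style access log; the log path, the cgi-bin glob, and the assumption that URL paths map straight onto file paths are all placeholders to adjust:

#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical paths; adjust for the real docroot and log location.
my $access_log = '/var/log/httpd/access_log';
my @cgi_files  = glob 'cgi-bin/*.cgi';

# Collect every path that was actually requested.
my %requested;
open my $log, '<', $access_log or die "Can't read $access_log: $!";
while (<$log>) {
    # Common/combined log format: the request is the quoted field,
    # e.g. "GET /cgi-bin/foo.cgi?x=1 HTTP/1.1"
    next unless m{"(?:GET|POST|HEAD) (\S+)};
    my ($path) = split /\?/, $1;    # drop the query string
    $requested{$path}++;
}
close $log;

# Anything never requested merits further investigation.
for my $file (@cgi_files) {
    print "$file\n" unless $requested{"/$file"};
}

The longer the log retention, the more trustworthy the "never requested" list gets.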
Re:Linux?
Ovid on 2007-09-10T14:48:51
Thought about that, but there are problems there. First, they completely fail if anyone's been just perusing the code in vim, yes? (I've been doing a lot of that while learning the code base.) Also, robots such as Googlebot might access old files which haven't been linked to in a while. Of course, I'm not much of an administrator, so if I'm wrong, let me know!
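If the suggestion was to check filesystem access times, a quick stat shows what the filesystem actually records, and why it's noisy: atime moves on any read, whether the app ran the file or someone just opened it in an editor. A minimal sketch:

#!/usr/bin/perl

use strict;
use warnings;

# Print what the filesystem remembers about each file named on the
# command line. atime (field 8) is the last read of any kind, which is
# why browsing the code in vim pollutes it.
for my $file (@ARGV) {
    my @st = stat $file or do { warn "Can't stat $file: $!\n"; next };
    printf "%s\n  last read:     %s\n  last modified: %s\n",
        $file, scalar localtime $st[8], scalar localtime $st[9];
}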