As part of a code cleanup, we're going to eliminate unused files in a standard Perl/CGI app that's been around for 7 to 10 years. With thousands of files, my first thought was something like this hack:
    #!/usr/bin/perl

    use strict;
    use warnings;

    my @files = map { s{^\./}{}; $_ } `find . -type f | grep -v CVS`;
    chomp(@files);

    my $count = @files;
    my $curr  = 1;

    foreach my $file (@files) {
        print "Processing $file. $curr out of $count.\n";
        $curr++;

        # strip the directory and extension to get the bare name
        my $no_ext = $file;
        $no_ext =~ s{^.*?([^./]+)(?:\.\w+)?$}{$1};

        # Find all files | Exclude this file from list | find files which
        # reference it
        my $command = "find . -type f | grep -Ev '$file|text_r5_c2' | xargs grep -l '$no_ext'";

        unless ( `$command` ) {
            warn $file, $/;
        }
    }
It's pretty ugly and *nix specific, but the basic idea is this: for every file, strip the directory and extension to get its bare name, then grep every other file for that name; anything nothing else mentions gets flagged as probably unused.
It seemed reasonable, but ignoring the fact that it's very slow, the obvious problem kicked in: if an entire section of code is no longer used, it can be self-referential and is therefore unlikely to show up on this list. This app doesn't have a robust enough test suite to catch that sort of thing. Time for another strategy.
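As an aside, here's a toy sketch of that self-referential trap (the file names and the entry point list are invented): a pair of files that only mention each other always looks "referenced" to the script above, and you would have to walk references outward from files you know are live to catch them.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Toy data: imagine these reference lists were built by grepping.
    my %refs = (
        'index.cgi'   => ['lib/Live.pm'],
        'lib/Live.pm' => [],
        'old.cgi'     => ['lib/Old.pm'],   # old.cgi and Old.pm reference
        'lib/Old.pm'  => ['old.cgi'],      # only each other
    );
    my @entry_points = ('index.cgi');      # pages we know get hit

    # Walk references transitively from the entry points.
    my %reachable;
    my @queue = @entry_points;
    while ( my $file = shift @queue ) {
        next if $reachable{$file}++;
        push @queue, @{ $refs{$file} || [] };
    }

    # The per-file grep keeps old.cgi and Old.pm because each mentions
    # the other; reachability from entry points flags both.
    print "Unreachable: $_\n" for grep { !$reachable{$_} } sort keys %refs;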
A coworker suggested grepping the access logs. Now I feel really stupid since this is so obvious. If a file shows up in there, we know we probably want to keep it. If it doesn't, it merits further investigation.
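Something like this is what I have in mind (a rough sketch only: the docroot, the log path, and the combined log format are assumptions, and mapping URLs back to files this naively ignores mod_rewrite, PATH_INFO, and friends):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    # Paths are assumptions; adjust to the real docroot and log location.
    my $app_root = '/var/www/ourapp';
    my $log_file = '/var/log/httpd/access_log';

    # Record every path that shows up in a request line.
    my %requested;
    open my $log, '<', $log_file or die "Can't open $log_file: $!";
    while (<$log>) {
        next unless m{"(?:GET|POST|HEAD) ([^?" ]+)};
        $requested{$1} = 1;
    }
    close $log;

    # Walk the app tree and flag files no request ever touched.
    find(
        sub {
            return unless -f;
            ( my $url_path = $File::Find::name ) =~ s{^\Q$app_root\E}{};
            print "Never requested: $File::Find::name\n"
                unless $requested{$url_path};
        },
        $app_root
    );

Anything it prints isn't proof the file is dead, just a pointer to what merits that further investigation.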
Re:Linux?
Ovid on 2007-09-10T14:48:51
Thought about that, but there are problems there. First, access times completely fail if anyone's been just perusing the code in vim, yes? (I've been doing a lot of that while learning the code base.) Also, robots such as Googlebot might access old files which haven't been linked to in a while. Of course, I'm not much of an administrator, so if I'm wrong, let me know!
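For what it's worth, discounting the obvious crawlers by user agent before counting a hit would take care of the robot side. A sketch only; the log path and the user agent substrings are my guesses:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $log_file   = '/var/log/httpd/access_log';
    my @bot_agents = qw(Googlebot Slurp msnbot);
    my $bot_re     = join '|', map { quotemeta } @bot_agents;

    my %human_hits;
    open my $log, '<', $log_file or die "Can't open $log_file: $!";
    while (<$log>) {
        next if /$bot_re/i;                           # skip known crawlers
        next unless m{"(?:GET|POST|HEAD) ([^?" ]+)};  # pull the request path
        $human_hits{$1}++;
    }
    close $log;

    # Most-requested paths first; anything absent here never got a human hit.
    printf "%6d %s\n", $human_hits{$_}, $_
        for sort { $human_hits{$b} <=> $human_hits{$a} } keys %human_hits;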