Some bug keeps me from transferring multiple files at once over Bluetooth from my phone to my PowerBook. Since I have to transfer them individually, I end up sending some twice when I forget to advance to the next image, video, or message.
This would not be so bad if the phone never reused a file name (my digital camera never reuses one). Once I clear out the photos from my phone, though, the next photo gets the name Image(01).jpg.
Now I have a problem. I have a folder full of photos. Some names are duplicates (with things like #1, #2 appended to them), and some files are duplicates (but with different names). To make this even worse, I renamed some of the good photos to have descriptive names.
I decided to fix that up today. I simply took the MD5 digest of each file, sorted the files sharing a digest by name length, and kept the one with the longest name (an arbitrary choice, since I still have to look at them anyway).
#!/usr/bin/perl
use strict;
use warnings;

use Digest::MD5 qw(md5_hex);

my %hash;   # digest => array ref of the file names with that content

my @files = glob( "*" );

foreach my $file ( @files ) {
    next if -d $file;

    # slurp the file in binary mode so the digest covers the raw bytes
    my $data = do {
        local $/;
        open my $fh, '<', $file or die "Could not open $file: $!";
        binmode $fh;
        <$fh>;
    };
    my $digest = md5_hex($data);

    # autovivification creates the array ref the first time a digest shows up
    push @{ $hash{$digest} }, $file;
}

foreach my $key ( sort keys %hash ) {
    # shortest name first, so the longest name ends up last
    my @files = sort { length $a <=> length $b } @{ $hash{$key} };
    next if @files == 1;

    pop @files;   # keep the file with the longest name

    # don't run this without thinking about the next line
    #unlink @files;
}

__END__
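Before uncommenting the unlink, a dry run that only reports what would be kept and what would go is a cheap safety check. What follows is a minimal sketch of that idea, not part of the script above. The helper names digest_of and plan_removals are hypothetical, and it swaps the slurp-then-md5_hex step for Digest::MD5's addfile method, which reads from a filehandle so a large video does not have to sit in memory. It also puts the longest name first and breaks ties alphabetically, which is my assumption about what makes the report easiest to read.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: MD5 hex digest of a file, streamed from a filehandle.
sub digest_of {
    my ($file) = @_;
    open my $fh, '<', $file or die "Could not open $file: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Hypothetical helper: group plain files by digest and return one array ref
# per group of duplicates, with the file to keep first and the rest after it.
sub plan_removals {
    my (@files) = @_;

    my %by_digest;
    push @{ $by_digest{ digest_of($_) } }, $_ for grep { -f } @files;

    my @plan;
    foreach my $digest ( sort keys %by_digest ) {
        # longest name first; ties broken alphabetically for stable output
        my @same = sort { length $b <=> length $a or $a cmp $b }
                   @{ $by_digest{$digest} };
        next if @same == 1;
        push @plan, [ @same ];
    }
    return @plan;
}

# Report only; nothing is deleted.
foreach my $group ( plan_removals( glob '*' ) ) {
    my ( $keep, @remove ) = @$group;
    print "keep   $keep\n";
    print "remove $_\n" for @remove;
}
```

Run it in the folder of photos and read the report. Only once the "remove" lines look right is it worth going back to the unlink.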