Finding Dupes

ajt on 2006-07-25T20:03:54

Last Friday was my local mini-LUG meeting. Someone asked to borrow a small Perl script I hacked up that finds duplicated files on a server.

At work my Windows admins are running out of hours in the night to back things up; adding more disk isn't a solution because we just can't get it off to tape before the users are back on the systems. By forcing users to take ownership of their mess we were able to delete many megabytes of duplicated binary files scattered all over the place. It hasn't fixed the chaos problems, but it's put off the end of the world for a short while.

It turns out that my friend has the same problem: disk space is cheap, but being able to back it up isn't...


Re:

Aristotle on 2006-07-25T23:56:30

Someone asked to borrow a small Perl script I hacked up that finds duplicated files on a server.

Has he given it back yet?

Re:

ajt on 2006-07-26T07:38:58

I must confess I've not lent it to him yet... ;-)

staging

gizmo_mathboy on 2006-07-26T11:17:08

Regarding the backup problem: why not get some of those cheap disks and create a staging box where all of the "data to be backed up" is stored, and then back up to tape from that?

Re:staging

ajt on 2006-07-29T14:56:01

For our Unix systems we back up to disk using IBM's FlashCopy, which is our preferred backup/restore method for the SAP systems. The Windows systems are a bit primitive: they don't do LVM or any fancy file systems, just old-fashioned NTFS onto hardware-mirrored disks. While you can do smart things with the disk, we don't...

I've now completed my rewrite of my find-duplicate-files tool, which works okay on my Linux box at home. It's a bit faster than the last version I wrote, as this one only checksums files of the same size; the original one checksummed everything, which was a bit wasteful.
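
In outline, the size-first approach works like this: walk the tree once and bucket file paths by size, then checksum only the buckets with more than one file, since a file with a unique size can't have a duplicate. Below is a minimal sketch of that idea (my assumptions: File::Find for the walk and Digest::MD5 for the checksums; this is an illustration, not the actual tool):

    #!/usr/bin/perl
    # Sketch of the size-first duplicate finder described above:
    # only files that share a byte size ever get checksummed.
    use strict;
    use warnings;
    use File::Find;
    use Digest::MD5;

    my $dir = shift @ARGV or die "usage: $0 <directory>\n";

    # Pass 1: bucket every regular file's path by its size in bytes.
    my %by_size;
    find( sub { push @{ $by_size{ -s $_ } }, $File::Find::name if -f $_ }, $dir );

    # Pass 2: checksum only the buckets holding two or more files.
    my %by_md5;
    for my $size ( keys %by_size ) {
        my @files = @{ $by_size{$size} };
        next if @files < 2;    # a unique size can't be a duplicate
        for my $file (@files) {
            open my $fh, '<', $file or next;
            binmode $fh;
            push @{ $by_md5{ Digest::MD5->new->addfile($fh)->hexdigest } }, $file;
        }
    }

    # Report each group of files sharing a checksum.
    for my $md5 ( keys %by_md5 ) {
        my @dupes = @{ $by_md5{$md5} };
        print join( "\n\t", "$md5:", @dupes ), "\n" if @dupes > 1;
    }

On a tree full of mostly unique file sizes, pass 2 barely reads anything, which is where the speed-up over checksum-everything comes from.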