Is there a free way to quickly remove duplicate files from my computer? ~ Ask The Admin

Tuesday, July 15, 2008


Sometimes duplicates are a good thing, like in the image on the left (feeling patriotic today?). But most of the time in the tech industry, dupes are bad.

Today we got a question about how to quickly kill duplicate files that have different names.

We immediately thought of one of our favorite old command-line applications, which was sitting in my shared application folder... somewhere.


Here is the question from one of our readers:

Hi AtA,

Can you recommend a program that will scan for duplicate files? I have an external HDD with multiple files with similar names, for example:

filedocument.doc 2KB

filedocument-1.doc 2KB

filedocument-2.doc 2KB

Thanks,

Keep up the good work on AtA

We have been using this little application called Finddupe since Windows 98! It's free and works wonderfully, and it's chock full of command-line geeky goodness: delete duplicates, compare directories, do a test run, and much more. Recover the space lost to gigs of the same porn, er, duplicate MP3s, saved under different names. Check out the info on the application and the download link below:

FindDupe

Finddupe is a tool for quickly detecting duplicate files on a hard drive under Windows. Duplicate files can simply be detected, or they can be hardlinked or deleted.

Deleting duplicate files
When working through somebody else's photo or MP3 collection, this tool is useful for deleting the duplicate files. Depending on how the media is organized, a collection can contain a lot of duplicates.

Freeing hard drive space
Sometimes it's intentional to have certain media in multiple places. By running finddupe and hard linking the identical files, you can keep the files in multiple places while having only one physical copy on the hard drive.
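Hard linking works because two directory entries can point at the same data on disk, so deleting one name does not delete the data. As an illustration only (this is not finddupe's code), here is a minimal Python sketch of replacing one confirmed duplicate with a hard link. `replace_with_hardlink` is a hypothetical helper, and the sketch assumes both files live on the same volume of a file system that supports hard links, such as NTFS; `os.link` fails across volumes and on FAT.

```python
import filecmp
import os


def replace_with_hardlink(original: str, duplicate: str) -> None:
    """Replace `duplicate` with a hard link to `original` (hypothetical helper).

    Assumes both paths are on the same volume and that the file system
    supports hard links (NTFS does; FAT does not).
    """
    if os.path.samefile(original, duplicate):
        return  # already one physical file, nothing to do
    # Never link files that merely look alike: compare every byte first.
    if not filecmp.cmp(original, duplicate, shallow=False):
        raise ValueError(f"{duplicate} is not identical to {original}")

    # Link under a temporary name first, then rename it over the duplicate,
    # so a failed link never leaves the duplicate's name missing.
    temp_path = duplicate + ".hardlink-tmp"  # hypothetical temp-name scheme
    os.link(original, temp_path)
    try:
        os.replace(temp_path, duplicate)
    except OSError:
        os.remove(temp_path)
        raise
```

For the reader's example, `replace_with_hardlink(r"e:\filedocument.doc", r"e:\filedocument-1.doc")` would keep both names while storing the data once, provided that drive is NTFS (the E: drive letter is made up).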

Detecting changed files for backup
Finddupe is useful for detecting which files have changed and need backing up. Simply back up the media, then run finddupe to eliminate the files in the copy that are already contained in a previous backup; what remains is what changed.

Finddupe is a command-line program. If you don't know what a command prompt under Windows is, you may have to do a bit of learning before attempting to use this program. The command prompt is not DOS (the pre-Windows operating system), although it looks and acts a lot like it, and people unfamiliar with it often think that it is DOS. Please don't ask me for help if you don't know how to use command-line programs; learn about that first.

finddupe command line options
finddupe [options] [-ref] <filepat> [filepat]...

    -hardlink
    Delete duplicate copies of a file, and replace the duplicates with hardlinks to the other copy of the file. Works only on NTFS file systems, and only with administrator privileges. (The C: drive under XP is almost always NTFS, and most people log in as administrator.)

    -del
    Delete duplicate files

    -sigs
    Print the computed file signature of each file. The file signature is computed from a CRC of the first 32 KB of the file, together with its length. The signature is used to detect files that are probably duplicates; finddupe does a full binary file compare before taking any action.

    -rdonly
    Also operate on files that have the readonly bit set. I use this feature to eliminate shared files in large projects under version control at work.

    -bat
    Do not hardlink or delete any files. Rather, create a batch file containing the actions to be performed. This can be useful if you want to inspect what finddupe will do.

    -ref <filepat>
    The file or file pattern after the -ref is a reference. These files will be compared against, but not eliminated. Rather, other files on the command line will be considered duplicates of the reference files.

    filepat
    File pattern matching in finddupe is very powerful; it uses the same code as jhead. For example, c:\** specifies every file on the entire C drive, and C:\**\foo\*.jpg specifies any file ending in .jpg that is in a subdirectory called foo anywhere on the drive, including directories such as c:\foo, c:\bar\foo, c:\hello\world\foo and c:\foo\bar\foo.
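The -sigs entry above describes a two-stage check: a cheap signature (a CRC of the first 32 KB plus the file length) flags probable duplicates, and a full binary compare confirms them before anything is touched. A minimal Python sketch of that idea follows. It assumes `zlib.crc32` as the CRC (finddupe's exact CRC variant isn't stated here), uses `pathlib`'s `**` glob as a rough stand-in for finddupe's own pattern matching, and the names `signature` and `find_duplicates` are hypothetical.

```python
import filecmp
import zlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

SIG_BYTES = 32 * 1024  # only the first 32 KB of each file feeds the CRC


def signature(path: Path) -> Tuple[int, int]:
    """Cheap signature: (file length in bytes, CRC-32 of the first 32 KB)."""
    with path.open("rb") as f:
        head = f.read(SIG_BYTES)
    return path.stat().st_size, zlib.crc32(head)


def find_duplicates(files: Iterable[Path]) -> List[Tuple[Path, Path]]:
    """Return (kept, duplicate) pairs, each confirmed by a full binary compare."""
    # Stage 1: bucket files by signature; only same-bucket files can match.
    buckets: Dict[Tuple[int, int], List[Path]] = defaultdict(list)
    for path in files:
        if path.is_file():
            buckets[signature(path)].append(path)

    # Stage 2: a shared signature only means "probably equal", so compare
    # every byte before declaring a duplicate.
    pairs: List[Tuple[Path, Path]] = []
    for candidates in buckets.values():
        distinct: List[Path] = []  # one representative per distinct content
        for path in candidates:
            match = next(
                (rep for rep in distinct if filecmp.cmp(rep, path, shallow=False)),
                None,
            )
            if match is None:
                distinct.append(path)
            else:
                pairs.append((match, path))
    return pairs
```

Calling `find_duplicates(Path(r"c:\photos").glob("**/*"))` roughly corresponds to scanning c:\photos\**. Which copy of a pair counts as the one to keep depends on the order the files are listed in, so a real tool would want a more deliberate rule.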

Example uses
If you have a previous backup in a directory tree on c:\prev_backup, and just copied your work files to a directory tree on c:\new_backup, you can remove any files that are in the previous backup with the following incarnation:

    finddupe -del -ref c:\prev_backup c:\new_backup
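The -ref rule is worth spelling out: files in the reference tree are only compared against and never removed, and anything in the other tree whose contents match a reference file counts as a duplicate. Here is a minimal Python sketch of that rule. It is not finddupe's implementation; `duplicates_of_reference` and `content_key` are hypothetical names, and for brevity the sketch identifies contents by (size, SHA-256 digest) instead of finddupe's CRC-plus-binary-compare. It only reports matches and leaves any deletion to the caller.

```python
import hashlib
from pathlib import Path
from typing import List, Set, Tuple


def content_key(path: Path) -> Tuple[int, str]:
    """Identify file contents by (size, SHA-256 hex digest)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MB blocks so large media files don't need to fit in memory.
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return path.stat().st_size, digest.hexdigest()


def duplicates_of_reference(ref_root: Path, new_root: Path) -> List[Path]:
    """List files under new_root whose contents already exist under ref_root.

    Reference files are only read and never returned, so acting on this
    list can never remove anything under ref_root.
    """
    ref_keys: Set[Tuple[int, str]] = {
        content_key(p) for p in ref_root.glob("**/*") if p.is_file()
    }
    return sorted(
        p for p in new_root.glob("**/*")
        if p.is_file() and content_key(p) in ref_keys
    )
```

Deleting the returned paths (for example with `Path.unlink()`) would mimic the -del -ref command above, while c:\prev_backup stays untouched.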

If you have a large photo collection on c:\photos, and you wish to replace duplicates with hard links, you can run:


    finddupe -hardlink c:\photos

Note that this only works on NTFS file systems (such as the C drive under Windows XP). It won't work on FAT file systems, like the ones used on most external hard disks or USB flash drives.

If you just want to know which files are common between two directory trees, you can run:


    finddupe -bat work.bat -del c:\media\** c:\media2\**

This will create the file "work.bat" containing file delete commands. The '-bat' option tells finddupe not to do anything itself, but rather to store the actions in a batch file, so you can review what finddupe would do before taking any action. The '**' tells it to process all files recursively.
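Writing the plan down instead of acting on it is easy to borrow for your own cleanup scripts. Here is a minimal Python sketch that turns a list of duplicate paths into a reviewable batch file. `write_delete_batch` is a hypothetical name, and the `@echo off` / `del "path"` layout is an assumption; it is not necessarily the exact format finddupe writes.

```python
from typing import Iterable


def write_delete_batch(duplicates: Iterable[str], bat_path: str) -> int:
    """Write one `del` command per duplicate path to a Windows batch file.

    Nothing is deleted here: review the file, then run it yourself.
    Returns the number of delete commands written.
    Paths containing % or non-ASCII characters may need extra escaping
    for cmd.exe; that is not handled in this sketch.
    """
    count = 0
    # newline="\r\n" turns every "\n" we write into CRLF, as cmd.exe expects.
    with open(bat_path, "w", encoding="utf-8", newline="\r\n") as bat:
        bat.write("@echo off\n")
        for path in duplicates:
            # Quote each path so names with spaces survive cmd.exe parsing.
            bat.write(f'del "{path}"\n')
            count += 1
    return count
```

Keeping the "decide" step separate from the "delete" step means a mistake in the duplicate detection shows up in a text file you can read, not in files you can no longer recover.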

So it does what it says: it gets rid of your dupes! What do you guys use for this? Anything, or do you just suffer?

_TheDuplicateAdmiN_