==============================================================================
                             Find Dupe v0.10 beta
==============================================================================

Ever want to find duplicate files on your system?  Get tired of setting up
long, complicated configurations just to do that?  What about those
duplicates you *know* about, and don't want to be bothered with?

Here is a simple program that won't solve everything, but does fix the
above.  FindDupe scans your hard drive(s) and lists *every* file you have,
in alphabetical order based on filename rather than path.  And it will
ignore files that you tell it to, even based on wildcards!  If you don't
want to see files ending in .bak, just tell the program to skip '*.bak'!

Program requirements:
=====================

  386 or better
  OS/2 2.x or better (only tested on Warp Connect, FP 17)
     (32-bit DOS version requires DOS 5.0 or better, by request only)
  8MB RAM recommended (more if you have more files to scan, otherwise the
     swapping could become prohibitive.  This can be a memory-hungry
     program!)
  HPFS recommended (extensive disk scanning is faster under HPFS than FAT).

Program use:
============

  OS/2: finddupe [-c ConfigurationFile] [-o OutputFile] [-d ] [-q]
  DOS:  fdupe [-c ConfigurationFile] [-o OutputFile] [-d ] [-q]

where:

  -c ConfigurationFile
        tells the program where to find its config file.  This defaults
        to FINDDUPE.CFG in the current directory.

  -o OutputFile
        tells the program where to place its output.  This may also be
        set in the configuration file, but the one on the commandline
        takes precedence.  If no output file is specified anywhere, the
        output goes to stdout (i.e., redirect via >).

  -d    tells the program whether the Dupes-Only mode should be used.  If
        it is on, FindDupe will not print files that are not duplicates.
        Overrides the dupesonly keyword in the configuration file.

  -q    tells the program to shut the heck up.  This is "quiet" mode:
        only critical problems will be displayed.
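
The precedence rules above can be pictured with a short sketch.  This is
not FindDupe's source; it is a minimal Python illustration, and the names
resolve_options, cli and cfg are hypothetical.  It assumes the commandline
and the configuration file have already been parsed into dictionaries.

```python
# Hypothetical sketch of the option precedence described above.
# Not FindDupe's actual code: the function and key names are made up.

DEFAULT_CONFIG = "FINDDUPE.CFG"   # looked for in the current directory


def resolve_options(cli, cfg):
    """Combine commandline values (cli) with config-file values (cfg).

    Both arguments are dicts; a missing key means "not specified".
    """
    return {
        # -c only exists on the commandline; fall back to the default.
        "config": cli.get("config", DEFAULT_CONFIG),
        # -o beats the outfile keyword; None means "write to stdout".
        "outfile": cli.get("outfile", cfg.get("outfile")),
        # -d beats the dupesonly keyword; off unless something asks for it.
        "dupes_only": cli.get("dupes_only", cfg.get("dupes_only", False)),
        # -q has no configuration-file equivalent in this document.
        "quiet": cli.get("quiet", False),
    }
```

With an empty commandline and an empty configuration file, the sketch
yields FINDDUPE.CFG as the config file, no output file (stdout), and both
Dupes-Only and quiet mode off.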

Configuration File:
===================

This is a plain ASCII text file that may contain the following commands.

  skip
        Tells FindDupe to skip any file (or subdirectory) with this name.
        For example,
            skip *.bak
        will skip all files ending in .bak.
        This keyword may be used more than once.

  setrecurse
        Tells FindDupe that, from this point in the configuration file
        onward, directories should be scanned together with their
        subdirectories (on) or without them (off).  Subdirectories whose
        names are supposed to be skipped are not recursed into.
        Note: several of these may be used, and each applies only to
        directories named after the command.  Directories named before
        any setrecurse command default to 'off'.
        This keyword may be used more than once.

  recursive
        Synonym for setrecurse on.
        This keyword may be used more than once.

  nonrecursive
        Synonym for setrecurse off.
        This keyword may be used more than once.

  directory [r|n]
        Tells FindDupe which directories to scan.  A full pathname is
        required.  An optional 'r' or 'n' overrides the default set by
        setrecurse: 'r' turns recursion on for this directory, 'n' turns
        it off.
        Note that specifying a directory twice, either directly or
        through recursion, will result in all files in that directory
        being counted twice.  Perhaps later I will sort that out.
        This keyword may be used more than once.

  usemaximus
        Read in the Maximus 3.0x FAREA.DAT file and use all download
        paths.  Any areas read from the FAREA.DAT will *not* be recursed,
        regardless of the current default.  In order to use this, your
        MAXIMUS environment variable *must* be set properly in the
        session running FindDupe.

  outfile
        Set the output file.  Overridden by the commandline.

  dupesonly
        Do not print out files that are not dupes.  Useful for finding
        dupes without the lengthy listing of all files.

  ;  #
        These may be put anywhere on a line.  Anything after a semicolon
        or pound character will be ignored as a comment.

Deciphering the output:
=======================

The output looks like:

    nn. (dd) YYYY/MM/DD size full\path\filename.ext

where nn is the number of unique filenames to this point (all copies of a
filename share the same 'nn') and dd is the number of the dupe that this
is.  For example:

    ...
    44. (1) 1992/03/29 124K c:\pacman\pacman.exe
    45. (1) 1993/02/14 42K c:\zip\pkzip.exe
    45. (2) 1995/03/11 42K c:\backup\pkzip.exe
    46. (1) 1996/02/10 10K c:\bbs\file\fernwood\graphx\pointers.zip
    47. (1) 1996/02/02 72K c:\bbs\file\fernwood\apps\polycalc.zip
    ...

Here we have two copies of pkzip on the system.  They are both given
unique file number 45, but one is found in the c:\zip directory and the
other in the c:\backup directory.  The reason for giving them different
dupe numbers is that this way you can do a text search for (2) to find
all exact duplicate names.  If all dupes were given the same dupe number,
we could have 3 files with the same name, meaning you would have to
search for (2), (3), (4), etc.

The number on the last file shown is the number of unique filenames.  The
final total of files is placed at the bottom:

    ...
    447. (1) 1996/04/28 590K c:\bbs\file\fernwood\apps\zoc214.zip

    Total: 487 files

In this case, there are 40 duplicates (487 - 447).

Registering FindDupe:
=====================

HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA!

(It's freeware)

A registered version is planned, PM-based.  When that comes out, this
text file will be updated to explain how to purchase it.

Getting Support:
================

I can be reached at: mcbride@ee.ualberta.ca

Or, via netmail to:

    Fidonet: 1:342/708
    IBMNet:  40:6494/2004
    FireNet: 77:1403/1

However, this program is beta software, and likely will always remain
such.  If it ever becomes too big, it may be dropped.  ;-)

Future possibilities:
=====================

In my ever expanding need to learn the STL, I will try to add:

  1. Smart overlap checking (i.e., detecting that we've done this
     directory before)
  2. BinkleyTerm support (via the F'REQ areas)
  3. LoraBBS support
  4. Size variance for dupe matching.  => J Hulley-Miller
  5. Ignoring matches for certain extensions, i.e., ZIP & TXT extensions
     would not be considered dupes.  Perhaps this will go the other way:
     defining what are matches, to allow you to specify all the archivers
     as being equal, all the picture formats as equal, all multi-track
     audio clips as equal, etc.  => J Hulley-Miller

History:
========

Legend:
  * Rewrite of part or all
  + Added
  - Removed
  ! Bugfix
  . Notes

0.10 -------------------------------------------------------------------
  ! Couldn't handle the root directory properly.  Fixed.
    Reported originally by Gregorio Kus (Grego@RMnet.IT)
  ! If no files found, crashed.  Fixed.
    Inadvertently found by Gregorio Kus (Grego@RMnet.IT)
  + Quiet mode (-q) added.
0.09 -------------------------------------------------------------------
  + Added version comparisons.  ZOC210.ZIP compares as a dupe to
    Zoc213.LHA and zOc2145.Zip.  The side effect is that it also compares
    as a dupe to Zoc.extra.extentions.
  + Compiled to 32-bit DOS (DOS4GW) for the first time.  May be
    discontinued in the future.  NOT A SUPPORTED PLATFORM.
0.08 -------------------------------------------------------------------
  * Re-ordered some source to make it easier for me to follow.  Much
    cleaned up.
  ! Minor changes to output format (i.e., 6.0M should be 6M)
0.07 -------------------------------------------------------------------
  ! Recursing subdirectories didn't work.  Dumdumdum.  Fixed.
0.06 -------------------------------------------------------------------
  + Added DupesOnly keyword
  ! Clarified configuration file on new features since 0.04
    (Yes, I consider documentation errors or usability problems as 'bugs')
  + Added time display at conclusion of listing to show how long scanning
    the drive took.
  * Fixed history listings to be newest first (duh...)
0.05 -------------------------------------------------------------------
  + Added date/size stamping
0.04 -------------------------------------------------------------------
  * Got a book on STL.  :-)  Rewrote *everything* using STL instead.
  + Added Maximus support
  - Removed limitation on number of files to read at a time (thanks, STL!)
  . Done in 24 hours.
  + Created readme/history
0.03 -------------------------------------------------------------------
  * Got Watcom.  :-}
  . Got the STL and started using it for part of this.  This messed me up
    completely.
  . Never finished.
0.02 -------------------------------------------------------------------
  + Hacked in config-file support
  . Only a couple people looked at it
0.01 -------------------------------------------------------------------
  * Complete hack.  Trust me.  Used EMX 0.9a with EMXRT.DLL
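
Illustration: output numbering
==============================

The nn/dd numbering described under "Deciphering the output" can be
pictured with a short sketch.  This is not FindDupe's source; it is a
minimal Python illustration with hypothetical names (number_files,
count_duplicates).  It assumes filenames are compared without regard to
case, and it leaves out the version comparison added in 0.09.

```python
# Hypothetical sketch of the nn/dd numbering; not FindDupe's actual code.
import ntpath
from itertools import groupby


def number_files(paths):
    """Return (nn, dd, path) tuples in filename order, ignoring case."""
    def name_of(path):
        # ntpath understands backslash-separated paths on any platform.
        return ntpath.basename(path).lower()

    rows = []
    # sorted() is stable, so copies of a name keep their input order.
    ordered = sorted(paths, key=name_of)
    for nn, (_, group) in enumerate(groupby(ordered, key=name_of), start=1):
        for dd, path in enumerate(group, start=1):
            rows.append((nn, dd, path))
    return rows


def count_duplicates(rows):
    """Total files minus unique filenames, as in the 487 - 447 example."""
    unique_names = rows[-1][0] if rows else 0
    return len(rows) - unique_names
```

In the result, every filename with more than one copy has exactly one row
with dd equal to 2, which is why a text search for (2) finds each
duplicated name once.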