==============================================================================
                             Find Dupe v0.09 beta
==============================================================================

Ever want to find duplicate files on your system?  Get tired of setting up
long, complicated configurations just to do that?  What about those
duplicates you *know* about and don't want to be bothered with?

Here is a simple program that won't solve everything, but does fix the
above.  FindDupe scans your hard drive(s) and lists *every* file you have,
in alphabetical order by filename rather than by path.  It will also ignore
any files you tell it to, even by wildcard!  If you don't want to see files
ending in .bak, just tell the program to skip *.bak!

Program requirements:
=====================
  386 or better
  OS/2 2.x or better (only tested on Warp Connect, FP 17)
  8MB RAM recommended (more if you have more files to scan; otherwise the
    swapping could become prohibitive.  This can be a memory-hungry program!)
  HPFS recommended (extensive disk scanning is faster under HPFS than FAT)

Program use:
============
  finddupe [-c ConfigurationFile] [-o OutputFile] [-d ]

where:
  -c ConfigurationFile
        tells the program where to find its configuration file.  This
        defaults to FINDDUPE.CFG in the current directory.
  -o OutputFile
        tells the program where to write its output.  The output file may
        also be set in the configuration file, but the one on the command
        line takes precedence.  If no output file is specified, output goes
        to stdout (so you can redirect it with >).
  -d    tells the program whether Dupes-Only mode should be used.  When it
        is on, FindDupe will not print files that are not duplicates.  This
        overrides the dupesonly keyword in the configuration file.

Note: User notification (current directory, action, etc.) is written to
stderr.  To hide all output, specify an output file and redirect stderr as
well (using '2>nul').

Configuration File:
===================
This is a plain ASCII text file that may contain the following commands:

  skip
        Tells FindDupe to skip any file (or subdirectory) with the given
        name.  Wildcards are allowed; for example,
              skip *.bak
        will skip all files ending in .bak.

  setrecurse on|off
        Tells FindDupe that, from this point in the configuration file on,
        all directories should be recursed into their subdirectories (on)
        or not recursed (off), except for subdirectories whose names are to
        be skipped.  Several setrecurse commands may be used; each applies
        only to the directories listed after it.  Directories listed before
        any setrecurse command default to 'off'.

  recursive
        Synonym for setrecurse on.

  nonrecursive
        Synonym for setrecurse off.

  directory [r|n]
        Tells FindDupe which directories to scan.  The full pathname is
        required.  An optional 'r' or 'n' overrides the default set by
        setrecurse: 'r' turns recursion on for this directory, 'n' turns it
        off.  Note that specifying a directory twice, either directly or
        through recursion, will cause every file in that directory to be
        counted twice.  Perhaps later I will sort that out.

  usemaximus
        Reads the Maximus 3.0x FAREA.DAT file and uses all of its download
        paths.  Areas read from FAREA.DAT will *not* be recursed, regardless
        of the current default.  To use this, your MAXIMUS environment
        variable *must* be set properly in the session running FindDupe.

  outfile
        Sets the output file.  Overridden by the command line.

  dupesonly
        Do not print files that are not dupes.  Useful for finding only the
        dupes without the lengthy listing of all files.

  ; #
        Comment characters; these may appear anywhere on a line.  Anything
        after a semicolon or pound character is ignored.

Deciphering the output:
=======================
Each line of output looks like:

  nn. (dd) YYYY/MM/DD size full\path\filename.ext

where nn is the number of unique filenames up to this point (all duplicates
share the same nn) and dd is the number of this particular copy.  For
example:

  ...
  44. (1) 1992/03/29 124K c:\pacman\pacman.exe
  45. (1) 1993/02/14 42K c:\zip\pkzip.exe
  45. (2) 1995/03/11 42K c:\backup\pkzip.exe
  46. (1) 1996/02/10 10K c:\bbs\file\fernwood\graphx\pointers.zip
  47. (1) 1996/02/02 72K c:\bbs\file\fernwood\apps\polycalc.zip
  ...

Notice that there are two copies of pkzip on the system.  Both share unique
file number 45, but one is in the c:\zip directory and the other in the
c:\backup directory.  They are given different dupe numbers so that a text
search for (2) finds every duplicated name.  If all copies of a name got the
same dupe number (say, the number of copies), a name with three copies
would show up as (3), and you would have to search for (2), (3), (4), and
so on.

The nn on the last file listed is the number of unique filenames, and the
total number of files is printed at the bottom:

  ...
  447. (1) 1996/04/28 590K c:\bbs\file\fernwood\apps\zoc214.zip
  Total: 487 files

In this case there are 40 duplicates (487 - 447).

Registering FindDupe:
=====================
HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA!

(It's freeware)

Getting Support:
================
I can be reached at:

  mcbride@ee.ualberta.ca

Or, via netmail to:

  Fidonet:  1:342/708
  IBMNet:   40:6494/2004
  FireNet:  77:1403/1

Future possibilities:
=====================
In my ever-expanding need to learn the STL, I will try to add:

  1. Dupe matching against version numbers in the filename, so that
     'zoc210.zip' will match 'zoc213.zip'.
  2. Smart overlap checking (i.e., detecting that a directory has already
     been scanned).
  3. BinkleyTerm support (via the F'REQ areas).
  4. LoraBBS support.

Source Code:
============
Perhaps, if you're really nice, I'll let you at it.  It's gotten ugly
again...

This was compiled using:

  Watcom 10.6
  Hewlett-Packard's Standard Template Library

History:
========
Legend:
  * Rewrite of part or all
  + Added
  - Removed
  ! Bugfix
  . Notes

0.09
  + Added version comparisons.  ZOC210.ZIP compares as a dupe to
    Zoc213.LHA and zOc2145.Zip.  The side effect is that it also compares
    as a dupe to Zoc.extra.extensions.

0.08
  * Re-ordered some source to make it easier for me to follow.  Much
    cleaned up.
  ! Minor changes to output format (e.g., 6.0M is now shown as 6M).

0.07
  ! Recursing subdirectories didn't work.  Dumdumdum.  Fixed.

0.06
  + Added DupesOnly keyword.
  ! Clarified the configuration file documentation for features added
    since 0.04.  (Yes, I consider documentation and usability errors to be
    'bugs'.)
  + Added a time display at the end of the listing showing how long
    scanning the drive took.
  * Fixed history listings to be newest first (duh...)

0.05
  + Added date/size stamping.

0.04
  * Got a book on STL. :-)  Rewrote *everything* using STL instead.
  + Added Maximus support.
  - Removed limitation on number of files to read at a time (thanks, STL!)
  . Done in 24 hours.
  + Created readme/history.

0.03
  * Got Watcom. :-}
  . Got the STL and started using it for part of this.  This messed me up
    completely.
  . Never finished.

0.02
  + Hacked in config-file support.
  . Only a couple of people looked at it.

0.01
  * Complete hack.  Trust me.  Used EMX 0.9a with EMXRT.DLL.
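
Since the FindDupe source is not published, what follows is only a minimal
C++ sketch, under stated assumptions, of the numbering scheme described in
"Deciphering the output": sort every file by filename rather than path,
bump nn once per distinct name, and number the copies of each name (1),
(2), and so on.  It is written in C++11 (lambdas, range-for), not for the
Watcom 10.6 toolchain listed above.  The Entry struct and the fileName,
lower and formatListing functions are hypothetical names; dates and sizes
are passed in already formatted; and names are assumed to compare
case-insensitively.  It ignores the 0.09 version comparisons and the
dupesonly mode.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// One scanned file (hypothetical type).  Date and size arrive pre-formatted,
// e.g. "1993/02/14" and "42K"; FindDupe's own formatting is not reproduced.
struct Entry {
    std::string date;
    std::string size;
    std::string path;  // full path, e.g. c:\zip\pkzip.exe
};

// The filename part of a full path: everything after the last '\' or '/'.
std::string fileName(const std::string& path) {
    std::string::size_type pos = path.find_last_of("\\/");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Lower-cased copy of s.  Assumption: names compare case-insensitively,
// so PKZIP.EXE and pkzip.exe count as the same name.
std::string lower(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Builds the listing: files sorted by filename (not path), "nn." counting
// the distinct names seen so far, "(dd)" numbering the copies of each name.
std::string formatListing(std::vector<Entry> files) {
    // stable_sort keeps copies of one name in the order they were scanned.
    std::stable_sort(files.begin(), files.end(),
                     [](const Entry& a, const Entry& b) {
                         return lower(fileName(a.path)) < lower(fileName(b.path));
                     });

    std::ostringstream out;
    int unique = 0;        // nn: distinct filenames seen so far
    int copy = 0;          // dd: which copy of the current name this is
    std::string previous;  // name key of the previous entry
    for (const Entry& e : files) {
        std::string key = lower(fileName(e.path));
        if (unique == 0 || key != previous) {
            ++unique;      // a new name starts a new nn...
            copy = 0;      // ...and restarts the dupe count
            previous = key;
        }
        ++copy;
        out << unique << ". (" << copy << ") " << e.date << ' ' << e.size
            << ' ' << e.path << '\n';
    }
    out << "Total: " << files.size() << " files\n";
    return out.str();
}
```

Fed the three files from the pkzip example in scan order c:\zip,
c:\pacman, c:\backup, formatListing puts pacman.exe first as "1. (1)",
then the two pkzip.exe copies as "2. (1)" (c:\zip) and "2. (2)"
(c:\backup), followed by "Total: 3 files".  A stable sort is used so that
copies of the same name keep their scan order instead of being reordered
arbitrarily.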