ZDCS - Zipfile Duplicate Checking System - Version 1.0
Copyright (C) 1991, Michael W. Cocke

General Information
-------------------

ZDCS is a set of utilities designed to test a .ZIP file (and ONLY a *.ZIP file!) to see if any of the files within the ZIP are already present in a specified collection of files. The particular application I had in mind is weeding out duplicate files from a bbs file collection and testing new uploads against the bbs files for duplicates, so that's how the package was assembled.

The files are checked by keeping a database of the 32-bit CRC's that PKZIP stores internally for each file, and then comparing the CRC's of the files in the new upload against those stored in the database. The database of CRC's uses a B-tree index, so there are no sort utilities or regular file maintenance requirements of any kind.

After having looked at two other utilities that do the same thing as ZDCS, I decided to "roll my own". One of the two wasn't LAN friendly, and the other wasn't SysOp friendly; being a SysOp with a LAN, neither alternative worked particularly well for me. This is my attempt to split the difference.

ZDCS is designed specifically for PCBoard 14.5 and higher. Although it may be possible to use it with other bbs software, I will describe its use only with PCBoard.

Throughout the design of ZDCS, I have been ruthless in deciding *not* to provide certain extra features. My philosophy has been: "There's a perfectly good utility written by XXXXX that does this, so I won't duplicate features and increase code size and execution time by reinventing the wheel." The example I had in mind is a nifty utility called EXZTEST. ZDCS was designed specifically to "plug into" PCBTEST.BAT, right behind EXZTEST. EXZTEST does a perfectly credible job of swapping ZIP comments, running external test programs to identify viruses, detecting damaged ZIP's, and so on.

I have also attempted to minimize the time required to process ZIP's.
Running across a 10 Mbps LAN, I was able to process 7,588 ZIP files containing a total of 65,122 individual files in about two and one-half hours. Comparable speed was achieved on a local disk drive as well. Please note that performance, like mileage, *will* vary.

Capabilities
------------

ZDCS lets the sysop select among three different options for processing uploaded ZIP's.

The first option is setting the maximum percentage of duplicates allowed in an uploaded ZIP. ZDCS calculates the actual percentage of duplicate files in the ZIP and compares it to the maximum percentage selected by the sysop. If the actual percentage is greater, the upload is declined. (The PCBoard code takes care of deleting declined files from the BBS.) The sysop can set the percentage to any whole number from 0 to 100. Setting it to 100 effectively bypasses this filter, since it permits an uploaded ZIP containing nothing but duplicates to pass. At the other extreme, setting it to 0 effectively requires that the uploaded ZIP contain no duplicates at all.

The second, independent option is the removal of *all* individual duplicate files from the uploaded ZIP. When this option is selected, every file within the ZIP whose CRC is already listed in the database is removed from the ZIP, leaving the remainder of the uploaded ZIP intact.

The third option is the removal of *designated* duplicate files from the uploaded ZIP. A separate database is created with the CRC's of all files that are to be removed. This is a great way to clear out all those blasted bbs ads from you-know-who without removing authors' unchanged text files from newer shareware versions. In fact, the database of files that you always want to remove from uploads is referred to as the bbs ads database.
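The CRC comparison and the percentage filter described above can be sketched in Python. Python's standard `zipfile` module exposes the CRC-32 that PKZIP records for each member (`ZipInfo.CRC`), so no decompression is needed to obtain it. This is an illustration only, not ZDCS code. The CRC database is modelled as an in-memory set rather than the B-tree index ZDCS uses, and the function names are hypothetical. The decision follows the rule stated in this section: an upload is declined only when the actual percentage is greater than the sysop's maximum.

```python
import io
import zipfile
from typing import BinaryIO, List, Set, Tuple, Union


def member_crcs(source: Union[str, BinaryIO]) -> List[Tuple[str, int]]:
    """Return (name, CRC-32) for each file member, read from the ZIP directory."""
    with zipfile.ZipFile(source) as zf:
        return [(info.filename, info.CRC)
                for info in zf.infolist() if not info.is_dir()]


def duplicate_percentage(crcs: List[int], crc_db: Set[int]) -> float:
    """Percentage of members whose CRC is already in the (hypothetical) CRC set."""
    if not crcs:
        return 0.0  # assumption: an empty ZIP counts as 0% duplicates
    dups = sum(1 for crc in crcs if crc in crc_db)
    return 100.0 * dups / len(crcs)


def accept_upload(crcs: List[int], crc_db: Set[int], max_percent: int) -> bool:
    """Accept unless the actual percentage is strictly greater than the maximum.

    max_percent=100 lets an all-duplicate ZIP pass; max_percent=0 declines
    any ZIP that contains at least one duplicate.
    """
    return duplicate_percentage(crcs, crc_db) <= max_percent


if __name__ == "__main__":
    # Build a two-file ZIP in memory; one member is "already on the bbs".
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("README.TXT", b"hello")
        zf.writestr("PROG.EXE", b"world")
    crcs = [crc for _, crc in member_crcs(buf)]
    db = {crcs[0]}
    print(duplicate_percentage(crcs, db))  # 50.0
    print(accept_upload(crcs, db, 50), accept_upload(crcs, db, 49))  # True False
```

Here one of the two members is already in the database, so the actual percentage is 50. A maximum of 50 therefore accepts the upload, and a maximum of 49 declines it.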
Once an uploaded ZIP file has been accepted, the CRC's of the individual files in the ZIP are automatically added to the file database, so that the system will recognize any future uploads of the same files as duplicates. No CRC's are automatically added to the bbs ads database; that must be done with one of the utilities in the ZDCS package.

License Information
-------------------

I'm afraid that I am going to have to break one of my own rules with this release. I have long been of the opinion that sysops are going broke fast enough without my help, and have consequently released all of my sysop utilities as freeware. With the current condition of the economy and the recent loss of my own job, I am now the sysop who's going broke.

ZDCS is being released as fully functional shareware. There are no critical limits, crippled features or "drop dead" dates. If you try ZDCS on your system and decide that you want to continue using it, please register your copy by sending a check for $25.00 (US). Registration entitles you to a license for use of the Zip Duplicate Checking System on one bbs, including all future versions of ZDCS. No additional fees will be charged for registration of future versions of this product.

Please make your registration check payable to Michael W. Cocke and mail it to:

     Michael W. Cocke
     11 Cedar Road
     Montville NJ 07045-9582

Please be assured that I will continue to support ZDCS through future revisions. I have been using it on our own bbs, The Hacker Central, and wonder how I lasted so long without it. There are already a few ideas for improvements buzzing around in the back of my head.

Product support will be handled in two ways. Questions and discussion of ZDCS are welcome in the ILink SHAREWARE conference, available through most ILink member bbs's. Product support will also be provided on our home bbs, The Hacker Central.
Please call us at node 1: 201-334-2257, N81, 1200-2400 baud. Callers are usually validated within 24 hours, and all registered users of ZDCS will be given access to node 2: 201-318-8840, N81, 1200-38,400 HST/v.32. A special conference will be available for registered users of ZDCS. (Suggestions and requests cheerfully welcomed.)

ZDCS may be freely copied and distributed, provided that it is distributed in its complete and original form with all individual files intact and unmodified. No fee of any kind may be charged for distribution or copying of the ZDCS package.

Files Included in this Release
------------------------------

PCBTEST.BAT    A sample PCBTEST.BAT file, suitable for use with EXZTEST and ZDCS.
ZDCS.CFG       A sample configuration file for ZDCS.
ZDCSDBB1.EXE   The utility program used to create the initial database.
ZDCSADB1.EXE   The utility program used to add bbs ads to the separate bbs ads database.
ZDCSDR1.EXE    The report generator that produces the list of duplicate files within the database.
ZDCSFC1.EXE    The program module that performs a real-time test for duplicates (as soon as the upload is received).
ZDCS.TXT       A rambling text file full of useful and other kinds of information. (You're currently reading it.)
ZDCS.NEW       A last-minute text file with changes and notes.

Installation - Setting Up the Configuration File
------------------------------------------------

First, you will need to create the ZDCS.CFG file for your configuration. All of the ZDCS programs need this file. ZDCS.CFG must be in one of two places, depending on the version of DOS you are using. If you are running DOS 3.x or higher, this file should be located in the same directory as the ZDCS executable files. If you are running DOS 2.x or lower, it should be located in the directory that will be the current directory when you run any of the ZDCS programs.
Please note that I was not personally able to test ZDCS under DOS 2.x or DOS 4.x. If you have a problem under these versions, please let me know.

The format of the ZDCS.CFG file is:

Line 1: The complete drive, path and filename of your DLPATH.LST file (part of PCBoard).
        Note: This is how the ZDCSDBB1 module knows where to find the files that it has to process. Please make sure that the DLPATH.LST you specify contains all the download paths for all the files that you want to add to the ZDCS database.

Line 2: The drive and path where the ZDCS database will be located.
        Note: Do not (absolutely not!) supply a trailing backslash.

Line 3: Either the letter "Y" or the letter "N".
        Note: This switch tells ZDCS whether to delete *all* duplicate files in the uploaded ZIP (Y) or leave them alone (N).

Line 4: An integer (that's a whole number, no decimals) between 0 and 100.
        Note: This is the maximum percentage of files within the uploaded ZIP that may be duplicates of files already on your system. ZDCS calculates the actual percentage of duplicates in the ZIP and compares it to your maximum. If the actual percentage is equal to or lower than your maximum, the uploaded ZIP is accepted. If it is higher, the upload is declined.

Line 5: The complete drive, path and filename of the log that will be created by the ZDCSFC1 program module.

Line 6: Either the letter "Y" or the letter "N".
        Note: This switch tells ZDCS whether to delete *designated* duplicate files (bbs ads) in the uploaded ZIP (Y) or leave them alone (N).

Installation - Creating the Initial Database
--------------------------------------------

Now that you have created your ZDCS.CFG file, the next step is to create the initial database of CRC's. The database consists of two files: ZDCS.NDX (the index) and ZDCS.DAT (the data).

To create the database, simply run the ZDCSDBB1.EXE program. As long as you have created the ZDCS.CFG file properly, there is nothing more to be done until this program finishes processing. The display points out that you may press the F1 key at any time for a status summary.

The ZDCSDBB1 status summary is a single line that contains the following information, from left to right:

     "Share" or "NoShare"
     number of ZIP's processed so far
     start time
     number of files processed so far
     value of the pdupe bit flag

The first item, the word "Share" or "NoShare", indicates the type of file opens being used. All ZDCS programs detect whether the DOS SHARE utility is loaded in memory and automatically use the appropriate type of file access.

The second item is simply the number of ZIP's whose individual files have been added to the CRC database so far.

The third item is the time of day when the ZDCSDBB1 program was started, taken from the DOS clock.

The fourth item is the number of individual files that have been added to the database.

The fifth and last item is arcane internal status information, which I will not explain here. (Consider yourself lucky.)

When ZDCSDBB1 has finished creating the initial database, it displays a final status summary line in the same format. Below the start time in that line is the end time, when processing was completed.

WARNING: If you start creating the database with ZDCSDBB1 and then develop a sudden need to abort, use the F10 key!! The display will remind you that F10 is the key provided for aborting the process.
If you abort the program by any other method, you will almost certainly create lost and/or cross-linked clusters on your hard disk.

Installation - Creating Designated Duplicates (BBS Ads) Database
----------------------------------------------------------------

This step is needed only if you have decided to use the option to delete designated duplicate files (usually bbs ads) from the uploaded ZIP. To do this, you have to tell the system which files are the dreaded bbs ads by creating a bbs ads database. The database will be located in the same directory as the rest of the ZDCS files and will consist of two parts: ZDCS-BBA.NDX (the index) and ZDCS-BBA.DAT (the data).

First, collect all those nasty ads together and zip them up into one ZIP file. Use whatever name you like for the ZIP; I'm going to call it BBS-ADS.ZIP. Then run the utility ZDCSBBA1.EXE. The syntax (7% in NJ) is:

     ZDCSBBA1 BBS-ADS.ZIP

(If you have used a different name for your ZIP collection of bbs ads, just use that name in place of BBS-ADS.ZIP.) The program will create the database files, and you will be ready to delete all that free advertising. Just make sure that you have put a "Y" on the last line (line 6) of the configuration file to turn this option on.

Installation - Setting Up the Check for Uploaded Duplicates
-----------------------------------------------------------

Almost done! Now that you have created the ZDCS database(s), all you need to do is get the bbs to check all uploaded ZIP's for duplicate files from now on. This is done by processing each ZIP with ZDCSFC1.EXE when it is received.

ZDCSFC1.EXE is the real-time upload tester that compares the CRC's of the files in the uploaded ZIP against the database(s). It also updates the duplicate files database (but not the bbs ads database) with the CRC's of the new files in the ZIP.
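The ZDCSFC1 behaviour described above can be sketched as follows, together with the six-line ZDCS.CFG layout from the installation notes. This is a hypothetical Python illustration, not the actual ZDCSFC1 logic. `ZdcsConfig`, `read_config` and `plan_upload` are invented names, and the databases are in-memory sets of CRC-32 values. `plan_upload` also assumes the upload has already passed the percentage filter. Two further assumptions are marked in the comments. First, CRC's of files deleted from the ZIP are not added to the database. Second, the ERRORLEVEL (0, 2 or 3, as listed under Misc. Notes and Comments) counts only matches against the duplicate files database, not the bbs ads database.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class ZdcsConfig:
    dlpath_lst: str    # line 1: full path of PCBoard's DLPATH.LST
    db_dir: str        # line 2: database directory (no trailing backslash)
    delete_all: bool   # line 3: Y = delete *all* duplicates from the upload
    max_percent: int   # line 4: maximum duplicate percentage, 0..100
    log_file: str      # line 5: full path of the ZDCSFC1 log
    delete_ads: bool   # line 6: Y = delete *designated* duplicates (bbs ads)


def _yes_no(value: str, line_no: int) -> bool:
    value = value.upper()
    if value not in ("Y", "N"):
        raise ValueError(f"line {line_no}: expected Y or N, got {value!r}")
    return value == "Y"


def read_config(text: str) -> ZdcsConfig:
    """Parse and validate the six-line ZDCS.CFG layout."""
    lines = [line.strip() for line in text.splitlines()]
    if len(lines) < 6:
        raise ValueError("ZDCS.CFG must have six lines")
    if lines[1].endswith("\\"):
        raise ValueError("line 2: database path must not end with a backslash")
    percent = int(lines[3])
    if not 0 <= percent <= 100:
        raise ValueError("line 4: percentage must be between 0 and 100")
    return ZdcsConfig(lines[0], lines[1], _yes_no(lines[2], 3), percent,
                      lines[4], _yes_no(lines[5], 6))


def plan_upload(members: List[Tuple[str, int]], file_db: Set[int],
                ads_db: Set[int], cfg: ZdcsConfig
                ) -> Tuple[Optional[List[str]], Set[int], int]:
    """Work out what happens to an accepted upload.

    Returns (names for ZDCS-DEL.LST, or None when neither delete option is
    on; CRCs to add to the duplicate files database; ERRORLEVEL 0/2/3).
    """
    dups = [name for name, crc in members if crc in file_db]
    ads = [name for name, crc in members if crc in ads_db]

    delete: List[str] = []
    if cfg.delete_all:
        delete += dups
    if cfg.delete_ads:
        delete += [name for name in ads if name not in delete]
    del_list = delete if (cfg.delete_all or cfg.delete_ads) else None

    # Only the duplicate files database is updated, never the ads database.
    # Assumption: members removed from the ZIP are not added.
    new_crcs = {crc for name, crc in members
                if name not in delete and crc not in file_db}

    # Assumption: ERRORLEVEL counts duplicate-database matches only.
    if not dups:
        level = 0
    elif len(dups) == len(members):
        level = 3
    else:
        level = 2
    return del_list, new_crcs, level
```

Consider an upload whose first member is already on the bbs, whose second is a known ad, and whose third is new. With both delete switches set to "Y", this sketch lists the first two names for ZDCS-DEL.LST, adds only the third member's CRC, and returns ERRORLEVEL 2.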
To process new uploads with this program, ZDCSFC1 must be called by the PCBTEST.BAT file (part of PCBoard).

A sample PCBTEST.BAT file is included in this package. (I know this configuration works, because I'm using it myself.) Here is that file, with lots of comments.

     @If Exist PCBFAIL.TXT Del PCBFAIL.TXT
     @If Exist PCBPASS.TXT Del PCBPASS.TXT
     @If Exist ZDCS-DEL.LST Del ZDCS-DEL.LST

The file ZDCS-DEL.LST is created by ZDCSFC1.EXE if you are using either of the two options to delete duplicate files. [Line 3 in the configuration file is for deleting *all* duplicate files; line 6 is for deleting *designated* duplicates (bbs ads).] It is a "control file" that is passed to PKZIP below. Deleting any copy left over from an earlier upload ensures that the test further down sees only a list written for the current upload.

     EXZTEST %1 0 %2 /@Q:\PCB\EXZIP.EXC /!r:\wrk\comment /s /w

This command calls EXZTEST and lets it do its thing, or things.

     ZDCSFC1 %1 %2

%1 and %2 are the two standard parameters passed by PCBoard. Please note that if the second parameter is *not* the word "UPLOAD", ZDCSFC1 does absolutely nothing but exit.

     If Not Exist ZDCS-DEL.LST Goto End

If neither line 3 (delete all duplicates) nor line 6 (delete designated duplicates) of the configuration file is set to "Y", then you aren't deleting any files from inside the uploaded ZIP. In that case ZDCS-DEL.LST will not exist, and you obviously don't want to pass it to PKZIP, so this line skips the deletion.

     Pkzip -d %1 @ZDCS-DEL.LST

This is where any duplicate files within the uploaded ZIP are removed from the uploaded file.

     :End

That's all, folks!

Misc. Notes and Comments
------------------------

When you first create the database against which you want to check future uploads, you may discover that you already have some duplicate files in your collection of ZIP's on the bbs. ZDCSDR1.EXE is a small utility that generates a list of those duplicate files within the ZIP's.
The list is sorted in ascending order by CRC and contains the CRC, the name of the duplicate file, and the identity of the ZIP (with full drive, path and filename) that contains it. Output is to an ASCII text file named ZDCS-DUP.LST, located in the directory that was current when the report generator (ZDCSDR1.EXE) was run. ZDCSDBB1 does not delete any duplicate ZIP file members when you create your ZDCS database.

In addition to creating the required PCBPASS.TXT and PCBFAIL.TXT files, ZDCSFC1 also sets the DOS ERRORLEVEL when it exits. These levels are:

     0   No duplicate files were found within the upload, or no processing was done.
     2   Some duplicates were found.
     3   Every file within the upload was a duplicate.

Limits and Future Enhancements
------------------------------

ZDCS can process at most 800 files within an individual ZIP. If it finds a ZIP file with more than 800 files, ZDCS will process the first 800 files and then move on to the next ZIP. If you do encounter a ZIP with more than 800 files (including files in ZIP's inside the ZIP), please let me know; this limit can be raised if necessary. I set it in the interest of keeping required RAM to a minimum. As released in version 1.00, ZDCSFC1 requires something on the order of 90K of RAM, plus whatever overhead PKUNZIP requires for processing ZIP's within ZIP's.

ZIP files contained within ZIP files are processed, with some caveats: ZIP's within a ZIP are checked only one level deep. The simplest explanation I can think of is an example. ZIP A contains assorted files and ZIP B. In turn, ZIP B contains more files and another ZIP, C. ZIP C contains still more files. How does the whole melange get processed?

All the files in ZIP A and in ZIP B have their CRC's entered into the duplicate files database.
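The one-level rule for nested ZIP's can be sketched with Python's `zipfile` module. In this hypothetical illustration (not ZDCS code), `collect_one_level` returns two lists. The first holds the upload's own members, which are the only candidates for automatic deletion. The second holds the members of ZIP's directly inside the upload, which are only added to the database. A ZIP found inside an inner ZIP (like C) is never opened, so only its own CRC is collected. The `limit` parameter mimics the 800-file cap. How ZDCS counts files against that cap across nesting levels is an assumption here (outer members are counted first). Skipping damaged inner ZIP's is also an assumption.

```python
import io
import zipfile
from typing import BinaryIO, List, Tuple, Union

Member = Tuple[str, int]  # (name, CRC-32 as stored by PKZIP)


def _file_members(zf: zipfile.ZipFile) -> List[Member]:
    return [(i.filename, i.CRC) for i in zf.infolist() if not i.is_dir()]


def collect_one_level(source: Union[str, BinaryIO],
                      limit: int = 800) -> Tuple[List[Member], List[Member]]:
    """Return (members of the upload, members of ZIP's directly inside it).

    Every CRC in both lists would go into the duplicate files database, but
    only the first list is eligible for automatic deletion. ZIP's nested two
    or more levels deep are not opened; they appear only as ordinary members.
    """
    with zipfile.ZipFile(source) as outer:
        top = _file_members(outer)[:limit]
        nested: List[Member] = []
        for name, _ in top:
            if not name.upper().endswith(".ZIP"):
                continue
            try:
                with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                    nested += [(f"{name}/{n}", crc)
                               for n, crc in _file_members(inner)]
            except zipfile.BadZipFile:
                continue  # assumption: a damaged inner ZIP is just skipped
    # Assumption: the cap covers outer members first, then nested ones.
    return top, nested[:limit - len(top)]
```

In the A/B/C example, the first list holds A's files plus B.ZIP itself. The second list holds B's files plus C.ZIP. Nothing inside C is ever read.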
If you have selected to delete either *all* duplicates or *designated* (bbs ads) duplicates, those deletions are done automatically only for the files in ZIP A. Of course, all the duplicates in ZIP B are still listed in the log, so you do know about them and can decide whether to remove them manually.

What about ZIP C? That's easy: any ZIP embedded more than one level deep in the uploaded ZIP (and C is two levels deep) is not processed as a ZIP at all. No file deletions, no CRC's, nothing. In fact, the only thing that happens to it is that its own CRC is added to the duplicate files database.

There is no way to delete ZDCS database records. Frankly, I see no need for this facility (yet!). If you do discover a need to do this, register your copy of ZDCS and let me know what you would like.

Copyright Notices and Other Legal Stuff
---------------------------------------

Michael W. Cocke and MWC Enterprises will not accept responsibility for the function, failure to function, or side effects of any function of the Zip Duplicate Checking System (ZDCS). ZDCS is provided in good faith, but its use is *solely* at the risk of the operator.

PKZIP and PKUNZIP are copyright and trademark of PKWARE, Inc.
PCBoard is copyright and trademark of Clark Development Company.
EXZTEST is copyright Andy Keeves.
PROBAS library and the PROBAS Toolkit are copyright and trademark of Hammerly Computer Systems, Inc.
MS-DOS is copyright and trademark of Microsoft Corp.
ZDCS is copyright (C) 1991, Michael W. Cocke.