Zipfile Duplicate Checking System (ZDCS) Ver. 1.5              Pg. 1
Copyright (C) 1991, Michael W. Cocke
--------------------------------------------------------------------

1. INTRODUCTION - What is ZDCS?

ZDCS is a shareware set of utilities intended to help a sysop deal with the problem of duplicate files in ZIPs already on the bbs or being uploaded by the callers. It is compatible with any Netbios compatible lan, such as Lantastic or Novell. ZDCS is very simple to set up and run, but still provides the sysop with the flexibility of deciding when to decline an upload (if ever) and which duplicate files to automatically remove from an upload (if any).

This short manual will describe how a sysop can use ZDCS, including the sysop-configurable options, and will take you through the steps needed to install and configure ZDCS. License and registration information are provided in section 12.

2. FILES IN THIS RELEASE - What should this package contain?

File Name      Description
------------   ----------------------------------------------------
PCBTEST.BAT    A sample PCBTEST.BAT file, suitable for use with
               EXZTEST and ZDCS.
ZDCS.CFG       A sample configuration file for ZDCS.
ZDCSDB15.EXE   The utility program to create the initial database.
ZDCSAB15.EXE   The utility program used to add bbs ads to the
               separate bbs ad database.
ZDCSDR15.EXE   The report generating program that produces the list
               of duplicate files within the database.
ZDCSFC15.EXE   The program module to perform a real-time test for
               duplicates (as soon as the upload is received).
ZDCS.TXT       A rambling text file full of useful and other kinds
               of information. (You're currently reading it.)
ZDCS.NEW       A last minute text file with changes and notes.
ZDCS.ADN       A sample listing of allowed duplicate files.

3. GENERAL INFORMATION - What's going on?

ZDCS is intended to be used on a specified collection of ZIP files, and only with ZIP files.
The particular application I had in mind was trying to weed out duplicate files from a bbs file collection and to test new uploads against the bbs files for duplicates, so that's how the package was assembled. ZDCS is designed specifically for PCBoard 14.5 and higher. (Although it may be possible to use it with other bbs software, this manual will attempt only to describe its use with PCBoard.)

First, ZDCS has to check out the initial collection of ZIP files. This provides a database against which any new upload is checked. Of course, every new upload accepted by the board is then added to the database, ready for testing the next upload. This automatically keeps the database current.

The files are checked by keeping a database of the 32-bit CRC used internally by PKZIP and then comparing the CRCs for the files in the new upload against those stored in the database. The database of CRCs uses a B-tree index, so there are no sort utilities or regular file maintenance requirements of any kind.

When a caller uploads a ZIP file, ZDCS goes to work on it to detect any duplicate files. This is easy to set up since ZDCS was designed specifically to "plug into" PCBTEST.BAT, right behind EXZTEST. (EXZTEST is a wonderful sysop utility written by Andy Keeves and it does a perfectly credible job of swapping ZIP comments, using external test programs to identify viruses, detecting damaged ZIPs, and so on.) What happens to duplicate files depends on how the sysop has set the ZDCS options as described in the next section.

4. CAPABILITIES - What will ZDCS do?

ZDCS allows the sysop to mix and match among four different options for the processing of uploaded ZIPs.

The first option is setting the maximum percentage of duplicates in an uploaded ZIP.
ZDCS will calculate the actual percentage of duplicate files in the ZIP and will compare it to the maximum number selected by the sysop. If the actual percentage is greater, the upload will be declined. (The PCBoard code takes care of moving these declined files into your private upload directory, where you can review them.) The percentage is configurable by the sysop to any whole number from 0 to 100. Setting the percentage to 100 effectively bypasses this filter, since it permits an uploaded ZIP with nothing but duplicates to pass. At the other extreme, setting the percentage to 0 effectively requires that the uploaded ZIP have no duplicates at all.

The second independent option that can be selected by the sysop is the removal of *all* individual duplicate files from the uploaded ZIP. When this option is selected, all files within the ZIP that have CRCs already listed in the database are removed from the ZIP, leaving the remainder of the uploaded ZIP intact.

A note of caution about this second option, the removal of *all* duplicate files: some shareware authors issue updates that consist of both new files (executables, perhaps) and unchanged files (registration information, for example). If you enable the deletion of all duplicate files in an uploaded ZIP, you will lose some of the files that belong in the author's package.

The third option is the removal of *designated* duplicate files from the uploaded ZIP. A separate database is created with the CRCs of all files that are to be removed. This is a great way to clear out all those blasted bbs ads from you-know-who, without removing authors' unchanged text files from newer shareware versions. In fact, the database of files that you want to always remove from uploads is referred to as the bbs ads database.

The fourth option is the use of a list of allowed duplicate files. You can create a list of file names and/or CRCs that you do not want ZDCS to see as duplicates. Filenames are preceded by $ and CRCs by #, with one file per line for a maximum of 256 lines. (A sample list is included in this package.)

Why would you want to have allowed duplicates? There are some files that reappear frequently as part of shareware or freeware packages, such as OMBUDSMN.ASP (found in ASP-ware), or VALIDATE.DOC and VALIDATE.COM (from McAfee's SCAN program). Especially in a case like McAfee's where new versions of the program come out frequently, each with certain standard files included, it would be useful to "recognize" these duplicate files as being acceptable.

Once an uploaded ZIP file has been either accepted or declined, the CRCs of the individual files in the ZIP are automatically added to the file database so that the system will recognize any future uploads of the same files as duplicates. No CRCs are automatically added to the bbs ads database. That must be done with one of the utilities in the ZDCS package.

5. INSTALLATION - How do I get ZDCS up and running?

There are four basic steps to installing ZDCS to work with the bbs:

   A. Setting up the configuration file.
   B. Creating the initial database.
   C. Creating the bbs ads database.
   D. Setting up the check for uploaded duplicates.

Each of these steps is explained in excruciating detail in the next four sections.

6. STEP 1 - How do I set up the configuration file?

The first step in the installation is to create the ZDCS.CFG file for your configuration. All of the ZDCS programs need this file.

The location of ZDCS.CFG must be one of two places, according to the version of DOS you are using. If you are running under DOS 3.x or higher, this file should be located in the same directory as the ZDCS executable files.
If you are running under DOS 2.x or lower, the configuration file should be located in the directory that will be the current directory when you run any of the ZDCS programs. Please note that I was not personally able to test ZDCS under DOS 2.x. If you have a problem under these versions, please let me know.

The ZDCS.CFG file consists of six short lines. (A sample is included in this package.) It takes far longer to read the description that follows than it does to write the configuration file from scratch.

Line 1: The complete drive, path and filename of an ASCII text file that contains all the pathnames, one on each line, that contain the ZIPs to be included in the database.

Note: This is how the DB module knows where to find the files that it has to process. If you are not using the index file feature in PCBoard 14.5A, then this is simply the DLPATH.LST file from PCBoard. Otherwise, you will have to create this file with a text editor. (Remember to add the trailing backslashes for each pathname.) There is an upper limit of 999 pathnames.

Line 2: The drive and pathname where the ZDCS database will be located.

Note: A trailing backslash should not (absolutely not!) be supplied. Believe me.

Line 3: Either the letter "Y" or the letter "N".

Note: This is where you set the switch to tell ZDCS whether to delete *all* duplicate files in the uploaded ZIP (Y) or leave them alone (N).

Line 4: An integer (that's a whole number, no decimals) between 0 and 100.

Note: This is the maximum percentage of files contained within the uploaded ZIP that may be duplicates of files already on your system. ZDCS will calculate the actual percentage of duplicates in the ZIP and compare it to your maximum percentage. If the actual percentage is lower, the uploaded ZIP is accepted.
If the actual percentage is equal to or higher than the maximum you specified, the upload is declined.

Line 5: The complete drive, path and filename for the log that will be created by the ZDCSFC15 program module.

Line 6: Either the letter "Y" or the letter "N".

Note: This is where you set the switch to tell ZDCS whether to delete *designated* duplicate files (bbs ads) in the uploaded ZIP (Y) or leave them alone (N).

7. STEP 2 - How do I create the initial database?

The next step after creating the ZDCS.CFG file is to create the initial database of CRCs. The database consists of three files: ZDCS.NDX (the index), ZDCS.DAT (part 1 of the data) and ZDCS.PTH (part 2 of the data).

To create the database, you simply run the ZDCSDB15.EXE program. As long as you have created the ZDCS.CFG file properly, there is nothing more to be done until this program finishes processing. The display points out that you may press the F1 key at any time for a status summary.

The ZDCSDB15 status summary is a single line that contains the following information from left to right:

   "Share" or "NoShare"
   number of ZIPs processed so far
   start time
   number of files processed so far
   value of pdupe bit flag

The first item, the word "Share" or the word "NoShare", indicates the type of "file opens" being used. The presence of the DOS share utility in memory is detected by all ZDCS programs to permit automatic use of the appropriate type of file access. The second item is just the number of ZIPs whose individual files have been added to the CRC database so far. The third item is the time of day when the ZDCSDB15 program was started. This time is taken from the DOS clock. The fourth item is the number of individual files that have been added to the database.
The fifth and last item is arcane internal status information, which will not be explained here. (Consider yourself lucky.)

After ZDCSDB15 has finished creating the initial database, it will display a final status summary line, using the same format just described. Below the start time in that line will be the end time when processing was completed.

WARNING: If you start creating the database with ZDCSDB15 and then develop a sudden need to abort, use the F10 key! The display will remind you that F10 is the key provided for aborting the process. If you abort the program by any other method, you will almost certainly create lost and/or cross-linked clusters on your hard disk.

It is entirely likely that when you first create the initial database you will already have some duplicate files in your collection of ZIPs on the bbs. There is a small utility in this package called ZDCSDR15.EXE that will generate a list of all the duplicate files, sorted in ascending order by CRC. The list contains the CRC and the name of the duplicate file and the identity of the ZIP (with full drive and pathname) that contains the duplicate file. Output is to an ASCII text file named ZDCS-DUP.LST, which will be located in the directory that was current when the report generator (ZDCSDR15.EXE) was run.

Note that no duplicate files are deleted by ZDCSDB15.EXE when you create the initial database. The list of duplicates ZDCS-DUP.LST can be used by the sysop to remove any duplicate files.

8. STEP 3 - How do I create the bbs ads database?

This step is needed only if you have decided to use the option to delete designated duplicate files (usually bbs ads) from the uploaded ZIP. To do this, you have to tell the system which files are the dreaded bbs ads by creating a bbs ads database.
The database will be located in the same directory as the rest of the ZDCS files, and will consist of one file: ZDCS-BBA.NDX.

First, collect all those nasty ads together and zip them up into one ZIP file. Use whatever name you like for the ZIP; this example is going to call it BBS-ADS.ZIP. Then run the utility ZDCSBA15.EXE from the directory containing all the ZDCS files. The syntax (7% in NJ) is:

   ZDCSBA15 BBS-ADS.ZIP

(If you have used a different name for your ZIP collection of bbs ads, just use that name in place of BBS-ADS.ZIP.) The program will create the database files and you will be ready to delete all that free advertising. Just make sure that you have used a "Y" as the last line of the configuration file to turn this option on.

If you want to create a new bbs ads database in the future, just delete the old database file (ZDCS-BBA.NDX) and follow these same steps to create the new one. If you don't delete the old database, then the new ads will be added to the old ones in the database, which is an easy way to add new bbs ads.

9. STEP 4 - How do I set up the check for uploaded duplicates?

Almost done! Now that you have created the ZDCS database(s), all you need to do is get the bbs to check all uploaded ZIPs for duplicate files from now on. This will be done by processing the uploaded ZIPs with ZDCSFC15.EXE when they are received.

ZDCSFC15.EXE is the real-time upload tester that compares the CRCs of the files in the uploaded ZIP against the database(s). It also updates the duplicate files database (but not the bbs ads database) with the CRCs of the new files in the ZIP.

To process the new uploads with this program, ZDCSFC15 must be called by the PCBTEST.BAT file (part of PCBoard). A sample PCBTEST.BAT file is included in this package. (I know this configuration works, because I'm using it myself.)
Here is that file, with lots of comments. (The file statements are indented more than the comments, which begin with asterisks.)

        @If Exist PCBFAIL.TXT Del PCBFAIL.TXT
        @If Exist PCBPASS.TXT Del PCBPASS.TXT
        @If Exist ZDCS-DEL.LST Del ZDCS-DEL.LST
*** The file ZDCS-DEL.LST is created by ZDCSFC15.EXE if you are
    using either of the two options to delete duplicate files.
    It is a `control file' that is passed to PKZIP below.
        EXZTEST %1 0 %2 /@Q:\PCB\EXZIP.EXC /!r:\wrk\comment /s /w
*** This command calls EXZTEST and lets it do its thing or things.
        ZDCSFC15 %1 %2
*** These are the two standard parameters passed by PCBoard.
        If Not Exist ZDCS-DEL.LST Goto End
*** If neither of the "delete from zipfile" options is set to "Y",
    then you aren't deleting any files from inside the uploaded
    ZIP. In that case this file will not exist, so you obviously
    don't want to pass it to PKZIP. This bypasses the deletion.
        Pkzip -d %1 @ZDCS-DEL.LST
*** This is where any duplicate files within the uploaded ZIP are
    removed from the uploaded file.
        :End
*** That's all for PCBTEST.BAT, folks!

In addition to creating the required PCBPASS.TXT and PCBFAIL.TXT files, ZDCSFC15 also sets the DOS error level when it exits. These levels are:

   0   No duplicate files were found within the upload, or no
       processing was done.
   2   Some duplicates were found.
   3   Every file within the upload was a duplicate.

10. LIMITS AND FUTURE ENHANCEMENTS - What else should I know?

There is an internal limit of 800 files within an individual ZIP that can be processed. If you find a ZIP file with more than 800 files, ZDCS will process the first 800 files and then move to the next ZIP for processing. If you do encounter a ZIP with more than 800 files (including files in ZIPs inside the ZIP), please let me know. This limit can be raised if necessary.
I set this limit in the interests of keeping required RAM to a minimum. As released in version 1.50, ZDCSFC15 requires a little less than 100K, plus whatever overhead PKUNZIP requires for processing ZIPs within ZIPs.

Processing of ZIP files contained within ZIP files is accomplished with some caveats. ZIPs within a ZIP are only checked one level deep. The simplest explanation I can think of is an example. ZIP A contains assorted files and ZIP B. In turn, ZIP B contains more files and another ZIP, C. ZIP C contains still more files. How does the whole melange get processed?

All the files in ZIP A and in ZIP B have their CRCs entered into the duplicate files database. If you have selected to delete either *all* duplicates or *designated* (bbs ads) duplicates, then those deletions are done automatically only for the files in ZIP A. Of course, all the duplicates in ZIP B are still listed in the log, so you do know about them and you can decide whether to remove them manually.

What about ZIP C? That's easy: any ZIP embedded more than one level deep in the uploaded ZIP (and C is two levels deep) is not processed as a ZIP at all. No file deletions, no CRCs, nothing. In fact, the only thing that happens to it is that its CRC is added to the duplicate files database.

There is no way to delete ZDCS database records. Frankly, I see no need for this facility. If you do discover a need to do this, register your copy of ZDCS and let me know what you would like.

11. REVISION HISTORY - What's happened since version 1.00?

The database builder ZDCSDB15.EXE now correctly handles ZIP files stored *with paths* within ZIP files.

The previous internal limit of 99 pathnames has been increased to 999.
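
The one-level-deep rule from section 10, together with the handling of inner ZIPs stored with paths, can be sketched roughly as follows. This is only an illustration written in Python with its standard zipfile module; it is not how ZDCSDB15 or ZDCSFC15 is actually implemented. The names collect_crcs and build_sample are hypothetical, and recording the CRC of the inner ZIP itself alongside its contents is an assumption.

```python
import io
import zipfile


def collect_crcs(zip_source, depth=0, max_depth=1):
    """Return (name, crc) pairs, opening nested ZIPs at most max_depth deep.

    Hypothetical helper for illustration only, not part of ZDCS.
    """
    found = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # The 32-bit CRC is read from the ZIP directory entry;
            # nothing is decompressed or recomputed to obtain it.
            found.append((info.filename, info.CRC))
            # Assumption: a nested ZIP is recognised by its extension,
            # whether or not it was stored with a path.
            is_nested_zip = info.filename.lower().endswith(".zip")
            if is_nested_zip and depth < max_depth:
                inner = io.BytesIO(archive.read(info))
                found.extend(collect_crcs(inner, depth + 1, max_depth))
    return found


def build_sample():
    """Build ZIP A holding ZIP B (stored with a path), which holds ZIP C."""
    def make_zip(members):
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w") as archive:
            for name, data in members.items():
                archive.writestr(name, data)
        return buffer.getvalue()

    zip_c = make_zip({"deep.txt": b"two levels down"})
    zip_b = make_zip({"inner.txt": b"one level down", "C.ZIP": zip_c})
    return make_zip({"readme.txt": b"top level", "SUB/B.ZIP": zip_b})


if __name__ == "__main__":
    for name, crc in collect_crcs(io.BytesIO(build_sample())):
        print(f"{crc:08X}  {name}")
```

Run against the sample, the sketch lists readme.txt, SUB/B.ZIP, inner.txt and C.ZIP. The file deep.txt inside C.ZIP never appears, because C sits two levels down and is treated as an ordinary file whose own CRC is recorded.
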
Where the existence of an empty subdirectory among the pathnames caused the earlier database builder to abort, ZDCSDB15.EXE can now handle an empty subdirectory.

The real-time upload checker ZDCSFC15.EXE now correctly handles ZIP files stored *with paths* within ZIP files.

Previously, a file listed in the bbs ads database could still be considered a duplicate file, which affected the "percentage of new files" calculation for an uploaded ZIP. The bbs ads are no longer counted as duplicates when determining the percentage of new files in the upload.

The database structures used in ZDCS 1.00 have been redesigned for version 1.5 to cut down on the database size. At the expense of a little speed, the database is now slightly less than half the size it used to be.

The new option of "allowed duplicates" has been added in version 1.50. This is so that files like OMBUDSMN.ASP and VALIDATE.DOC needn't be counted as duplicate uploads. The allowed duplicate files may be designated by either filename or by CRC.

A new test mode has been added for the operation of the upload file checker, ZDCSFC15.EXE. If the word TEST is used instead of the word UPLOAD as the second parameter passed to ZDCSFC15, then the upload checker can be used to test a specific file for duplicates without causing any updates to be written to the database or the logfile and without creating PCBPASS.TXT or PCBFAIL.TXT. Test results will be written to ZDCS-TST.OUT. The easiest and safest way to run such a test on any ZIP file FOO.ZIP is to issue the command to test the files within the ZIP file FOO.ZIP. I don't expect most sysops to need this capability, but it's available.

There have been numerous beta versions of ZDCS between release 1.00 and this release 1.50. Conversion programs for the database information from those beta versions to the new version 1.50 format are available on The Hacker Central BBS. Please see the next section for further information about ZDCS support on The Hacker Central.

12. REGISTRATION AND SUPPORT - How do I get help?

I'm afraid that I am going to have to break one of my own rules with this release. I have long been of the opinion that sysops are going broke fast enough without my help, and have consequently released all of my sysop utilities as freeware. With the amount of time and effort that has gone into ZDCS and the level of support I have been offering, the ZDCS package has been released as shareware.

ZDCS is a fully functional shareware package. There are no critical limits, crippled features or "drop dead" dates. If you try ZDCS out on your system and decide that you want to continue using it, please register your copy by sending a check for $25.00 (US). Registration entitles you to a license for use of the Zipfile Duplicate Checking System on one bbs, including all future versions of ZDCS. No additional fees will be charged for registration of future versions of this product.

Please make your registration check payable to Michael W. Cocke and mail to:

   Michael W. Cocke
   11 Cedar Road
   Montville NJ 07045-9582

Please be assured that I will continue to support ZDCS through future revisions. I have been using it on our own bbs, The Hacker Central, and wonder how I lasted so long without it.

Product support will be handled two ways. Questions and discussion of ZDCS are welcome in the ILink SHAREWARE conference, available through most ILink member bbs's. Product support will also be done on our home bbs, The Hacker Central. Please call us on our public node 1: 201-334-2555 N81 1200-2400 baud. Callers are usually validated within 24 hours and all registered users of ZDCS will be given access to the high speed node 2: 201-318-8840 N81 1200-38,400 HST/v.32. A special conference is available for registered users of ZDCS. (Suggestions and requests cheerfully welcomed.)
If you have a beta copy of ZDCS 1.5 (dated before March 30, 1991), you can simplify conversion to the release version of ZDCS 1.5 with the following utilities, all available on The Hacker Central BBS:

ZDCSO2N.EXE    Database format converter to change over from
               version 1.0 to version 1.5.
ZDCSL2N.EXE    Database format converter to change over from
               version 1.5L (a beta version) to version 1.5.
ZDCSRK15.EXE   Index key regenerator to change over from any beta
               versions except 1.5L to version 1.5.

ZDCS may be freely copied and distributed provided that it is distributed in its complete and original form with all individual files intact and unmodified. No fee of any kind may be charged for distribution or copying of the ZDCS package.

13. ACKNOWLEDGEMENTS - Thank you!

I'd like to thank my beta testers, especially Reginald Hirsch of Ye Olde Baily BBS in Houston TX. With his help and patience I was able to track down a few obscure but highly frustrating bugs.

Also my thanks to my wife Evelyne Stalzer. She's the one responsible for turning my explanations, comments and hieroglyphics into English documentation. For over the past month I have been eating, breathing and sleeping ZDCS, and she has put up with my frequent (mental if not physical) absences.

14. COPYRIGHTS AND LEGAL STUFF - What do the lawyers want to see?

Michael W. Cocke and MWC Enterprises will not accept responsibility for the function, failure to function, or side effects of any function of the Zipfile Duplicate Checking System (ZDCS). ZDCS is provided in good faith, but its use is *solely* at the risk of the operator.

EXZTEST is copyright Andy Keeves.
Lantastic is trademark of ArtiSoft Inc.
MS-DOS is copyright and trademark of Microsoft Corp.
Novell is trademark of Novell Inc.
PCBoard is copyright and trademark of Clark Development Company.
PKZIP and PKUNZIP are copyright and trademark of PKWARE, Inc. ZDCS is copyright (C) 1991, Michael W. Cocke.