Zipfile Duplicate Checking System (ZDCS) Ver. 1.65 Copyright (C) 1991, Michael W. Cocke ---------------------------------------------------------------------- A Walk-Through Guide for Installing and Using ZDCS on a PCBoard BBS TABLE OF CONTENTS ----------------- HELLO!..........................................................1 OVERVIEW OF ZDCS................................................2 ZDCS OPTIONS....................................................3 Deleting Duplicate Files...................................3 Allowed Dupicates..........................................3 BBS Ads....................................................4 Pre-Testing................................................4 INSTALLATION....................................................4 STEP 1 - Set up the config file.................................5 Line 1................................................5 Line 2................................................5 Line 3................................................5 Line 4................................................5 Line 5................................................6 Line 6................................................6 Line 7................................................6 Line 8................................................6 STEP 2 - Create the initial database............................7 STEP 3 - Create the bbs ads database............................8 STEP 4 - Create the list of allowed duplicates..................9 STEP 5 - Set up the upload file check...........................9 PRE-TESTING....................................................11 THAT'S ALL, FOLKS!.............................................12 HELLO! ------ ZDCS is a shareware set of utilities intended to help a PCBoard sysop deal with the problem of duplicate files, whether those files are already on the bbs or are being uploaded by a caller. It provides specific support for looking inside zipfiles (including PKZIP version 1.93) and self-extracting files made with PKZIP (*.SFX). Since ZIP files are by far more common that SFX files, and since they are both treated the same way by ZDCS, we'll use the term "zipfile" to refer to both of them. ZDCS also provides support for accepting unzipped GIFs. ZDCS thinks of such a GIF file as a poor zipfile with only one file in it. You'll find more detailed information on GIF files and others in the reference manual, ZDCS-REF.TXT. This walk-through will guide you in getting ZDCS up and running on your bbs. The reference manual, ZDCS-REF.TXT, has more detail on all sorts of things (probably more than you will want or need), but it's arranged to be the sort of file where you look things up when you want more information, not something you read all the way through. On the other hand, this walk-through is a friendly guide that will take you by the hand and show you how easy it is to install ZDCS. Don't be put off by the length of this file. ZDCS is quite simple to install. We've added lots of extra words to explain the process and your choices for how you want to run ZDCS. OVERVIEW OF ZDCS ---------------- There are two basic functional parts to ZDCS: checking an existing collection of files for duplicates, and checking newcomer files against the existing collection. It's set up this way to permit use on a dynamic collection like a bbs file system, but it can also be used to look for duplicates on other systems as well, such as a shareware CD-ROM or even multiple directories on your hard drive system. However, the focus is on bbs systems and that's how the documentation is written. To check an existing collection of files, you have to create a ZDCS database for that collection. Then you can use the duplicate report generator to tell you which files are duplicates and where they are found. A little housecleaning based on this report can weed out duplicates and recover hard drive storage space, something no one ever has enough of. When new files are added to the collection, you need to be able to do two things with them: add the files to the database, and find out if there are any duplicates among them. The most common addition of new files is via uploads to the bbs. ZDCS will enable you to process an upload as soon as it is received, and tells both you and your caller about duplicate files in that upload. It can be set to accept or decline an upload based on the percentage of dupes it finds. It also adds any new files to the ZDCS database to keep that up to date. There's a bit more, of course. ZDCS can be told to decline an upload, to automatically remove duplicate files, to delete those pesky little bbs ads from uploads, and to recognize "allowed duplicates" - or any combination of the above. There's even a pre-test capability that lets callers find out ahead of time whether or not their intended upload duplicates files already on your board. Installing ZDCS to work on your bbs is relatively simple. The hardest part may be deciding which options you want to enable. Let's look at those first. ZDCS OPTIONS ------------ Running ZDCS without any of the extra options enabled gives you a basic duplicate checking system that will not delete any files automatically. It will still tell you about all duplicate files, but it won't be able to distinguish between files you always want to delete (like bbs ads), files that duplicate exactly ones already on your board, and files that are duplicates but still important to keep (like VALIDATE.COM from Macafee's SCAN program). You can try out the different options and change your mind about which ones you want to use without re-installing ZDCS. Deleting Duplicate Files ------------------------ When a new zipfile is uploaded to the board, ZDCS checks it against the existing database to see if there is any duplication of files. You can permit ZDCS to not only flag the duplicates but also to delete them from inside the zipfile, leaving the rest of the upload intact. This deletion feature does not operate on a GIF file. This is a good time to point out one consequence of deleting all dupes. Some shareware authors issue updates that consist of both new files (executables, perhaps) and unchanged files (registration information, maybe). If you enable the deletion of all duplicate files in an uploaded zipfile, you can lose some of the files that belong in the author's package. This brings us to the concept of allowed duplicates. Allowed Duplicates ------------------ You can choose to tell ZDCS that certain files are allowed duplicates. When you create a list of allowed duplicates, ZDCS knows not to treat these files as dupes no matter how often they appear. You can add new files to the list or delete old ones with any text editor. Why would you want to have allowed duplicates? There are some files that reappear frequently as part of shareware or freeware packages, such as OMBUDSMN.ASP (found in ASP-ware), or VALIDATE.DOC and VALIDATE.COM (from Macafee's SCAN program). Especially in a case like Macafee's where new versions of the program come out frequently, each with certain standard files included, it would be useful to "recognize" these duplicate files as being acceptable. There are two places where enabling the allowed duplicates option makes a difference. It prevents these files from being deleted if you have selected the option to delete duplicate files. It also does not figure these files into the calculation for what percentage of an uploaded zipfile is duplicates. This feature does work for GIF files. If a GIF is included among the allowed duplicates, then a repeat upload of the same GIF will not be flagged as a duplicate, and the upload will be accepted. BBS Ads ------- There are some specific files that you may want to delete every time you run across them. The prime example of this would be those rude and annoying bbs ads from you-know-who that a few boards feel it necessary to add to every zipfile they carry. You might also have other problem files that you want to keep off your board, like some of the pyramid scheme get-rich-quick files that circulate from time to time. You can tell ZDCS to recognize these nuisances by creating a bbs ads database. (This is always referred to as the "bbs ads database" to distinguish it from the main ZDCS database, which is bigger and not the least bit optional.) When you run the upload file checker, ZDCS will flag any files it finds that match the ones listed in the bbs ads database. If you want to do more that just flag these pests, you can also tell ZDCS to delete all bbs ads automatically. Of course, you can include new bbs ads as they are perpetrated. This option is completely independent of the option to delete duplicate files, so you don't take a chance on removing authors' unchanged files from newer shareware versions. The bbs ads option does not operate on a GIF file. Pre-Testing ----------- You can decide whenever you like to allow or disallow pre-testing. Since it's very easy to enable and won't affect the installation of ZDCS at all, we've covered it in a separate section at the end of this walk-through. After you've gotten ZDCS installed on your bbs, you can take your time about making the decision to pre-test or not to pre- test. INSTALLATION ------------ There are five basic steps to installing ZDCS to work with the bbs: 1. Setting up the configuration file. 2. Creating the initial database. 3. Creating the bbs ads database (optional). 4. Creating the list of allowed duplicates (optional). 5. Setting up the check for uploaded duplicates. None of the steps is at all difficult. We'll cover them one by one. STEP 1 - Set up the config file ------------------------------- The first step in the installation is to create the ZDCS.CFG file for your configuration. This is a flat ASCII text file containing eight short lines. A sample configuration file is included in this package, so you might want to look at that. It takes more time to describe the file than it does to write one from scratch. Put the finished configuration file into the same directory that contains the executable ZDCS files, unless you're running DOS 2.x. If that's the case, either upgrade to at least DOS 3.x or go to the ZDCS reference manual ZDCS-REF.TXT and read the answer there. Line 1 ------ This line is the complete drive, path and filename of an ASCII text file. This is a file that you create listing all the pathnames, one on each line, that contain the zipfiles / GIFs to be included in the database. To process a new collection of files into the ZDCS database, like those on a CD-ROM, just change either this line or the contents of the file it points to. There is no upper limit on the number of pathnames that can be processed. But make sure that you've included the trailing backslash for each pathname. Line 2 ------ This line is the drive and pathname where you want the finished ZDCS database to be located. It makes no difference if you include the trailing backslash here or not. Line 3 ------ This line is either the letter "Y" or the letter "N". It controls whether you want ZDCS to delete all duplicate files from an upload (Y) or just flag them and leave them intact (N). Line 4 ------ This line is an integer - that's a whole number, no decimals - between 0 and 100. It sets the maximum percentage of dupes that your bbs will accept in an upload. ZDCS will calculate the actual percentage of duplicates in the upload and compare it to your maximum percentage. If the actual percentage is lower, the upload is accepted. If the actual percentage is equal to or higher than the maximum you specified, the upload is declined. (The PCBoard code takes care of moving these declined files into your private upload directory, where you can review them.) Setting the percentage to 100 effectively bypasses this filter, since it permits a duplicated GIF or a zipfile with nothing but duplicates to pass. At the other extreme, setting the percentage to 0 effectively requires that the uploaded GIFs and zipfiles have no duplicates at all. Line 5 ------ This line is the complete drive, path and filename you want ZDCS to use for the log file created by the upload file checker ZDCSFC. Here is where you'll find the messages telling you which files are duplicates, bbs ads, allowed duplicates, or absolutely new and fresh. Line 6 ------ This line is either the letter "Y" or the letter "N". It controls the switch to tell ZDCS whether to delete bbs ads (Y) in an uploaded zipfile or to just flag them (N). If you've decided not to enable any checking for bbs ads at all, just set this to N. Line 7 ------ This line is where you can express from 1 to 72 characters' worth of creativity. Whatever text you write here will be displayed to the caller who has just uploaded a file that ZDCS has declined. A boring but informative possibility is (without the quotes) "Too many duplicate files - upload must be reviewed by sysop." If you don't want to display any message to the caller, you can place one or more blank spaces on this line. Just don't leave the line completely blank! The PCBoard @codes are supported here. Line 8 ------ This line consists of the single letter "Y" or "N". It controls whether ZDCS displays the one line "registered to" message after the board receives an upload (Y) or turns off the display of this message (N). Either way, the caller still sees the file by file breakdown of the upload and the status (duplicate, bbs ads, etc.) of each file. This line is only recognized by the registered version of ZDCS. It has no effect on the three line message displayed by the unregistered version. It is also the only line of the configuration file that you can forget to include without causing major problems. If the line is missing, ZDCS defaults to (Y) and displays the message. STEP 2 - Create the initial database ------------------------------------ The next step after creating the configuration file is to create the initial database of CRC32 values. To do this, you simply run the database build program ZDCSDB.EXE. Assuming that you have created the configuration file properly, there is nothing more to be done until this program finishes processing the files. While ZDCSDB is running, the display points out that you may press the F10 key at any time to abort. This is the only safe way to abort the database build! Ignore this warning and the chances are good that you will be rewarded with lost and/or cross-linked clusters on your hard disk. (This has been an official warning.) There will also be other useful and esoteric bits of information on the screen, but you can read more about those in the reference manual ZDCS-REF.TXT. The one that will probably interest you the most is on the very last line. The end time appears here after the database build is complete. Usually you will be using ZDCSDB on a collection of files that have already passed a file integrity checker. If for some reason this is not the case, you can still use ZDCSDB on those files by making use of the T (for Test) switch. This slows the processing down tremendously. It's really a better solution to use a file integrity checker to handle this duty, but ZDCSDB T is a workable alternative if you need it. What happens when you use the "T"? ZDCS sends out to PKZIP to come in and test each zipfile. Any one that is damaged is marked and not processed by the database builder. GIF files are passed over by PKZIP, so they receive no file integrity checking at all. The GIFs are still processed by ZDCS to add them to the database. There is a log file called ZDCS-DBB.LOG created by the database build operation. This is where you would find messages about corrupt zipfiles ZDCS encounters while trying to build the database, for example. If you have any problems while running ZDCSDB, look in this log file for help in understanding what happened. It is entirely likely that when you first create the initial database you will already have some duplicate files in your collection of zipfiles and GIFs. To find out about them, use the database report generator ZDCSDR to generate a flat ASCII text file called ZDCS- DUP.LST. This will create the list of all duplicate files in the database, including the name and CRC32 of the duplicated file and the identity with full drive and pathname of the zipfile or GIF containing the dupe. The format of the ZDCS-DUP.LST file is the standard comma- separated variable to make it easy to import this file into a database or parse it into a .BAT file. When you run ZDCSDR, it asks you whether you want the results sorted by the CRC32, the individual file name, or the name of the zipfile or GIF containing the dupe. This sort is called the Wichita sort for very obscure historical reasons. (The culprits know who they are.) Note that no duplicate files are deleted by ZDCSDR when you create the initial database. The list of duplicates, ZDCS-DUP.LST, can be used by the sysop to remove any duplicate files in the system. After the sysop has cleaned up the file system and removed any duplicate files, it is possible to purge duplicate entries from the CRC32 database in order to reduce its size. This is covered in the reference manual ZDCS-REF.TXT. STEP 3 - Create the bbs ads database ------------------------------------ This step is optional. You need it only if you want to flag and/or delete bbs ads from uploaded zipfiles. If you want to complete the basic installation and do this step at a later date, that's fine too. ZDCS doesn't quite have the intelligence to recognize a bbs ad unless you tell it which files to look for. To do this, you have to create the bbs ads database. This consists of a single file, ZDCS-BBA,NDX, which will be located in the same directory as the rest of the ZDCS files. The easiest way to do this is to first collect all those nasty ads together and zip them up into one zipfile. Use whatever name you like for the ZIP; this example is going to call it BBS-ADS.ZIP. Then run the program ZDCSBA.EXE from the directory containing all the ZDCS files. The syntax (7% in NJ) is: ZDCSBA BBS-ADS.ZIP (If you have used a different name for your ZIP collection of bbs ads, just use that name in place of BBS-ADS.ZIP.) The program will create the database files and you will be ready to flag or delete all that free advertising. Whether or not you delete the bbs ads is controlled by line 6 of the configuration file. You did read the section on setting up the configuration file, right? If you want to create a new bbs ads database in the future, just delete the old database file (ZDCS-BBA.NDX) and follow these same steps to create the new one. If you don't delete the old database, then the new ads will be added to the old ones in the database, which is an easy way to add new bbs ads. To make it even easier to include single ads as they come on the market, you can also run this program on an individual bbs ad, whether it is zipped or unzipped. So, if a new bbs ad named HAPPY.BBS shows up, you can issue the command: ZDCSBA HAPPY.BBS This will add HAPPY.BBS directly to the bbs ads database. You can even include a bbs ad in this database without having the original file on hand. All you need is the CRC32 value for the bbs ad. In the command above, you would replace HAPPY.BBS with a dollar sign $ followed immediately by the eight character CRC32 value. STEP 4 - Create the list of allowed duplicates ---------------------------------------------- This step is also optional. You need to do this only if you want ZDCS to recognize certain files as allowed duplicates. Whether or not this option is enabled is controlled by the presence or absence of an ASCII text file called ZDCS.ADN in the ZDCS directory on your system. This file is one that you create with any text editor to list all the files, one per line, that are allowed duplicates. You can designate an allowed duplicate by either its file name or its CRC32. To specify an allowed duplicate by name, just type the dollar sign $ followed immediately by the name of the file with its extension. This preserves a file with a distinctive name (like OMBUDSMN.ASP) even if it undergoes some revisions. To specify an allowed file by its CRC32 value, type the pound sign # followed by the CRC32 for the file. If you don't know the CRC32 offhand, you can get this information from PKZIP. Issuing the PKZIP - V command on any zipfile gives you the CRC32 values for each individual file inside the zipfile. The ZDCS.ADN file may be up to 256 lines long, but must contain no blank lines and no blank spaces. STEP 5 - Set up the upload file check ------------------------------------- Almost done! Now that you have created the ZDCS duplicate file database, all you need to do is get the bbs to check all uploaded zipfiles and GIFs from now on. This will be done by processing the uploaded zipfiles and GIFs with the upload file checker ZDCSFC as the files are received. ZDCSFC.EXE is the real-time upload file checker. If the uploaded file was a GIF, it first calculates the CRC32 for it. Then ZDCSFC compares the CRC32 of the uploaded zipfile or GIF against the ZDCS database and the bbs ads database (if you've created the bbs ads database, that is). The results of this comparison are displayed file by file for the caller and logged into the ZDCSFC log file. (Remember back on line 5 of the configuration file, when you specified the pathname and filename for this log?) ZDCSFC calculates the actual percentage of duplicate files in the upload. Since a GIF is a single file, it will either be 0% (not a dupe) or 100% (a dupe). For zipfiles, this actual percentage can vary anywhere between 0 and 100. ZDCSFC compares this actual percentage against the maximum percentage you set in line 4 of the configuration file. (Boy, that configuration file is a handly little thing.) If the actual percentage is lower than the maximum, the upload is accepted. If the actual percentage is equal to or higher than the maximum you specified, the upload is declined. The file is never lost or dumped! PCBoard moves these declined files into your private upload directory, where you can review them. If you want to bypass this filter, set the percentage to 100. This permits a duplicated GIF or a zipfile with nothing but duplicates to pass the filter and never be declined. At the other extreme, you can set the percentage to 0. This effectively requires that the uploaded GIFs and zipfiles have no duplicates at all, or they will be declined. ZDCSFC also updates the duplicate files database with the CRC32s of all uploads. You do not have to do anything special to include this information in the database for comparisons with future uploads. ZDCSFC does not modify the bbs ads database at all. It's still not smart enough to recognize a bbs ad until you've pointed it out first - but it does remember them next time it sees them. To process the new uploads with ZDCSFC, it must be called by the PCBTEST.BAT file of PCBoard or by your file integrity checker. We are actively pursuing discussions with makers of file integrity checkers to provide a seamless integration with ZDCS. EXZTEST already has such a feature in version 2.x, which calls ZDCSFC directly and feeds it the necessary information to process uploads. To understand how to call ZDCSFC with the older (1.x) and newer (2.x) versions of EXZTEST, take a look at the sample files PCBTEST.BAT and PCBTEST.ALT, respectively. They are included in this package. For more information on the integration available in EXZTEST 2.x, please see the documentation for EXZTEST. To use ZDCS with the file integrity checker of your choice (even if it's just PKZIP -T), there are a couple of things you need to include in the PCBTEST.BAT file. First, it's highly recommended that you include the following three lines to clean out old copies of these files left over from processing other uploads: @IF EXIST PCBFAIL.TXT DEL PCBFAIL.TXT @IF EXIST PCBPASS.TXT DEL PCBPASS.TXT @IF EXIST ZDCS-DEL.LST DEL ZDCS-DEL.LST Second, call your integrity file checker to process the upload. (Your syntax may vary.) Third, call ZDCSFC to check the upload. The appropriate command is: ZDCSFC %1 %2 %3 If you are running a version of PCBoard older than the May 22, 1991 version 14.5A, PCBoard will provide only two parameters. Since the third one is needed for pre-testing, that means you must disable the pre-testing feature. Even if you don't want to use the pre-testing right now, it is still strongly recommended that you leave the third parameter in place. It does no harm and could save you some grief if you change your mind in the future. Other than that, ZDCS should be OK on those older versions. At the end of processing by ZDCSFC, you either have a new file called ZDCS-DEL.LST or you don't. This file contains the list of individual files within an uploaded zipfile that are targeted for deletion. If you haven't enabled any deletion of either duplicate files or bbs ads, this file won't be created. Even if you have enabled one or both of these deletions, the file still won't be created unless the upload that was just processed had something to be deleted. Fourth, skip to the end if there are no files to be deleted: IF NOT EXIST ZDCS-DEL.LST GOTO END Fifth, perform the deletion of files specified by ZDCSFC: PKZIP -D %1 @ZDCS-DEL.LST Note that the actual deletion is done by PKZIP. This is also the only time an existing AV stamp on an upload is affected by ZDCS. Sixth, don't forget to include the end: :END This concludes our tour of PCBTEST.BAT. In addition to creating the required PCBPASS.TXT and PCBFAIL.TXT files, ZDCSFC also sets the DOS error level when it exits. These levels are explained in the reference manual ZDCS-REF.TXT. PRE-TESTING ----------- Callers have been enthusiastic about duplicate checking on a bbs file system - until they discover that the huge file they've just finished sending your bbs at 2400 baud is a set of complete duplicates. Many of them asked for a way to pre-test an upload before actually transmitting the entire file. ZDCS gives them this capability without requiring any ongoing handholding, explanations or help from the sysop. Pre-testing is most valuable when it's simple enough that callers actually use it. ZDCS pre-testing requires nothing that might tax a relatively novice uploader's skills. There are no special files to download, no complicated operations to get right, no arcane rituals to perform. All that your callers need is PKZIP and enough skill to upload a file. There are four simple things for you to do so that ZDCS pre-testing runs smoothly on your bbs. 1. Make sure that your board code is no earlier than PCBoard version 14.5A from May 22, 1991. This and all subsequent versions of PCBoard support ZDCS pre-testing. 2. Confirm that the PCBTEST.BAT file contains the following line: ZDCSFC %1 %2 %3 You already set up the PCBTEST.BAT file as part of the ZDCS installation. Just check it one more time, OK? If you've inadvertently left out the third parameter, the pre-test will work just fine for the first caller and will barf all over subsequent callers, who will come complaining to you that their ZDCSTEST.CHK is being called a duplicate! 3. Make sure that *.CHK files may be uploaded to your bbs. This is done via the UPSEC file, part of PCBoard. If you don't permit callers to upload files with the .CHK extension, you've effectively prevented them from using the pre-test feature. 4. When you're ready, post the sample bulletin included in this release of ZDCS to explain to callers what the pre-test feature is all about. The bulletin holds your callers' hands through the whole process, which should take some of the load off you. If your bbs permits the uploading of SFX or GIF files, you might want to add those initials where you see ZIP in the bulletin. A ZIP- only bbs can use the canned bulletin right from the package. If you track your callers' uploads and downloads, you might like to know that the ZDCSTEST.CHK will not count as an upload. The bulletin mentions this little fact just in case some of your callers are a bit *too* enthusiastic about collecting upload credits. THAT'S ALL, FOLKS! ------------------ That's all there is to it, and there ain't no more. ZDCS is fully installed on your system. If you want to fiddle with the options, go right ahead. Most of them can be turned on or off with no more than a line change in the configuration file. If you have more questions or are just curious about parts of ZDCS, take a look at the reference manual ZDCS-REF.TXT. It has more excruciating and technical details than this walk-through does. You will also find information about registration (ZDCS is reasonably priced shareware - $25.00), support, access to free utilities and accessories and a host of other goodies in the reference manual - even how to reach us with comments and suggestions.