VIRUSCHK
copyright 1988 C. Deneke

I am a microbiologist and I hate viruses! So I wrote this program to detect their activities. If we do not stop computer viruses they will destroy the mutual help networks that have been a fundamental part of micro- and personal computing.

VIRUSCHK.EXE is shareware. Please treat this as shareware and support it if you find it useful. The license and warranty are at the end.

Using VIRUSCHK.EXE.

VIRUSCHK.EXE requires MS-DOS 2 or greater and a PC compatible. I wrote VIRUSCHK.EXE to be easy to run so that you would run it more often and detect viral activities sooner. VIRUSCHK is designed to be run from a batch file, so that it will check all the relevant directories without requiring your intervention.

This program works by reading each of your files in a particular directory and calculating a series of values for each file. These values are stored in a data file (VIRUSCHK.DAT) and are used to determine if there have been any changes in your files.

A batch file is a file that consists of simple text commands and whose name ends with the extension or type .BAT. Batch files essentially replace the commands that you enter ("key in") from the keyboard. They make it easy to re-enter a complex sequence of commands. For example, AUTOEXEC.BAT is a batch file that runs every time you start (or "boot") your computer from the disk that contains the AUTOEXEC.BAT file.

VIRUSCHK will accept 5 command line switches. These are indicated by a "-" (dash) and tell the program to change some of its internal defaults. The switches that affect VIRUSCHK are:

-ipathname
    This controls where VIRUSCHK will find its input, that is, the files VIRUSCHK is to characterize.

-opathname
    This controls where VIRUSCHK is to find and to write the data files (VIRUSCHK.DAT and VIRUSCHK.OLD). These files contain the information calculated about the input files which describes their contents.

Both of the pathnames can include drive and directory designations.
They cannot include particular file specifications. All of the files in that subdirectory will be checked. If you are checking a floppy disk without subdirectories, only the drive (e.g. "A:" or "B:") is needed. The output files need to be in separate sub-directories because all of the output files are called VIRUSCHK.DAT and the successive files would overwrite each other if they were in the same directory.

-d
    This will display the values for the internal characterizations of VIRUSCHK.EXE. If you register your copy I will send you these values so that you can ensure the validity of your copy.

-b##
    Where ## is a number between 5 and 12. This will change the number of active sub-groups used to characterize the files.

-m###
    Where ### is a number between 200 and 900. This will change the largest number used in a single sub-group.

Changing either or both of these two numbers will change all the calculated file characterizations. This variability will make a virus work harder to escape detection even if the virus knows about this program. Note also that the -b and -m switches change the way the data on each file is calculated and consequently the values that are stored. So if you change these switch values you should delete the files VIRUSCHK.DAT and VIRUSCHK.OLD, as the file contents (but not the time and date stamps) would appear completely different to this program.

I think the best and easiest way to use this program is to have a floppy disk with the same sub-directory structure as the hard disk (plus a \root sub-directory) and keep VIRUSCHK.EXE in the true root directory along with the batch file CHECK.BAT. For example, if I were checking a hard disk called C: that had a root directory and 3 sub-directories labeled "DOS", "PROGRAMS", and "DATA", I would create 4 sub-directories on a floppy disk which were named "ROOT", "DOS", "PROGRAMS", and "DATA".
Assuming the floppy disk was A:, the CHECK.BAT file would contain the following lines:

    VIRUSCHK -ic: -oa:\root -b11 -m817
    VIRUSCHK -ic:\dos -oa:\dos -b11 -m817
    VIRUSCHK -ic:\programs -oa:\programs -b11 -m817
    VIRUSCHK -ic:\data -oa:\data -b11 -m817

Then when you type CHECK at the A:> prompt, each successive line will give VIRUSCHK.EXE the information on where to find the files to work on, how to process the information, and where to store the data. If the input and output (i.e. the -i and -o switches) are not given, the current directory will be used. Please note also that only the pathnames and not the files to check are specified. VIRUSCHK.EXE always checks all the files in the specified directory.

If you are checking a series of floppy disks in a system with two floppy drives, I would recommend that you use a floppy with VIRUSCHK.EXE in the root directory and a set of subdirectories with names that match the volume labels on the floppies that you want to check. Then include in each subdirectory a file CHECK.BAT that has the single line:

    A:\VIRUSCHK -ib: -oa:\pathname -b11 -m893

where a:\pathname points to the current sub-directory. This will save you having to keep the VIRUSCHK.EXE program in each sub-directory but will keep the data files in order. Then put the floppy to check in B: and the disk with the VIRUSCHK files and sub-directories in A:, enter the correct sub-directory, and type CHECK.

Also, if the -b and -m switches are changed between runs, the data calculated will be different and the program will tell you that the files have changed. If you are going to change these switches you might just as well erase the VIRUSCHK.DAT and VIRUSCHK.OLD files to save yourself time.

Because I wanted to make VIRUSCHK.EXE as easy to use as possible, there is no need for direct user input until VIRUSCHK.EXE needs to know how you want something treated. For example, the first set of title screens will pause briefly, then go on.
So you do not have to be present (until the comparisons) for the program to run.

The first time VIRUSCHK is run, once the program has read all your files and performed its calculations, it will tell you that it has not been able to find its data file VIRUSCHK.DAT. The program will then ask if you want to save the data just calculated in a new file called VIRUSCHK.DAT. If you enter 'Y' (or 'y'), the data will be stored for use next time. Because no comparisons can be done, the program will just show you the file names as the data about them is being written.

If the VIRUSCHK.DAT file is found, each file name will be displayed as this data file is read. Then the individual files will be read and the calculations performed. Again each file name will be displayed so that you can tell how the program is progressing. Finally, the calculated and stored data will be compared. If a file either exists only in VIRUSCHK.DAT or is not included there, you will be prompted to either include or exclude the data. If the file contents or the time and date stamps have changed, you will be told of the changes and asked if you want to update the VIRUSCHK.DAT file.

You should be careful of files whose contents have changed but not their date/time stamp, as this could be the signature of viral activity. As far as I know there is no way to write to a file through the MS-DOS file structure without updating the time and date codes. If you changed the files yourself or changed the -b or -m switches, the file contents will be reported to have changed but there may not be a viral infection of your files.

In all cases, you will be told what the results are and asked what you want to do. VIRUSCHK.EXE will not alter any of your files; it only changes or erases two files: VIRUSCHK.DAT and VIRUSCHK.OLD. So if you want to erase or delete a virus contaminated file you will have to do it yourself.

When the data file VIRUSCHK.DAT is read, it is renamed to VIRUSCHK.OLD rather than being deleted.
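The comparison step described above can be sketched roughly as follows. This is an illustration in Python, not the VIRUSCHK source: the record layout (one checksum and one time stamp per file name) and the names compare_records, stored and current are assumptions made for the example, and the real VIRUSCHK.DAT format is not shown here.

```python
def compare_records(stored, current):
    """Classify each file by comparing stored and freshly calculated records.

    Both arguments map a file name to a (checksum, timestamp) pair.
    This record layout is hypothetical; it is not the VIRUSCHK.DAT format.
    """
    report = {}
    for name in sorted(set(stored) | set(current)):
        if name not in current:
            # Listed in the data file but no longer present on disk.
            report[name] = "only in data file"
        elif name not in stored:
            # Present on disk but never recorded before.
            report[name] = "not in data file"
        else:
            old_sum, old_stamp = stored[name]
            new_sum, new_stamp = current[name]
            if old_sum == new_sum and old_stamp == new_stamp:
                report[name] = "unchanged"
            elif old_sum == new_sum:
                report[name] = "stamp changed"
            elif old_stamp == new_stamp:
                # The case the text warns about: new contents, old stamp.
                report[name] = "contents changed, stamp unchanged (suspicious)"
            else:
                report[name] = "contents and stamp changed"
    return report
```

In a sketch like this, the "suspicious" category is the one that deserves attention first, since an ordinary edit through DOS would normally update the time and date stamp as well.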
If you need to recover the old data set, first either delete VIRUSCHK.DAT or rename it (to VIRUSCHK.1, for example), then rename VIRUSCHK.OLD to VIRUSCHK.DAT and the program will run normally using the old data.

History.

My interaction with computer viruses began at the end of the last semester. At that time I thought that computer viruses were mythical beasts which, if they existed at all, were more likely the result of program bugs than of malicious intent. However, a number of our students had a semester's worth of work stored on the hard disks of the departmental computers. As students will, when times got tense somebody brought in a game and loaded it on several of the hard disks. It was apparently quite popular, and just before the end of the semester it erased the hard disks with the students' semester's work. As with all hard disks, insufficient backups had been made. The students probably felt that floppy disks were too expensive, too fragile and too inconvenient. The unfortunate conclusion to the story is that they had to take incompletes for the semester. There was clearly enough guilt to go around: the students' for bringing in a game and playing it on computers which were intended for serious projects, and ours for not supervising them more closely and catching the virus before it caused more harm.

How to detect viruses.

So I began to consider ways to defeat viruses. I did not think that a perfect write protecting program was possible which would prevent virus infection and replication but still allow normal disk usage. I was also afraid that a simple write protect could have been defeated by the game carrying a virus. A simple message like "Congratulations you have the current high score! Save to disk?" would allow the operator to override and thus defeat any write protect method I could devise. After all, everybody wants their best high score displayed.

Ways for viruses to hide.
One way to detect a virus (or a Trojan Horse) is to examine a program's code and look for instructions that could damage your computer system or infect other programs. To do this I have to explain somewhat how DOS works. If you want to skip the detailed explanation in the next section, just consider that a computer program consists of a series of numbers that represent computer instructions. As computers can change numbers easily, programs can change themselves. The particular damaging instructions may not even exist until the virus is about to do some damage. If you are not comfortable with programming, skip down to the section on "Revealing viral activity".

Programming details of hiding a virus.

A program tells MS-DOS what to do by a series of software interrupts. Typically, values are loaded into the AX register and interrupt 21 is called. In assembly language this would be abbreviated as INT 21 (the interrupt numbers here are hexadecimal). The potentially damaging interrupts would include: absolute disk reads (INT 25), absolute disk writes (INT 26), and terminate and stay resident (TSR, INT 27).

The absolute disk reads and writes refer to an actual sector of the disk, for example, side 1, track 24, sector 5, rather than reading through DOS. INT 25 is dangerous because a virus can hide parts of itself in the tracks of a hard disk marked as unusable, then read them in with an INT 25 instruction. The write to absolute sector can overwrite the boot tracks, the file allocation tables, the hard disk partition information, or any program.

INT 27 is dangerous because normally, when a DOS program finishes, that program's memory storage area is made available and can be overwritten by the next program you use. A TSR program, however, can wait in the background, perhaps filtering keystrokes, or waiting for another copy of COMMAND.COM to infect, or intercepting calls to virus detecting programs and responding by printing "no virus found, everything is OK" on the screen as it erases your hard disk.
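A naive scan for these instructions can be sketched as below. The sketch relies on one fact about 8086-family processors: an INT instruction is encoded as the byte CD (hexadecimal) followed by the interrupt number. Everything else is an assumption made for illustration: the Python language, the function name find_suspect_interrupts, and the SUSPECT table. A scan like this also reports false alarms, because ordinary data bytes can happen to look like CD 26.

```python
INT_OPCODE = 0xCD  # first byte of the two-byte 8086 "INT n" instruction

# Interrupt numbers singled out in the text (hexadecimal).
SUSPECT = {
    0x25: "absolute disk read",
    0x26: "absolute disk write",
    0x27: "terminate and stay resident",
}


def find_suspect_interrupts(code):
    """Return (offset, interrupt number) for each CD 25 / CD 26 / CD 27 pair.

    'code' is the raw contents of a program file as bytes. The scan is
    purely textual: it cannot tell instructions from data, and it sees
    only the bytes as they are stored on disk.
    """
    hits = []
    for offset in range(len(code) - 1):
        if code[offset] == INT_OPCODE and code[offset + 1] in SUSPECT:
            hits.append((offset, code[offset + 1]))
    return hits
```

Given the bytes of a program file, the function returns an empty list when none of the three interrupts appears literally in the file, which says nothing about what the program may do once it is running.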
Again, because a program can be self modifying, even scanning the code for INT 25, INT 26 and INT 27 would not ensure that they could not exist under some conditions. In pseudo-Assembly code this would be:

            code
            increase label by 5
            code
            perhaps lots of code
    label:  INT 21      ; the normal DOS services interrupt
            code
            decrease label by 5
            more code

The normal DOS services interrupt (INT 21) is present in the code until it is run. When running, however, some other part of the program (which can be located anywhere in the code) increases the 21 to 26, the virus does absolute disk writes, then later in the program the INT 26 is decreased to INT 21. Note that the code containing the original INT 21 does not even have to be in the virus code.

Viruses can also read and write directly to the hardware. If you used DEBUG to enter values to low level format your hard disk, you were writing directly to the hardware. Direct hardware writes can also be hidden by self-modifying code segments and even further confused by the segmented nature of the 8088 series processors.

I could not see any way to be sure that the code was not self modifying. The proper approach would have been to act like a debugger to run your programs and check for a series of problem conditions. The debugger would have had to follow each code loop, and I could see that a complex code section could be designed to have a (nearly) infinite number of internal loops which could have changing values, perhaps dependent on the timer or even a random process. This could all be present in a legitimate game or copy protected software. Even a complete run through could not guarantee that under some conditions code containing INT 26 would not exist, and this would probably be the easiest type of virus activity to detect.

Revealing viral activity.

Finally, I decided that I did not care if any given program was a virus or not; what I wanted to know was if it changed my files or other programs.
If I knew that changes were going on that were not under my direction then I could deal with them from that point. The principal danger of viruses seems to me to be when you do not know what is going on. The initial problems could be blamed on fluky hardware, buggy software, or operator error rather than a virus. And the longer that you wait the worse viral infections get. If you decided to erase everything and the earlier backups were also infected, the re-introduction of the virus would just result in chaos.

So the main goal of this program is to detect changes in files so that you can determine if you do have a virus. A simple way to detect changes is to look at the time and date stamps. Unfortunately, viruses are usually not considerate enough to update them.

MS-DOS (PC-DOS, DOS, IBM-DOS) is a fairly basic operating system. It essentially loads a program and gets out of the way. The important point here is that a program cannot load itself; it has to be called from DOS or from the command line. So how will the virus be called or loaded? The simplest way is to use a vector program like the game in the example. However, there has to be an infection of other programs for the virus to be dangerous. If you have a program that immediately reformats your hard disk, this is called either a Trojan Horse or a programming bug. In either case you know immediately that a disaster has occurred and can try to overcome it.

I feel that the dangerous nature of viruses is that they can work their damage before you are aware of their activities. In contrast, I understand that there was a rather quick reaction when some of the Microsoft C compilers reacted rather unfavorably to the Western Digital hard disk controllers. In this case there was a major loss of files and Microsoft quickly corrected the problem.

So the question has become: how do I tell if my files are changing? The simple way is to keep a duplicate set of files and do a direct byte by byte comparison.
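A direct byte by byte comparison might look like the following sketch. It is written in Python purely for illustration and is not part of VIRUSCHK; the names first_difference and files_differ_at, and the 4096-byte chunk size, are assumptions.

```python
import io


def first_difference(stream_a, stream_b, chunk_size=4096):
    """Return the offset of the first differing byte, or None if identical.

    Both streams are read in equal-sized chunks so that large files do
    not have to fit in memory. If one stream is a prefix of the other,
    the offset returned is the length of the shorter one.
    """
    offset = 0
    while True:
        block_a = stream_a.read(chunk_size)
        block_b = stream_b.read(chunk_size)
        if block_a != block_b:
            for i, (x, y) in enumerate(zip(block_a, block_b)):
                if x != y:
                    return offset + i
            return offset + min(len(block_a), len(block_b))
        if not block_a:
            return None  # both streams ended without a mismatch
        offset += len(block_a)


def files_differ_at(path_a, path_b):
    """Compare a working file against its duplicate copy on disk."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return first_difference(a, b)


# In-memory stand-ins for a program and its duplicate.
same = first_difference(io.BytesIO(b"COMMAND"), io.BytesIO(b"COMMAND"))
changed = first_difference(io.BytesIO(b"COMMAND"), io.BytesIO(b"COMMANX"))
```

Here same is None and changed is 6, the position of the altered byte.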
However, this requires a lot of storage space and might allow virus infection of the duplicate files as well. A more elaborate way would be to do a computation such as summing the values of all the bytes in all the files. Still more elaborate ways are used in telecommunications to detect transmission errors. This would decrease the storage requirements, but this method is vulnerable to the virus deleting some of a program's message text to offset the virus code added. That is, if the virus added bytes of code summing to 491, then it would just delete bytes of messages until it had deleted bytes totaling the same value (491).

Program design.

I finally settled on a variant of this method. The value of each byte is read and then a set of computations is done. First all the bytes are summed, then the bytes are divided into a set of sub-groups and the sub-groups are individually summed. To make the scheme harder to defeat, you can change the number of sub-groups (bins) used and the modulo used in the calculation with the -b and -m switches. This is just to make it harder on the virus, so that even if the VIRUSCHK.DAT file is available to be read, the virus would first have to determine which values to change in order to pass undetected.

Distribution and source code.

I now have a problem with distribution. Shareware is fine, but how could I avoid the FLUSHOT problem? I understand the problem is that FLUSHOT1 through FLUSHOT3 are good but FLUSHOT4 is a buggy program, a Trojan Horse, or a virus. I have decided to send the disk of VIRUSCHK (and a manual) at lowest cost ($10.00), which I hope will cover mailing, but this still does not give you any way to be sure that the program is good, does what it is designed to do, and is not a virus itself. If you know me that's fine, but if I am only (at best) a Post Office box to you there is no guarantee.
Indeed, there is no guarantee in shareware that the post office box was not changed somewhere along the distribution process. If you can come up with a better method I would very much like to hear it, but the only solution I could think of was to be willing to sell the source code itself. This feels a little like exposing a very private thing, but I just cannot guarantee that you will be able to understand what VIRUSCHK.EXE does and customize it to your needs any other way.

If you have any questions please get the source code and read it through. I tried to write this program so it would be fairly simple to understand rather than cryptic but efficient C. If it meets your needs, then buy the Borland Turbo C compiler (I used version 1.5), and for a total investment of less than $150 (which includes the cost of the compiler) you can test for viruses changing your files.

The C code was kept simple so that minimal changes should be necessary when porting to other C compilers. Also, if you need VIRUSCHK.EXE to work in a non-MS-DOS environment, you will only need to modify the machine specific code sections. If you need other modifications or have suggestions please write to the address below.

Other Problems.

This program does not address two virus related problems. The first is the presence of virus code in the battery backed configuration memory of ATs, and the second is the presence of virus code in the tracks of the hard disk marked as bad. I am working on both these problems. The easiest way to deal with the configuration problem is to remove the battery if you suspect a virus, then reconfigure. A modification of this detection program should work as well. The hard disk problem might best be solved by writing over all the sectors marked as bad. Because this requires absolute disk writes and is very dangerous, I am not sure how to distribute such a program if I finish it.
If you do have any virus you should probably do the lowest level format available as a precaution.

You will have to fill out the following form before I send you the source code because I do not want anyone else selling the code or developing a commercial product based on my code.

If you find any bugs in this program please report them in enough detail for me to reproduce them and track them down.

Thank you,
C. Deneke

WARRANTY

There is no warranty of any kind, either expressed or implied, including without limitation any warranties of merchantability and/or fitness for a particular purpose. All persons associated with this program and its distribution shall not be liable for any damage, whether direct, indirect, special, or consequential, arising from a failure of this program to operate in the manner desired by the user. There shall be no liability for any damage to data or property which may be caused directly or indirectly by use of this program. In no event will the owners and/or distributors of this program be liable to you for any damages, including lost profits, lost savings, or other incidental or consequential damages arising out of your use or inability to use the program, or for any claim by any other party.

Registration form - Required for source code purchase only (but I still need your name and address if I send you anything)

send to: C. Deneke, P.O.
Box 6, Old Mystic, CT, 06372-0006

check one
    disk and registration for $10.00                 [ ]
    source code, disk and registration for $75.00    [ ]

Name __________________________________________________________________
Company (if any) ______________________________________________________
Address _______________________________________________________________
Address _______________________________________________________________
Address _______________________________________________________________
City _______________________________________________________________
State _____________________ Zip code ___________________________

I hereby agree that I am responsible for the source code sold to me. I understand that this code is protected by copyright and is the property of C. Deneke and that I have to take reasonable care to ensure that the source code is treated with due care as valuable property. I will not sell it to any other party nor provide it in any other way. If I break this agreement in any way I will provide suitable compensation covering both all legal fees and compensatory and punitive damages as determined in a court of law.

Please sign here for source code ______________________________________

If you just want either to register your copy or get a new copy, the above does not apply to you, but I do need your address. The copyright on this program still applies and this program cannot be sold for commercial gain.

License

This program is shareware and the copyright is maintained whether the program was obtained as source code, as an executable program, or in any other way. It is not free software and it is not public domain software. You are free to test this program on a trial basis. Registered users can treat the program like a book; that is, only one copy can be running at one time. No user may modify this program in any way that would violate its spirit.
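The sub-group scheme described under "Program design" can be illustrated with a short sketch. It is written in Python for brevity and is not the VIRUSCHK source. The rule used here for assigning bytes to sub-groups (byte position modulo the number of bins) and the way the modulo is applied to each sub-group sum are assumptions, as are the names characterize, bins and modulo. The defaults simply reuse the -b11 -m817 values from the CHECK.BAT example.

```python
def characterize(data, bins=11, modulo=817):
    """Return (total, sub_sums) for the bytes of one file.

    total    : plain sum of all byte values
    sub_sums : one running sum per bin, each kept below 'modulo'

    Assumption for this sketch: the byte at position i goes to bin
    i % bins. The real program may group bytes differently.
    """
    total = sum(data)
    sub_sums = [0] * bins
    for position, value in enumerate(data):
        k = position % bins
        sub_sums[k] = (sub_sums[k] + value) % modulo
    return total, sub_sums


# A compensating change: one byte raised by 1, another lowered by 1.
original = bytes([10, 20, 30, 40, 50])
tampered = bytes([11, 19, 30, 40, 50])

total_o, bins_o = characterize(original)
total_t, bins_t = characterize(tampered)
```

Both totals come out to 150, so a plain sum would not notice the change, while the first two sub-group sums differ. Under these assumptions a change can still balance out if the altered bytes fall in the same bin, which is one reason for being able to vary the number of bins and the modulo.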