Disk Cache Request Sorting The dangers of changing the order of disk writes Introduction At least three leading disk caches on the commercial software market attempt to enhance a PC's performance by changing the order that information is written to a disk. Instead of processing disk writes on a first-in-first-out (FIFO) basis, these caches use request sorting. Request sorting can reduce the elapsed time required to complete a series of disk writes, but it can also create periods of jeopardy that result in the corruption of a disk's file allocation information as well as a loss of data. PC-Kwik Corporation has developed a diagnostic program that monitors a specific directory for these periods of jeopardy. When the directory and the file-allocation information become at risk, the user is alerted to the problem. PC-Kwik's disk cache, Super PC-Kwik, provides request sorting as an option but does not use this technique as a default. Instead, it processes write requests on a FIFO basis, minimizing risk while still providing a dramatic performance gain. Other caches expose the user's disk to the risk of corruption by defaulting to request sorting. Often when the request sorting feature is turned off, the performance gain provided by these other caches decreases dramatically. History Sorting the order of requests is a common technique for enhancing the throughput of a disk subsystem. Time can be saved if the data is written to the desired sectors with less overall movement of the disk drive's heads. Request sorting algorithms were originally designed for mainframes under the assumption that all requested input-output (I/O) would be completed. This conclusion was based on three assumptions: 1. I/O is under the control of a protected operating system (i.e., not subject to failure due to bugs in application code). 2. The hardware operates with an uninterruptable power source. 3. The system is being operated by trained technicians. Request Sorting in the PC Environment While these assumptions are typically valid for mainframe systems, they are not usually valid for DOS-based PCs. Instead, we are all too familiar with frequent system crashes due to application bugs. Most of us have accidentally rebooted or powered down our systems by hitting the reset button or the power switch at the wrong time. And most of us have also experienced system failures due to power fluctuations. When request sorting is not used, the impact of these problems is limited to the potential loss of the new data being written, instead of threatening the integrity of entire disk structures and databases. Impact of Request Sorting on Database Systems The designers of sophisticated PC database systems understand that their database update algorithms will sometimes be subject to system crashes. To minimize the damage that these crashes produce, they carefully order the sequence of disk writes used to update index and data records (or their equivalent). Because of this careful ordering, the database software or utilities provided with it can usually recover from system failures by detecting how far database transactions proceeded before the failure, and limit the damage to the records not yet written. When the order of modifications to the database is changed by a disk cache, the recovery procedure cannot accurately determine which transactions and substeps were completed, compromising the integrity of the entire database. Due to this unrecoverable damage that request sorting can cause here, most database users do not feel the performance increment is worth the risk. Risk vs. Reward A sophisticated user will occasionally choose to accept the risks of request sorting to obtain the advantage of additional performance based on their knowledge that the application environment is largely bug free, an uninterruptable power supply is being provided, and the system will not be turned off or rebooted during the database transactions. A careful user will also take the precaution of making regular backups of systems if they employ request sorting. Unfortunately, though, there are many unsophisticated users using request sorting because most disk caches use request sorting algorithms by default. The user is given little or no information about this risky technology and what the trade-offs are between performance and data integrity. The Impact of Request Sorting on Disk Structures The DOS File Allocation Table (FAT) and directory structures can be thought of as a special type of database. When DOS creates a file, it carefully postpones pointing the new directory entry at the newly allocated disk clusters until the FAT contains the required information about these clusters. If a request sorting disk cache is used, there may be periods of time when directory entries point to disk clusters that are marked as unallocated in the FAT. If the system is rebooted due to an application crash or a power failure during one of these jeopardy periods, the directory entry will continue to point to unallocated space on the disk. If the system continues to be used, the same space will be allocated to another file, and cross-linked files will result. Cross-linked files are almost impossible to correct, because two files end up sharing the same space on the disk. One or both of the files will be corrupted, and there is no reliable technique for recovering the lost data. A hard drive with many cross-linked files is typically referred to as "crashed." Unfortunately, some disk caches reorder the sequence of disk writes so radically that directory/FAT structure are exposed to this type of jeopardy for long periods at a time. Most users have no knowledge that they are incurring this type of risk, because the request sorting is the default mode. Those who are aware of the situation are faced with either accepting the risk or disabling the caching of writes all together. Jeopardy Detection To determine how often jeopardy periods are generated by various disk cache programs, PC-Kwik Corporation developed a Jeopardy Detection program. This program continuously monitors the writing of sectors to a particular disk. It compares the directory entries of a specific directory to the space shown as allocated in the FAT. When it detects a discrepancy it alerts the user by changing the screen's border color to red. The resulting flashing red border gives the user a sense of how often jeopardy periods occur for a particular directory. Disk Caches that Impose Jeopardy Periods Most commercially available disk products default to using some form of request sorting. According to our analysis, some of those which create periods of jeopardy include: -- SMARTDrive which is included with Microsoft Windows 3.1 and DOS 6.0 -- NCache2 which is included with Norton Utilities 7.0 -- PC-Cache which is included with PC-Tools 8.0 We expect that many other disk cache programs generate the same sort of risk. Summary The Super PC-Kwik disk cache defaults to strict FIFO processing of disk writes. It provides a dramatic improvement in throughput without incurring the risks of request sorting. Sophisticated users have the option of reading documentation about advanced options like request sorting, time-delayed writes, and quick return of the DOS prompt. They can then make an informed decision about which of these options are appropriate for their situation. FIGURE 1 File Allocation Table Directory +------------------------+ +---------------------+ | | | | | | +---------------------+ | | | 1STFILE.DAT + | | [Allocated] | +-----------------|---+ | | | | | | +-------------|----------+ +-----------------|---+ | | | +------------------------------+ | | | V | +------------------------------+ +--->| disk space | +------------------------------+ Figure 1. File 1STFILE.DAT occupies space on the disk. Space is allocated in the File Allocation Table and the file name has a directory which points to this space. FIGURE 2 File Allocation Table Directory +------------------------+ +---------------------+ | | | | | | +---------------------+ | | | 1STFILE.DAT + | | [Unallocated] | +-----------------|---+ | | | | | +------------------------+ +-----------------|---+ | +------------------------------+ | V +------------------------------+ | disk space | +------------------------------+ Figure 2. During a jeopardy period, file 1STFILE.DAT has a directory entry, but is not allocated in the FILE Allocation Table. If a crash occurs during this jeopardy period, the disk space wil remain unallocated. FIGURE 3 File Allocation Table Directory +------------------------+ +---------------------+ | | | | | | +---------------------+ | | | 1STFILE.DAT + | | [Allocated] | +-----------------|---+ | | | | | | +-------------|----------+ +-----------------|---+ | +-----------------------------+ | | | | +---------------------+ | | | | | | +---------------------+ | | | 2NDFILE.DAT + | | | +-----------------|---+ | | | | | | | +-----------------|---+ | | + | | +---------------------------+ | | | | V V | +------------------------------+ +---->| disk space | +------------------------------+ Figure 3. Crosslinked files. After system recovery, new file 2NDFILE.DAT is created and unllocated space is assigned to it. 1STFILE.DAT and 2NDFILE.DAT now occupy the same physical location and both have directory entries. PC-Kwik is a registered trademark of PC-Kwik Corporation.