EQANDA              COPYRIGHT 1995-1996 horio shoichi              EQANDA

NAME
    eqanda.txt - Expected Questions And Answers

CONTENTS
    This section answers the questions most likely to arise when using
    concache.exe and its family of programs, the DOS disk cache
    programs. The contents of this section are:

        Why And How Do Cache Programs Speed Up Disk Io ?
        What Are The Elements That Limit Concurrency ?
        How Much Memory Should Be Prepared For Cache ?
        How Can Concache.exe Be Tuned, In Terms Of Conventional
        Memory ?
        How Can Concache.exe Be Tuned, In Terms Of Performance ?
        Is There Anything To Note With Relation To Serial
        Communications Software ?
        Troubleshooting

QUESTION
    Why And How Do Cache Programs Speed Up Disk Io ?

ANSWER
    Strictly speaking, disk cache programs do not speed up disk io;
    they reduce the number of disk io operations. To the user program
    they appear to complete disk io as quickly as possible. They
    buffer disk data in a large memory area called the disk cache
    buffer (hereafter simply termed the cache).

    For read requests, if the requested data already reside in the
    cache, the data are supplied from the cache. In addition, the data
    likely to be read next by user programs are read in advance and
    stored in the cache. This method of speedup is called "read ahead"
    or "preread".

    For write requests, the data to be written are copied into the
    cache, and user programs "think" the data have really been written
    to disk. The data are actually written at the cache program's
    convenience. This method of speeding up writes is called "delay
    write", "write behind", "write after", or "postwrite".

    The first generation of PC cache programs was generally reluctant
    to use postwrite; it was thought of as too special a luxury. Data
    to be written were written to disk as soon as requested. Handling
    writes this way is called "write-through".

    When cache programs that use postwrite arrived on the market, it
    was found that they more than double the speed of writes.
    This is because the disk allocation table, known as the FAT, is
    located at the top of the disk while the data space occupies the
    opposite end. Every write request first writes to the FAT to mark
    space as used, then moves the head to the allocated area and
    writes the data sectors. Postwrite in effect eliminates the
    repeated writes to the FAT by supplying DOS with the not-yet-
    written FAT image. So not only is the actual number of write
    operations reduced, but most head movements are eliminated, since
    the head no longer needs to go back and forth to the FAT area.

    When working with floppies, you may have experienced severe
    performance degradation when the buffers= statement in config.sys
    is inadequately written. You may also have observed writes slowing
    down as your program proceeds. What cache programs of this
    generation do is extend the config.sys buffers= statement into a
    large cache buffer.

    Next came the so-called "advanced" cache programs, which attempt
    to write data back concurrently with user programs. These cache
    programs do not wait for keyboard idle time, for example, to write
    back cached data. This means the common conception of traditional
    DOS programs - that because disk writes are slow, data must be
    held in the application's own buffers until there is an absolute
    need to write them back - is wrong. Writing data as they are
    produced is in fact faster and, perhaps less importantly,
    eliminates the need for huge buffers in each application program.
    In addition, because data are written as they are produced, there
    is less chance of accidental data loss. Programs become faster,
    safer, and leaner. It is fair to say that real disk speedup begins
    with this generation.

    Concache.exe belongs to this generation, and adds another level of
    generality. It allows concurrency wherever there is no reason to
    refrain from it. As a result, one floppy, one BIOS disk, and as
    many SCSI disks as can be configured into DOS can all be driven
    concurrently with DOS/user programs.
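    The FAT coalescing described above can be illustrated with a small
    simulation. This is a sketch under simplified assumptions (one FAT
    sector, one data sector per file appended); it is not
    concache.exe's actual algorithm.

```python
# Count physical disk writes for write-through vs. postwrite when
# appending n files: each append logically writes the FAT sector plus
# one data sector. Simplified sketch, not concache.exe's algorithm.

def write_through(n_files):
    # Every logical write goes straight to disk: FAT + data each time.
    return n_files * 2

def postwrite(n_files):
    # Dirty sectors sit in the cache and are flushed once at the end.
    dirty = set()
    for i in range(n_files):
        dirty.add("FAT")           # repeated FAT updates coalesce
        dirty.add(f"data{i}")      # each data sector written once
    return len(dirty)

print(write_through(10))   # 20 physical writes
print(postwrite(10))       # 11 physical writes: 10 data + the FAT once
```

    Beyond the reduced write count, the eliminated writes are exactly
    the ones that force the head back to the FAT area, so the seek
    savings are larger than the count alone suggests.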
QUESTION
    What Are The Elements That Limit Concurrency ?

ANSWER
    From the hardware point of view, floppies cannot perform io
    concurrently with each other, due to the floppy controller design.
    Neither can IDE disks. SCSI disks can perform io in parallel, as
    seen on many multiprogramming operating systems. At this level,
    one floppy, one IDE disk, and the SCSI disks can operate
    concurrently.

    The next level to consider is the BIOS support for io operations.
    As far as the published BIOS listing is concerned, there is no
    reason a floppy and an IDE disk cannot operate concurrently. SCSI
    drivers are usually written to do io asynchronously.

    Next comes the BIOS's capability to distinguish disk events. A
    standard BIOS handles only two "type"s of disks, which is
    sufficient for the floppy and IDE disk environments found in most
    PC configurations. Fortunately, the ASPI (advanced SCSI
    programming interface) specification, now broadly employed,
    supports a mechanism effectively similar to BIOS disk event
    notification, called command posting. (See the appropriate manual
    about this.) This allows individual disks' events to be handled.
    At this level the concurrency situation is unchanged.

    The next limiting factor is device driver non-reentrancy. Even if
    a device driver manages several disks, it expects its requests to
    arrive serially, not while previous requests are still in
    progress. In fact, most known device drivers lose the reentrancy
    necessary for concurrency within the very first two steps of
    driver code execution. Also, io.sys handles int13, through which
    almost any disk device call passes, in a non-reentrant way. So you
    might think that if a third party device driver is used - for
    example, io.sys for the floppies and the third party driver for
    the other disk devices - then at least the combination of one
    floppy and one hard disk should work concurrently. But no. If both
    share int13, they do not work concurrently.
    Next comes DOS drive letter availability. If, for example, a SCSI
    disk is split into two partitions - for which there are many good
    reasons - the user loses one drive letter for one disk. The two
    partitions cannot share io operation time.

    Those constitute the inherent limitations on concurrency. In
    practice, there are also resource limitations for programs under
    DOS. For example, an ASPI driver may limit the number of packets
    it can accept at once. Likewise, ccdisk.exe can limit the
    concurrency of SCSI disks from its command line. Finally,
    concache.exe can limit concurrency in two ways:

    1)  The concurrency= option limits the number of concurrent
        devices.

    2)  The io_buffers= option may specify too few io buffers to let
        devices work concurrently.

QUESTION
    How Much Memory Should Be Prepared For Cache ?

ANSWER
    There certainly are optimal points of cache size. Unfortunately,
    those points depend heavily on the applications and the job mix.
    There is no clear way to estimate the size and performance of a
    cache. Fortunately, concache.exe allows the cache size to be
    changed on the fly, so you can observe the performance of various
    cache sizes. If adding memory does not improve performance, then
    probably your mix needs still more memory, or you may decide to
    decrease the cache size without degrading performance.

    A "pathetic" looking example is presented below. This kind of
    anomaly is not uncommon in practice. Consider the following
    hypothetical case. I edit, compile, link, and debug programs,
    cyclically repeating these steps. For simplicity, assume each step
    requires exactly one megabyte, and that each step needs a set of
    files completely unrelated to the other steps (unrealistic ? but
    think of it this simply for now). Now let's have a 3 megabyte
    cache. How will these 3 mb be used ?
    Each of the first three steps loads, respectively, the editor and
    source files into the first megabyte, the compiler, header,
    source, and object files into the next megabyte, and finally the
    linker, library, object, and exe files into the last megabyte. The
    fourth step finds no free megabyte, so it must select one of the
    three. Now the familiar algorithm takes its turn: since the
    content of the first megabyte is the least recently used, it is
    considered unlikely to be used again soon. So the algorithm loads
    the exe file, debugger, and test data into - you see - the first
    megabyte.

    I go back to the editor. It is not in the first megabyte, as you
    have just witnessed. The editor and friends must be loaded into
    the second megabyte with similar fuss, which purges the compiler
    and so on from the second megabyte. And so on.

    In this example, cache performance is no better than if I had used
    only a one megabyte cache. If I added another megabyte, the
    performance would jump, but adding more does no good. If your job
    mix consists of five mutually unrelated steps, each requiring one
    megabyte, and the cache size is four megabytes, then the four
    megabyte space is no better than one megabyte.

    This extreme case comes out of the commonly used LRU algorithm and
    the extremely simplistic assumptions about the usage pattern. The
    least recently used space is supposed to be unlikely to be used
    again soon, but in this case it is exactly what is needed next.
    So, to pick a victim out of the three already used megabytes, let
    us select one at random. The probability that the megabyte needed
    next survives is then 0.67, and cache performance improves by that
    much, doesn't it ?

    A similar situation arises when copying a large file. Records that
    will never be read or written again flow continually into the
    cache data area, erasing useful data from it. So a cache area more
    than double the file size would be necessary to keep the important
    data cached. In practice, however, the situation is not that bad.
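    The cyclic working-set anomaly above is easy to reproduce in a few
    lines. This is an illustrative simulation, not concache.exe's
    replacement policy: four working sets are touched cyclically with
    cache room for only three.

```python
import random
random.seed(0)

def hit_rate(policy, slots=3, sets=4, accesses=1000):
    # Touch working sets 0..sets-1 cyclically; the cache holds 'slots'
    # of them and evicts by the given policy on a miss.
    cache, hits = [], 0
    for i in range(accesses):
        s = i % sets
        if s in cache:
            hits += 1
            cache.remove(s)
            cache.append(s)              # most recently used at the end
        else:
            if len(cache) == slots:
                if policy == "lru":
                    cache.pop(0)         # evict least recently used
                else:
                    cache.pop(random.randrange(slots))  # evict at random
            cache.append(s)
    return hits / accesses

print(hit_rate("lru"))      # 0.0 - LRU always evicts the set needed next
print(hit_rate("random"))   # > 0 - random eviction lets some sets survive
```

    As the text goes on to note, real access patterns are rarely this
    adversarial.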
    Even for file copying, the FAT and directory images are repeatedly
    referenced from the cache data area, so disk head movements, as
    well as repeated reads and writes to those areas on disk, are
    avoided, thus improving the speed of the copy operation. For file
    copying, a rather small cache area works as well as a large one.

QUESTION
    How Can Concache.exe Be Tuned, In Terms Of Conventional Memory ?

ANSWER
    An inevitable penalty of concurrency is memory requirements. Each
    concurrently driven device needs its own io buffer, control and
    stack space to switch to and fro, and a request packet to organize
    io, and, for ccdisk.exe, a SCSI control block, in addition to the
    descriptors needed for the drives managed by concache.exe. The
    following describes how to save the memory space used by
    concache.exe.

    First, you can load concache.exe into upper memory, either through
    config.sys as a device driver or through autoexec.bat as a TSR
    (terminate and stay resident program).

    Second, the io buffer size can be reduced with the buffer_size=
    option, though smaller buffers can slow down data transfers. Note
    that the size must be at least the size of the largest sector to
    be cached.

    Third, the number of io buffers can be reduced. This can affect
    the io performance of concache.exe, so experiments are needed.

    Fourth, directory space can be reduced to the minimum needed for
    the concurrency you want.

    Fifth, if the full stack space, currently 440 - 500 bytes, is not
    used, it can be reduced to the bare minimum of 320 bytes, provided
    no SCSI disks are used. However, this may be affected by other
    external interrupt devices, so experiments may be needed. (After
    all, under DOS, the proof of the stack is in the eating.)

    Finally, on the ccdisk.exe command line, the concurrency
    requirements can be reduced toward the bare minimum. If,
    unfortunately, concurrency mode cannot be used, then saying
    "concurrency=1" would save hundreds of bytes.
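    Putting the points above together, a memory-lean setup might look
    like the following config.sys fragment. The option names are the
    ones described in this manual, but the load paths, the values, and
    the exact command-line syntax here are illustrative only; consult
    concache.txt and ccdisk.txt for the real defaults and syntax.

```
rem hypothetical paths and values; option names from this manual
devicehigh=c:\cache\concache.exe buffer_size=2048 io_buffers=2 stacksize=320
devicehigh=c:\cache\ccdisk.exe concurrency=1
```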
QUESTION
    How Can Concache.exe Be Tuned, In Terms Of Performance ?

ANSWER
    Speedup is gained either by making io efficient or by obtaining
    maximal concurrency.

    First, make the tick_delay= value larger, to avoid clashes between
    DOS and concache.exe write back actions. This comes with almost no
    penalty.

    Second, make the io buffer size or the number of io buffers
    larger. The options for these two factors work almost
    synonymously, since concache.exe does not do io in fixed size
    buffers. This will improve each io time and, if the number of
    buffers is sufficiently large, will also allow concurrent actions.

    Third, the cache data area is split into multiple units of 8kb,
    which is fairly large compared to the cluster sizes many people
    prefer, so if the drives are heavily fragmented, a large amount of
    space can be wasted in the cache area. Note that drive
    fragmentation is by no means the least influential factor on
    performance, and that this is not particular to concache.exe but
    common to all disk cache programs that work on FAT oriented file
    systems.

    Fourth, splitting files across disks in a scheme where io overlap
    is possible avoids io clashes.

    Fifth, although preread improves performance in most cases, it can
    degrade overall performance in certain cases: if the read pattern
    is random, preread is not only useless but slows things down
    further through access clashes. If such files are frequently
    accessed, it might be better to move them to a partition that does
    not preread. Also, if the cache data area is of marginal size,
    preread can purge still useful data from it and read in data that
    is not yet needed.

QUESTION
    Is There Anything To Note With Relation To Serial Communications
    Software ?

ANSWER
    Serial communications are notorious for their severe timing
    requirements.
    For example, when the communication speed is 38.4 kbps and the
    communication device is a model that lacks a buffer, each
    character received through it must be handled within one character
    time, about 260 microseconds (10 bits per character at 38400 bits
    per second). Failing to handle the received character within that
    interval results in the overrun error familiar to programmers.
    Note that this problem is particular to the receive side; a few
    delays on the send side usually cause no severe problems.

    On the other hand, since concache.exe works asynchronously with
    serial io, disk io is initiated and completed concurrently with
    character transmissions. This means concache.exe causes various
    housekeeping chores in the DOS context to be performed within that
    short interval, which is almost impossible on most PCs other than
    recent high performance ones.

    Fortunately, alleviations do exist. The following lists several
    possible ways.

    write after mode
        This avoids overlapping operations with serial transfers, so
        the severe timing problem disappears.

    buffered controller
        If the controller used for serial communication has a receive
        buffer, it extends the short interval several times over. For
        example, an NS16550 chip, when properly programmed, lengthens
        the interval 16 times.

    hardware flow control
        If this is possible on your PC and its counterpart, it
        prevents receiving when there is no room to do so, so the
        short interval is (unlimitedly ?) extended.

Troubleshooting
    In the following, common conflicts such as irq, dma, memory, and
    SCSI option settings are not discussed. They are treated in the
    respective manufacturers' manuals and are (probably) not
    particular to concache.exe per se.

    First, the stack issue must be tried, as it causes the most
    obscure effects on the workings of DOS programs. Concache.exe is
    designed to work in the environment stacks=0,0.
    However, because of the variety of BIOS manufacturers and the
    existence of so many BIOS versions, it is not certain that the
    estimate of concache's own stack requirements is sufficient in
    every environment it encounters. In addition, there may exist
    programs which expect a large stack space to be available at any
    time. For testing purposes, first try an "extremely wasteful"
    stack setting in config.sys. If this solves the problem, your
    remaining task is to find the best values for that config.sys
    line. Alternatively, the stacksize= option of concache.exe can be
    tried, to find out whether concache.exe itself is experiencing
    stack overflow.

    Let's discuss the problems in each mode of concache.exe. The
    respective mode is given by an option or by a drive description.

    Fails In Stop Mode
        If concache.exe fails in stop mode, there are two cases to
        consider. The CPU overhead concache.exe incurs can be the
        problem; see the section on relations to communications
        software. There is no general solution whatsoever. Or the
        conflict can be with third party device drivers or hardware.
        The gnaw_interrupt option of concache.exe may help in some
        cases.

    Write Through Mode Doesn't Work
        The complexity added from stop mode to write through mode is
        the actual access to the memory manager and the device driver.
        Empirically, conflicts with memory managers are very rare,
        except for pre-'90 EMS managers. Some device drivers may not
        be prepared for recent device driver conventions.

    Write After Mode Doesn't Work
        Concurrency problems start from this mode. A variety of
        assumptions about the single-taskness of DOS programs, where
        io actions are enclosed within the DOS context, begin to cause
        conflicts. Interrupt intensive applications can fail due to
        the switching overhead caused by concache.exe. If this might
        be the case, try write through mode. Slowing down is far
        better than losing data.
    Concurrency Mode Fails
        If write after mode works but concurrency mode doesn't, most
        of the problems seem to be synchronization problems. One of
        the cases encountered while testing compatibility was due to
        improper int2a8x handling: a network program ignores int2a8x
        critical section interrupts while within an int13 period,
        which is exactly what concache.exe issues. Consequently, the
        program miscounts int2a8x and erroneously identifies the DOS
        idle period.

        Another example: there are certain periods during which
        concache.exe does not want to be interrupted and reentered. In
        such cases it issues the DOS synchronization interrupt to warn
        others not to call DOS. Unfortunately, if the interrupt is
        ignored or ill-treated, a hang results.

SEE ALSO
    ccdisk.txt, concache.txt, floppies.txt, overview.txt.

Concache 1.10             Last Update: 19 June 1996                EQANDA