EQANDA              COPYRIGHT 1995-1996 horio shoichi              EQANDA

NAME
    eqanda.txt - Expected Questions And Answers

CONTENTS
    This section answers the questions most likely to arise when using
    concache.exe and its family of programs, the DOS disk cache
    programs. The contents of this section are:

        Why And How Do Cache Programs Speed Up Disk Io ?
        What Are The Elements That Limit Concurrency ?
        How Much Memory Should Be Prepared For Cache ?
        How Can Concache.exe Be Tuned, In Terms Of Conventional
        Memory ?
        How Can Concache.exe Be Tuned, In Terms Of Performance ?
        Is There Anything To Note With Relation To Serial
        Communications Software ?
        Troubleshooting

QUESTION
    Why And How Do Cache Programs Speed Up Disk Io ?

ANSWER
    Strictly speaking, disk cache programs do not speed up disk io;
    they reduce the number of disk io operations. To the user program
    they appear to complete disk io as quickly as possible. They
    buffer disk data in a large memory area called the disk cache
    buffer (hereafter simply termed the cache).

    For read requests, if the requested data already reside in the
    cache, the data are supplied from the cache. In addition, the data
    likely to be read next by user programs are read in advance and
    stored in the cache. This method of speedup is called "read ahead"
    or "preread".

    For write requests, the data to be written are copied into the
    cache, and user programs "think" the data have really been written
    to disk. The data are actually written at the cache program's
    convenience. This method of speeding up writes is called "delay
    write", "write behind", "write after", or "postwrite".

    The first generation of PC cache programs was generally reluctant
    to use postwrite; it was thought of as too special a luxury. Data
    to be written were written to disk as soon as requested. Handling
    writes this way is called "write-through".

    When cache programs that use postwrite arrived on the market, it
    was found that they more than double the speed of writes.
    This is because the disk allocation table, known as the FAT, is
    located at the top of the disk while the data space occupies the
    opposite end. Every write request first writes to the FAT to mark
    space as used, then moves the head to the allocated area and
    writes the data sectors. Postwrite in effect eliminates the
    repeated writes to the FAT by supplying DOS with the not-yet-
    written FAT image. So not only is the actual number of write
    operations reduced, but most head movements are eliminated, since
    the head no longer needs to go back and forth to the FAT area.

    When working with floppies, you may have experienced severe
    performance degradation when the buffers= statement in config.sys
    is inadequately written. You may also have observed writes slowing
    down as your program proceeds. What cache programs of this
    generation do is extend the config.sys buffers= statement into a
    large cache buffer.

    Next came the so-called "advanced" cache programs, which attempt
    to write data back concurrently with user programs. These cache
    programs do not wait for keyboard idle time, for example, to write
    back cached data. This means the common conception of traditional
    DOS programs - that because disk writes are slow, data must be
    held in the application's own buffers until there is an absolute
    need to write them back - is wrong. Writing data as they are
    produced is in fact faster and, perhaps less importantly,
    eliminates the need for huge buffers in each application program.
    In addition, because data are written as they are produced, there
    is less chance of accidental data loss. Programs become faster,
    safer, and leaner. It is fair to say that real disk speedup begins
    with this generation.

    Concache.exe belongs to this generation, and adds another level of
    generality. It allows concurrency wherever there is no reason to
    refrain from it. As a result, one floppy, one BIOS disk, and as
    many SCSI disks as can be configured into DOS can all be driven
    concurrently with DOS/user programs.
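    The FAT coalescing described above can be illustrated with a small
    simulation. This is a sketch under simplified assumptions (one FAT
    sector, one data sector per file appended); it is not
    concache.exe's actual algorithm.

```python
# Count physical disk writes for write-through vs. postwrite when
# appending n files: each append logically writes the FAT sector plus
# one data sector. Simplified sketch, not concache.exe's algorithm.

def write_through(n_files):
    # Every logical write goes straight to disk: FAT + data each time.
    return n_files * 2

def postwrite(n_files):
    # Dirty sectors sit in the cache and are flushed once at the end.
    dirty = set()
    for i in range(n_files):
        dirty.add("FAT")           # repeated FAT updates coalesce
        dirty.add(f"data{i}")      # each data sector written once
    return len(dirty)

print(write_through(10))   # 20 physical writes
print(postwrite(10))       # 11 physical writes: 10 data + the FAT once
```

    Beyond the reduced write count, the eliminated writes are exactly
    the ones that force the head back to the FAT area, so the seek
    savings are larger than the count alone suggests.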
QUESTION
    What Are The Elements That Limit Concurrency ?

ANSWER
    From the hardware point of view, floppies cannot perform io
    concurrently with each other, due to the floppy controller design.
    Neither can IDE disks. SCSI disks can perform io in parallel, as
    seen on many multiprogramming operating systems. At this level,
    one floppy, one IDE disk, and the SCSI disks can operate
    concurrently.

    The next level to consider is the BIOS support for io operations.
    As far as the published BIOS listing is concerned, there is no
    reason a floppy and an IDE disk cannot operate concurrently. SCSI
    drivers are usually written to do io asynchronously.

    Next comes the BIOS's capability to distinguish disk events. A
    standard BIOS handles only two "type"s of disks, which is
    sufficient for the floppy and IDE disk environments found in most
    PC configurations. Fortunately, the ASPI (advanced SCSI
    programming interface) specification, now broadly employed,
    supports a mechanism effectively similar to BIOS disk event
    notification, called command posting. (See the appropriate manual
    about this.) This allows individual disks' events to be handled.
    At this level the concurrency situation is unchanged.

    The next limiting factor is device driver non-reentrancy. Even if
    a device driver manages several disks, it expects its requests to
    arrive serially, not while previous requests are still in
    progress. In fact, most known device drivers lose the reentrancy
    necessary for concurrency within the very first two steps of
    driver code execution. Also, io.sys handles int13, through which
    almost any disk device call passes, in a non-reentrant way. So you
    might think that if a third party device driver is used - for
    example, io.sys for the floppies and the third party driver for
    the other disk devices - then at least the combination of one
    floppy and one hard disk should work concurrently. But no. If both
    share int13, they do not work concurrently.
    Next comes DOS drive letter availability. If, for example, a SCSI
    disk is split into two partitions - for which there are many good
    reasons - the user loses one drive letter for one disk. The two
    partitions cannot share io operation time.

    Those constitute the inherent limitations on concurrency. In
    practice, there are also resource limitations for programs under
    DOS. For example, an ASPI driver may limit the number of packets
    it can accept at once. Likewise, ccdisk.exe can limit the
    concurrency of SCSI disks from its command line. Finally,
    concache.exe can limit concurrency in two ways:

    1)  The concurrency= option limits the number of concurrent
        devices.

    2)  The io_buffers= option may specify too few io buffers to let
        devices work concurrently.

QUESTION
    How Much Memory Should Be Prepared For Cache ?

ANSWER
    There certainly are optimal points of cache size. Unfortunately,
    those points depend heavily on the applications and the job mix.
    There is no clear way to estimate the size and performance of a
    cache. Fortunately, concache.exe allows the cache size to be
    changed on the fly, so you can observe the performance of various
    cache sizes. If adding memory does not improve performance, then
    probably your mix needs still more memory, or you may decide to
    decrease the cache size without degrading performance.

    A "pathetic" looking example is presented below. This kind of
    anomaly is not uncommon in practice. Consider the following
    hypothetical case. I edit, compile, link, and debug programs,
    cyclically repeating these steps. For simplicity, assume each step
    requires exactly one megabyte, and that each step needs a set of
    files completely unrelated to the other steps (unrealistic ? but
    think of it this simply for now). Now let's have a 3 megabyte
    cache. How will these 3 mb be used ?
    Each of the first three steps loads, respectively, the editor and
    source files into the first megabyte, the compiler, header,
    source, and object files into the next megabyte, and finally the
    linker, library, object, and exe files into the last megabyte. The
    fourth step finds no free megabyte, so it must select one of the
    three. Now the familiar algorithm takes its turn: since the
    content of the first megabyte is the least recently used, it is
    considered unlikely to be used again soon. So the algorithm loads
    the exe file, debugger, and test data into - you see - the first
    megabyte.

    I go back to the editor. It is not in the first megabyte, as you
    have just witnessed. The editor and friends must be loaded into
    the second megabyte with similar fuss, which purges the compiler
    and so on from the second megabyte. And so on.

    In this example, cache performance is no better than if I had used
    only a one megabyte cache. If I added another megabyte, the
    performance would jump, but adding more does no good. If your job
    mix consists of five mutually unrelated steps, each requiring one
    megabyte, and the cache size is four megabytes, then the four
    megabyte space is no better than one megabyte.

    This extreme case comes out of the commonly used LRU algorithm and
    the extremely simplistic assumptions about the usage pattern. The
    least recently used space is supposed to be unlikely to be used
    again soon, but in this case it is exactly what is needed next.
    So, to pick a victim out of the three already used megabytes, let
    us select one at random. The probability that the megabyte needed
    next survives is then 0.67, and cache performance improves by that
    much, doesn't it ?

    A similar situation arises when copying a large file. Records that
    will never be read or written again flow continually into the
    cache data area, erasing useful data from it. So a cache area more
    than double the file size would be necessary to keep the important
    data cached. In practice, however, the situation is not that bad.
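    The cyclic working-set anomaly above is easy to reproduce in a few
    lines. This is an illustrative simulation, not concache.exe's
    replacement policy: four working sets are touched cyclically with
    cache room for only three.

```python
import random
random.seed(0)

def hit_rate(policy, slots=3, sets=4, accesses=1000):
    # Touch working sets 0..sets-1 cyclically; the cache holds 'slots'
    # of them and evicts by the given policy on a miss.
    cache, hits = [], 0
    for i in range(accesses):
        s = i % sets
        if s in cache:
            hits += 1
            cache.remove(s)
            cache.append(s)              # most recently used at the end
        else:
            if len(cache) == slots:
                if policy == "lru":
                    cache.pop(0)         # evict least recently used
                else:
                    cache.pop(random.randrange(slots))  # evict at random
            cache.append(s)
    return hits / accesses

print(hit_rate("lru"))      # 0.0 - LRU always evicts the set needed next
print(hit_rate("random"))   # > 0 - random eviction lets some sets survive
```

    As the text goes on to note, real access patterns are rarely this
    adversarial.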
    Even for file copying, the FAT and directory images are repeatedly
    referenced from the cache data area, so disk head movements, as
    well as repeated reads and writes to those areas on disk, are
    avoided, thus improving the speed of the copy operation. For file
    copying, a rather small cache area works as well as a large one.

QUESTION
    How Can Concache.exe Be Tuned, In Terms Of Conventional Memory ?

ANSWER
    An inevitable penalty of concurrency is memory requirements. Each
    concurrently driven device needs its own io buffer, control and
    stack space to switch to and fro, and a request packet to organize
    io, and, for ccdisk.exe, a SCSI control block, in addition to the
    descriptors needed for the drives managed by concache.exe. The
    following describes how to save the memory space used by
    concache.exe.

    First, you can load concache.exe into upper memory, either through
    config.sys as a device driver or through autoexec.bat as a TSR
    (terminate and stay resident program).

    Second, the io buffer size can be reduced with the buffer_size=
    option, though smaller buffers can slow down data transfers. Note
    that the size must be at least the size of the largest sector to
    be cached.

    Third, the number of io buffers can be reduced. This can affect
    the io performance of concache.exe, so experiments are needed.

    Fourth, directory space can be reduced to the minimum needed for
    the concurrency you want.

    Fifth, if the full stack space, currently 440 - 500 bytes, is not
    used, it can be reduced to the bare minimum of 320 bytes, provided
    no SCSI disks are used. However, this may be affected by other
    external interrupt devices, so experiments may be needed. (After
    all, under DOS, the proof of the stack is in the eating.)

    Finally, on the ccdisk.exe command line, the concurrency
    requirements can be reduced toward the bare minimum. If,
    unfortunately, concurrency mode cannot be used, then saying
    "concurrency=1" would save hundreds of bytes.
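    Putting the points above together, a memory-lean setup might look
    like the following config.sys fragment. The option names are the
    ones described in this manual, but the load paths, the values, and
    the exact command-line syntax here are illustrative only; consult
    concache.txt and ccdisk.txt for the real defaults and syntax.

```
rem hypothetical paths and values; option names from this manual
devicehigh=c:\cache\concache.exe buffer_size=2048 io_buffers=2 stacksize=320
devicehigh=c:\cache\ccdisk.exe concurrency=1
```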
QUESTION
    How Can Concache.exe Be Tuned, In Terms Of Performance ?

ANSWER
    Speedup is gained either by making io efficient or by obtaining
    maximal concurrency.

    First, make the tick_delay= value larger, to avoid clashes between
    DOS and concache.exe write back actions. This comes with almost no
    penalty.

    Second, make the io buffer size or the number of io buffers
    larger. The options for these two factors work almost
    synonymously, since concache.exe does not do io in fixed size
    buffers. This will improve each io time and, if the number of
    buffers is sufficiently large, will also allow concurrent actions.

    Third, the cache data area is split into multiple units of 8kb,
    which is fairly large compared to the cluster sizes many people
    prefer, so if the drives are heavily fragmented, a large amount of
    space can be wasted in the cache area. Note that drive
    fragmentation is by no means the least influential factor on
    performance, and that this is not particular to concache.exe but
    common to all disk cache programs that work on FAT oriented file
    systems.

    Fourth, splitting files across disks in a scheme where io overlap
    is possible avoids io clashes.

    Fifth, although preread improves performance in most cases, it can
    degrade overall performance in certain cases: if the read pattern
    is random, preread is not only useless but slows things down
    further through access clashes. If such files are frequently
    accessed, it might be better to move them to a partition that does
    not preread. Also, if the cache data area is of marginal size,
    preread can purge still useful data from it and read in data that
    is not yet needed.

QUESTION
    Is There Anything To Note With Relation To Serial Communications
    Software ?

ANSWER
    Serial communications are notorious for their severe timing
    requirements.
    For example, when the communication speed is 38.4 kbps and the
    communication device is a model that lacks a buffer, each
    character received through it must be handled within one character
    time, about 260 microseconds (10 bits per character at 38400 bits
    per second). Failing to handle the received character within that
    interval results in the overrun error familiar to programmers.
    Note that this problem is particular to the receive side; a few
    delays on the send side usually cause no severe problems.

    On the other hand, since concache.exe works asynchronously with
    serial io, disk io is initiated and completed concurrently with
    character transmissions. This means concache.exe causes various
    housekeeping chores in the DOS context to be performed within that
    short interval, which is almost impossible on most PCs other than
    recent high performance ones.

    Fortunately, alleviations do exist. The following lists several
    possible ways.

    write after mode
        This avoids overlapping operations with serial transfers, so
        the severe timing problem disappears.

    buffered controller
        If the controller used for serial communication has a receive
        buffer, it extends the short interval several times over. For
        example, an NS16550 chip, when properly programmed, lengthens
        the interval 16 times.

    hardware flow control
        If this is possible on your PC and its counterpart, it
        prevents receiving when there is no room to do so, so the
        short interval is (unlimitedly ?) extended.

Troubleshooting
    In the following, common conflicts such as irq, dma, memory, and
    SCSI option settings are not discussed. They are treated in the
    respective manufacturers' manuals and are (probably) not
    particular to concache.exe per se.

    First, the stack issue must be tried, as it causes the most
    obscure effects on the workings of DOS programs. Concache.exe is
    designed to work in the environment stacks=0,0.
    However, because of the variety of BIOS manufacturers and the
    existence of so many BIOS versions, it is not certain that the
    estimate of concache's own stack requirements is sufficient in
    every environment it encounters. In addition, there may exist
    programs which expect a large stack space to be available at any
    time. For testing purposes, first try an "extremely wasteful"
    stack setting in config.sys. If this solves the problem, your
    remaining task is to find the best values for that config.sys
    line. Alternatively, the stacksize= option of concache.exe can be
    tried, to find out whether concache.exe itself is experiencing
    stack overflow.

    Let's discuss the problems in each mode of concache.exe. The
    respective mode is given by an option or by a drive description.

    Fails In Stop Mode
        If concache.exe fails in stop mode, there are two cases to
        consider. The CPU overhead concache.exe incurs can be the
        problem; see the section on relations to communications
        software. There is no general solution whatsoever. Or the
        conflict can be with third party device drivers or hardware.
        The gnaw_interrupt option of concache.exe may help in some
        cases.

    Write Through Mode Doesn't Work
        The complexity added from stop mode to write through mode is
        the actual access to the memory manager and the device driver.
        Empirically, conflicts with memory managers are very rare,
        except for pre-'90 EMS managers. Some device drivers may not
        be prepared for recent device driver conventions.

    Write After Mode Doesn't Work
        Concurrency problems start from this mode. A variety of
        assumptions about the single-taskness of DOS programs, where
        io actions are enclosed within the DOS context, begin to cause
        conflicts. Interrupt intensive applications can fail due to
        the switching overhead caused by concache.exe. If this might
        be the case, try write through mode. Slowing down is far
        better than losing data.
    Concurrency Mode Fails
        If write after mode works but concurrency mode doesn't, most
        of the problems seem to be synchronization problems. One of
        the cases encountered while testing compatibility was due to
        improper int2a8x handling: a network program ignores int2a8x
        critical section interrupts while within an int13 period,
        which is exactly what concache.exe issues. Consequently, the
        program miscounts int2a8x and erroneously identifies the DOS
        idle period.

        Another example: there are certain periods during which
        concache.exe does not want to be interrupted and reentered. In
        such cases it issues the DOS synchronization interrupt to warn
        others not to call DOS. Unfortunately, if the interrupt is
        ignored or ill-treated, a hang results.

SEE ALSO
    ccdisk.txt, concache.txt, floppies.txt, overview.txt.

Concache 1.10             Last Update: 19 June 1996                EQANDA