Maximizing Performance with dBase II and III

                                  by Alan Simpson

                         Reprinted from "Personal Systems",
                   The Journal of the San Diego Computer Society
               Issues of November and December 1984 and January 1985
                 Typed and Formatted by Rick Yount, Oak Harbor, WA
                                   (206) 675-9797

           This article may be reprinted for any non-commercial purpose;
                  please credit the Author and "Personal Systems".

                                 TABLE OF CONTENTS:

                   Trimming Minutes Down to Seconds ........... 1
                   Don't Resort to Re-Sorting ................. 2
                   Faster Sorts ............................... 4
                   Faster Searching ........................... 4
                   Faster Math ................................ 5
                   Faster Reports ............................. 6
                   Faster Copying ............................. 6
                   Faster Edits ............................... 7
                   Searching for Ranges ....................... 8
                   Estimating Performance ..................... 8
                   Managing Multiple Index Files .............. 9
                   Trade-Offs ................................ 10
                   Technical Rationale ....................... 12


              Everyone wants the most out of their computer.  Granted, it's
         nice that the computer can trim down to minutes what usually
         requires humans hours to perform.  But then, nobody complains if
         those computer minutes can be trimmed down to seconds.  In this
         paper we'll discuss and compare general techniques for maximizing
         the performance of dBase.  We'll trim some of those long dBase
         processing minutes down to quick dBase seconds.


         TRIMMING MINUTES DOWN TO SECONDS

              We'll begin by benchmarking (comparing) a few techniques for
         performing some basic tasks with dBase II.  We'll use a simple
         mailing list database as an example.  Its name is MAIL.DBF, and it
         has this structure:

              FLD          NAME            TYPE        WIDTH      DEC

              001          LNAME            C          020        000
              002          FNAME            C          020        000
              003          ADDRESS          C          020        000
              004          CITY             C          020        000
              005          STATE            C          010        000
              006          ZIP              C          010        000
              007          AMOUNT           N          009        002
         First, let's discuss various methods for sorting this database:

          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                       Page 1







         DON'T RESORT TO RE-SORTING

              There are basically four methods to maintain a sorted list of
         items in a database.  The first is to simply CREATE the MAIL
         database, add records to it, then sort it with the SORT command.
         For example, suppose you CREATE MAIL and APPEND 999 records to it.
         To add another record to it, and maintain an alphabetical listing
         by last name, you would need to use the commands:

                   USE MAIL
                   APPEND
                   SORT ON LNAME TO TEMP

              This procedure would require about 940 seconds to re-sort a
         database with 1000 records in it.

              Another method would be to USE MAIL, APPEND the new record,
         then INDEX the file on the LNAME field, using the commands below:

                   USE MAIL
                   APPEND
                   INDEX ON LNAME TO NAMES

              This procedure requires about 530 seconds to re-sort the
         1000-record database back into last name order

              A third method would be to locate the position in the
         database that the new record belongs, and INSERT a new record into
         its proper alphabetical place, as below:

                   USE MAIL
                   LIST (to find the insertion point)
                   INSERT

              The time required would be however long it takes you to find
         the appropriate place to insert the new record, plus about 124
         seconds for the INSERT command to rearrange the database.

              The fourth method is to create an index file of the field to
         sort on, and keep that index file active while adding the new
         record(s).  In most cases, the best time to index a file is
         immediately after creating it, as in the commands below:

                   CREATE MAIL
                   USE MAIL
                   INDEX ON LNAME TO NAMES

              It only takes about two seconds to create the index file when
         the database is empty.  At this point, the MAIL.DBF database and
         the LNAME.NDX files exist on disk, but both are empty.  To add new
         data, you would need to make the NAMES.NDX file active by typing
         in the commands:




          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                       Page 2






                   USE MAIL INDEX NAMES

              If you always USE MAIL INDEX NAMES when you add new data, the
         index file will automatically be updated, therefore the data will
         always be sorted.  So to add one record to the MAIL database with
         999 records in it, you would type the commands:

                   USE MAIL INDEX NAMES
                   APPEND

              There is no need to go through the INDEX ON procedure again,
         because the NAMES.NDX file was active during the appending
         procedure.  The time required to automatically re-sort the index
         file using this method is a scant 3 seconds.  Add to that the
         original two seconds for the INDEX ON command to create the
         initial index file for a total of 5 seconds.  Table 1 compares the
         processing times for the four methods (using a 16-bit computer
         with 256K RAM, floppies, and dBase II Version 2.4):

         -------------------------------------------------------
         Method       Commands Used                Time Required

         1            USE, APPEND and SORT         940 seconds
         2            USE, APPEND and INDEX ON     530 seconds
         3            USE, INSERT                  129 seconds
         4            USE file1 INDEX file2 APPEND   5 seconds
         -------------------------------------------------------
         Table 1:  Sorting Times; Four Different Approaches

              Remember, you need to first create an index file based upon
         the field(s) you wish the database to be sorted by.  Use the INDEX
         ON command to create the index (.NDX) file.  For example, to
         maintain an alphabetical listing of people on the MAIL database by
         last name, type in the commands:

                   USE MAIL
                   INDEX ON LNAME TO NAMES

              This creates and stores an index file called NAMES.NDX on
         disk.  To make the index file active, specify its name in the USE
         command, as below:

                   USE MAIL INDEX NAMES

              Once the file is made active in this way, any changes to the
         database, whether they be through APPEND, EDIT, BROWSE, REPLACE,
         PACK or READ will automatically re-sort and adjust the index file
         accordingly.

              Avoiding the re-sorting process on only one way in which
         index files can greatly improve the speed of a dBase II software
         system. Most types of processing that involve searching can also
         be accelerated.




          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                       Page 3






              FASTER SORTS

              In some cases, a database should be physically sorted rather
         than indexed.  For example, the UPDATE command works best if the
         file you are updating FROM is physically pre-sorted.  You could
         use the SORT command to create a sorted database called TEMP from
         the MAIL database using the commands:

                   USE MAIL
                   SORT ON LNAME TO TEMP

              This approach takes about 940 seconds, or about 15 minutes of
         processing time.  If the NAMES index file already exists, you can
         achieve the same result using the commands:

                   USE MAIL INDEX NAMES
                   COPY TO TEMP

              Copying the contents of the indexed file only required about
         326 seconds of processing time; about one-third the time.  When
         you COPY an indexed file to another database, the records on the
         database you copy to will be physically sorted.


         FASTER SEARCHING

              Let's assume that the MAIL.DBF database already has 1,000
         records on it.  Ten individuals on this database have the last
         name (LNAME field) "Miller".   The question is:  Just how long
         does it take to LIST, COUNT, or COPY all the "Miller"'s to another
         database; or how long does it take to print a formatted REPORT
         with only "Miller"'s, or to SUM the amounts for the "Miller"'s?
         The answer, of course, is:  It depends on how you do it.

              For our benchmark comparisons, we'll assume that the database
         has already been indexed on the LNAME field, using the commands:

                   USE MAIL
                   INDEX ON LNAME TO NAMES

              Let's begin by comparing processing times using two different
         command files and approaches.  The first method will use the
         standard LIST FOR approach to fish out all the "Miller"'s.  The
         command file looks like this:

                   ********** Method 1:  LIST FOR approach

                   ERASE
                   USE MAIL INDEX NAMES
                   ACCEPT " List all with what last name? " to SEARCH
                   LIST FOR LNAME=SEARCH

              When you DO this command file, it clears the screen and
         displays the prompt:

                   List all with what last name?

          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                       Page 4







              Suppose you type in the name "Miller" and press the RETURN
         key.  The program will then display the records of all ten
         "Miller"'s on the screen.  The time required for the command file
         to display the "Miller"'s and return to the dot prompt is about
         148 seconds on a floppy disk system.  Almost two and a half
         minutes.

              A second approach to solve this problem is to use the FIND
         command to look up the first "Miller" in the NAMES index, then use
         the WHILE option to display the remaining "Miller"'s in the
         database, as in the program below:

                   ********** Method 2:  FIND and LIST WHILE approach

                   ERASE
                   ACCEPT " List all with what last name? " to SEARCH
                   FIND &SEARCH
                   LIST WHILE LNAME = SEARCH

              Processing time for the second method to display all 10
         "Miller"'s then redisplay the dot prompt on the screen, is less
         than 9 seconds.  Table 2 compares processing times for the two
         different methods.  Both methods perform exactly the same task,
         but the processing times vary dramatically.

              -------------------------------------------------------
              Method       Commands Used                Time Required

              1            LIST FOR                     148.75 seconds
              2            FIND and LIST WHILE            8.94 seconds
              --------------------------------------------------------
              Table 2:  Comparison of Processing Times; Two Methods


         FASTER MATH

              The FIND and WHILE approach with indexed databases can offer
         significant time savings with dBase commands other than LIST.  For
         example, if you want dBase to COUNT how many "Miller"'s are in the
         database, you can use the commands:

                   USE MAIL
                   COUNT FOR LNAME = "Miller"

              This approach requires about 55.5 seconds to display the fact
         that there are 10 "Miller"'s in the database, then redisplay the
         dot prompt.  You can cut this time down significantly using the
         commands:

                   USE MAIL INDEX NAMES
                   FIND Miller
                   COUNT WHILE LNAME = "Miller"

              This approach performs the same task in about 3.20 seconds;
         quite a significant time savings.

          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                       Page 5







              Suppose you wish to SUM the AMOUNT field for just the
         "Miller"'s?  You could use these commands:

                   USE MAIL
                   SUM AMOUNT FOR LNAME = "Miller"

              Processing time using this method is about 58 seconds.  The
         alternative approach uses these commands:

                   USE MAIL INDEX NAMES
                   FIND "Miller"
                   SUM AMOUNT WHILE LNAME = "Miller"

              This approach takes 5.28 seconds; less than one-tenth the
         time.


         FASTER REPORTS

              You can use the WHILE command with the REPORT command, also.
         For example, suppose you've already created a formatted report
         called MAILIST using the REPORT command.  To display all the
         "Miller"'s on the formatted report, you could use the commands:

                   USE MAIL
                   REPORT FORM MAILIST FOR LNAME = "Miller"

              This approach requires about 135.3 seconds to display all the
         "Miller"'s on the report, then redisplay the dot prompt.  The
         faster approach uses these commands:

                   USE MAIL INDEX NAMES
                   FIND Miller
                   REPORT FORM MAILIST WHILE LNAME = "Miller"

              These commands perform the same job in a slim 14.03 seconds.


         FASTER COPYING

              For copying portions of the MAIL database to a database
         called TEMP, the commands:

                   USE MAIL INDEX NAMES
                   COPY TO TEMP FOR LNAME = "Miller"

         require a hefty 200.6 seconds; more than three minutes.  You can
         perform the same job using these commands:

                   USE MAIL INDEX NAMES
                   FIND Miller
                   COPY TO TEMP WHILE LNAME = "Miller"

              These commands trim the copying time down to a comfortable
         18.7 seconds.

          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                       Page 6






         FASTER EDITS

              Indexed files can also speed up the editing process.  For
         example, suppose you want to BROWSE through the database to edit
         data for one of the "Miller"'s.  One approach would be to type in
         the commands:

                   USE MAIL
                   BROWSE

              These commands will set up the dBase BROWSE screen with the
         names and addresses in their original order, as displayed in
         Figure 1:


         LNAME--------------------FNAME------------------ADDRESS
         Bond                     James                  007 Spy St.
         Kenney                   Dave                   123 Clark St.
         Newell                   Jeff                   341 Lou Drive
         Mohr                     Richard                350 W. Leadora St.
         Rosiello                 Rick                   999 Buddy Way
         Wallace                  Doug                   345 Killer St.
         Miller                   Mike                   601 Lemon Dr.

         Figure 1:  An unindexed BROWSE screen.

              You will need to use lots of CTRL-C commands to scroll
         through the database to find the "Miller" you're looking for.
         There's no telling how long it might take you to find the
         particular "Miller" you wish to edit, because the "Miller"'s will
         be placed randomly throughout the database.

              If, on the other hand, you use these commands to BROWSE:

                   USE MAIL INDEX NAMES
                   FIND Miller
                   BROWSE

              The BROWSE screen will display the first "Miller" on the
         database, and all the remaining "Miller"'s immediately beneath, as
         shown in Figure 2:


         LNAME--------------------FNAME------------------ADDRESS
         Miller                   Mollie                 601 Mission Blvd.
         Miller                   Shiela                 1234 Genessee
         Miller                   Ms. Stephanie S.       734 Rainbow Dr.
         Miller                   Patti                  626 Mazda Way
         Miller                   George                 P.O. Box 2802
         Miller                   Julie                  999 Love St.
         Miller                   Caron                  123 Princess Way
         Miller                   Ms. Chrissie           321 Hynde St.
         Miller                   Mrs. Sally S.          325 Seco Ct.
         Miller                   Dr. James T.           701 Newport Dr.

         Figure 2:  A BROWSE screen with an indexed database

          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                       Page 7






              No need to go scrolling through pages and pages of BROWSE
         screens to find the "Miller" you wish to edit, because all ten
         "Miller"'s are displayed immediately and simultaneously on one
         BROWSE screen.

              Similarly, if you wish to use the EDIT command to change the
         data for a particular "Miller", you can use the commands:

                   USE MAIL INDEX NAMES
                   FIND Miller
                   EDIT WHILE LNAME = "Miller"

              These commands will quickly display the first "Miller" in the
         database on the EDIT screen.  Each subsequent CTRL-C command will
         immediately position you to the next "Miller" in the database.  So
         once again, you can save a great deal of time by not having to
         scroll around and search for individual "Miller"'s.


         SEARCHING FOR RANGES

              The FIND and WHILE approach can also be used for searching
         for ranges of data.  For example, if a database had a DATE field
         with dates stored in MM/DD/YY format, and the database was indexed
         on the date field, the following command file would list all
         records between the requested starting and ending dates:

               USE file INDEX datefield
               STORE "        " TO START, FINISH
               @ 2,2 SAY "Enter Start Date " GET START PICTURE "99/99/99"
               @ 4,2 SAY "Enter End Date " GET FINISH PICTURE "99/99/99"
               READ

               FIND &START
               LIST WHILE DATE <= FINISH
               (This command file assumes that all dates have the same
               year.  It's a little trickier when the data is spread across
               several years


         ESTIMATING PERFORMANCE

              Performance can be an important issue in developing a custom
         dBase II software system.  Generally, processing times tend to
         increase linearly with the size of a database.  Therefore, on a
         database with 5,000 records in it, displaying all the "Miller"'s
         could take as long as 740 seconds, a little over 12 minutes, with
         the LIST FOR approach.  Using an indexed file with the FIND and
         LIST WHILE approach will perform the same task in about 45
         seconds, less than a minute.  Double those times for a database
         with 10,000 record in it:  A whopping 24 minutes (almost a half
         hour) with the FOR approach, vs. 88 seconds (under a minute and a
         half) with the FIND and WHILE approach.




          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                       Page 8







              Keep in mind that any dBase command that allows the FOR
         option will allow the WHILE option instead (e.g.  LIST, DISPLAY,
         COPY, REPORT, SUM, COUNT, REPLACE, DELETE, TOTAL).  Therefore, any
         of these processes can be greatly expedited with an index file and
         the FIND and WHILE commands.


         MANAGING MULTIPLE INDEX FILES

         In the previous examples, we compared processing times using an
         index file called NAMES.  This index file contains only the LNAME
         field.  Realistically, a mailing list will probably require two
         separate sort orders;  1) a sort by last name and first name for
         printing a directory listing, and 2) a sort by zip code for bulk
         mailings.  In that case, you need to create two index files to
         manage the two different sort orders.  One index file, which we'll
         call NAMES, will keep the mailing list data in last name and first
         name order.  A second index file, which we'll call ZIPS, will
         maintain a zip code order for handling bulk mailings.  To create
         both of these index files, you'll need to first CREATE the
         MAIL.DBF database with the CREATE command, then immediately create
         the two index files using the commands:

                   USE MAIL
                   INDEX ON LNAME + FNAME TO NAMES
                   INDEX ON ZIP TO ZIPS

              The MAIL.DBF database now has two index files associated with
         it; NAMES.NDX and ZIPS.NDX.  You can keep both index files active
         by using the command:

                   USE MAIL INDEX NAMES,ZIPS

              By specifying two index files in this fashion, all future
         modifications to the database with the APPEND, EDIT, BROWSE, READ,
         or REPLACE commands will  * *  AUTOMATICALLY  * *  update both
         index files.

              If you LIST or DISPLAY ALL the records in the database, they
         will be displayed in last name order, since NAMES is the first-
         listed index file in the INDEX portion of the command line.

              Furthermore, you can only use the FIND command with the first
         listed index, NAMES.  Therefore, to display all the records in the
         database in zip code order, you need not go through the INDEX ON
         ZIP TO ZIPS procedure again.  Instead, you can simply type in the
         command:

                   SET INDEX TO ZIPS

              This eliminates the time required to sort the file again.
         Before adding new records or editing the database, be sure to use
         the commands:



          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                       Page 9






                   USE MAIL INDEX NAMES,ZIPS
                   or
                   USE MAIL INDEX ZIPS,NAMES

              This is required to make both index files active again.
         Otherwise, you'll be likely to get a RECORD OUT OF RANGE error
         sometime in the not too distant future.


         TRADE-OFFS

              There are some trade-offs to contend with when using multiple
         index files.  Generally, the more index files you have active at
         any given moment, the longer it takes to perform an APPEND, EDIT
         or REPLACE procedure.  For example, if you want to add records to
         the MAIL database without any active index files, you can just use
         the commands:

                   USE MAIL
                   APPEND

              As you type in each new individual's data, the screen will
         immediately accept each new record and re-display the next APPEND
         screen.  However, if you decide to get carried away and create
         four index files, and keep them all active as:

                   USE MAIL INDEX NAMES, ZIPS, CITIES, STATE

         You will notice a definite delay between the time you fill one
         APPEND screen and the appearance of the next blank APPEND screen.
         On a large data file with over 1,000 records in it, the delay
         could be as much as 20 seconds, depending on how many fields are
         in each index file and the RAM capacity of your computer.

              In general, one or two active index files are sufficient for
         most databases.  The delays caused by one or two active index
         files are relatively insignificant, and are more than compensated
         for by the time savings that the FIND and WHILE commands offer, as
         well as by the time savings gained by avoiding re-indexing.

              Another disadvantage to indexed files occurs during global
         replaces.  For example, if for some reason you wished to set all
         the AMOUNT fields back to zero in your hypothetical MAIL database,
         you could use the commands:

                   USE MAIL INDEX NAMES,ZIPS
                   REPLACE ALL AMOUNT WITH 0

              On a database with 1000 records in it, this process could
         easily take 45 minutes.  However, the commands above waste
         processing time by updating the index files when it is not
         necessary to do so.





          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                      Page 10






              Recall that the NAMES index contains the LNAME and FNAME
         fields, and the ZIPS index contains the ZIP field.  The AMOUNT
         field is not used in either index file.  Therefore you can use the
         NOUPDATE option (supplied in dBase Versions 2.4 and higher) to
         perform the replace.  These commands:

                   USE MAIL INDEX NAMES,ZIPS
                   REPLACE NOUPDATE ALL AMOUNT WITH 0

         perform the update in about six minutes.  The NOUPDATE option
         informs dBase that, even though there are two active index files,
         there is no need to update them while performing this REPLACE
         command.

              There are some important points to keep in mind about the
         INDEX and FIND commands.  First, the FIND command only works on an
         indexed field.  If a database is in use with multiple index files,
         the FIND command only works with the first listed index file.  For
         example, if you open the MAIL data base with the two index files
         as below:

                   USE MAIL INDEX ZIPS, NAMES

         the FIND command can only be used to locate a zip code.

              If the data to look up in a database is stored in a variable,
         then the variable name must be "macro-ized" to be used with the
         FIND command, as below:

                   ACCEPT "Look up whom? " to SEARCH
                   FIND &SEARCH

              Also, FIND does not support functions or operators.  That is,
         you cannot use the commands:  FIND Miller .OR. Smith or FIND
         !(&SEARCH) nor FIND LNAME > &SEARCH.

              If you create two index files, but later add, edit, or delete
         data with only one or neither of the index files active, the index
         files will be corrupted, and you will most likely get a RECORD OUT
         OF RANGE error at a later time.  In this case, both index files
         must be re-created by typing in the commands:

                   USE MAIL
                   INDEX ON LNAME + FNAME TO NAMES
                   INDEX ON ZIP TO ZIPS

         or . . .

                   USE MAIL INDEX NAMES,ZIPS
                   REINDEX

              Again, you can avoid the re-indexing by always keeping both
         index files active with the command USE MAIL INDEX NAMES, ZIPS
         when working with the database.



          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                      Page 11






         TECHNICAL RATIONALE

              From a technical standpoint, the reason that the FIND and
         WHILE approach always dramatically outperforms the FOR approach is
         quite simple.  Whenever you use the FOR option to perform a
         search, dBase always starts accessing the records from record
         number 1 and reads every single record directly from disk.
         Therefore, if you had a database with 10,000 names in it, ten of
         which were "Miller", dBase would perform 10,000 disk accesses to
         display the 10 "Miller"'s.  Ten thousand disk accesses takes a
         very, very long time.

              Many unnecessary disk accesses can be avoided by the fact
         that dBase always stores an active index file in RAM (at least, as
         much of it as will fit in RAM).  Furthermore, the index file is
         always in sorted order in RAM.

              When you use the FIND command with an index file, dBase finds
         the first "Miller" in the index file in RAM, which requires no
         disk accesses.  Furthermore, the WHILE option only searches WHILE
         (as long as) the search condition is true.  Therefore, dBase will
         stop searching as soon as it encounters the first non-"Miller" in
         the index file.  Since the index is already in sorted order,
         "Miller"'s are all clumped together and disk accesses will stop as
         soon as all the "Miller"'s have been displayed.

              So the FOR command requires 10,000 disk accesses to display
         the ten "Miller"'s, while the FIND and WHILE approach only
         requires 10 disk accesses; thereby eliminating 9,990 unnecessary
         disk accesses and about 20 or 30 minutes of time depending on the
         size of the database, the amount of RAM your computer has, and the
         speed of your disk drives.

                                      - EOF -























          ----------------------------------------------------------------
             Maximizing dBase II and III Performance - by Alan Simpson
                                      Page 12


�