Maximizing Performance with dBase II and III by Alan Simpson Reprinted from "Personal Systems", The Journal of the San Diego Computer Society Issues of November and December 1984 and January 1985 Typed and Formatted by Rick Yount, Oak Harbor, WA (206) 675-9797 This article may be reprinted for any non-commercial purpose; please credit the Author and "Personal Systems". TABLE OF CONTENTS: Trimming Minutes Down to Seconds ........... 1 Don't Resort to Re-Sorting ................. 2 Faster Sorts ............................... 4 Faster Searching ........................... 4 Faster Math ................................ 5 Faster Reports ............................. 6 Faster Copying ............................. 6 Faster Edits ............................... 7 Searching for Ranges ....................... 8 Estimating Performance ..................... 8 Managing Multiple Index Files .............. 9 Trade-Offs ................................ 10 Technical Rationale ....................... 12 Everyone wants the most out of their computer. Granted, it's nice that the computer can trim down to minutes what usually requires humans hours to perform. But then, nobody complains if those computer minutes can be trimmed down to seconds. In this paper we'll discuss and compare general techniques for maximizing the performance of dBase. We'll trim some of those long dBase processing minutes down to quick dBase seconds. TRIMMING MINUTES DOWN TO SECONDS We'll begin by benchmarking (comparing) a few techniques for performing some basic tasks with dBase II. We'll use a simple mailing list database as an example. Its name is MAIL.DBF, and it has this structure: FLD NAME TYPE WIDTH DEC 001 LNAME C 020 000 002 FNAME C 020 000 003 ADDRESS C 020 000 004 CITY C 020 000 005 STATE C 010 000 006 ZIP C 010 000 007 AMOUNT N 009 002 First, let's discuss various methods for sorting this database: ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 1 DON'T RESORT TO RE-SORTING There are basically four methods to maintain a sorted list of items in a database. The first is to simply CREATE the MAIL database, add records to it, then sort it with the SORT command. For example, suppose you CREATE MAIL and APPEND 999 records to it. To add another record to it, and maintain an alphabetical listing by last name, you would need to use the commands: USE MAIL APPEND SORT ON LNAME TO TEMP This procedure would require about 940 seconds to re-sort a database with 1000 records in it. Another method would be to USE MAIL, APPEND the new record, then INDEX the file on the LNAME field, using the commands below: USE MAIL APPEND INDEX ON LNAME TO NAMES This procedure requires about 530 seconds to re-sort the 1000-record database back into last name order A third method would be to locate the position in the database that the new record belongs, and INSERT a new record into its proper alphabetical place, as below: USE MAIL LIST (to find the insertion point) INSERT The time required would be however long it takes you to find the appropriate place to insert the new record, plus about 124 seconds for the INSERT command to rearrange the database. The fourth method is to create an index file of the field to sort on, and keep that index file active while adding the new record(s). In most cases, the best time to index a file is immediately after creating it, as in the commands below: CREATE MAIL USE MAIL INDEX ON LNAME TO NAMES It only takes about two seconds to create the index file when the database is empty. At this point, the MAIL.DBF database and the LNAME.NDX files exist on disk, but both are empty. To add new data, you would need to make the NAMES.NDX file active by typing in the commands: ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 2 USE MAIL INDEX NAMES If you always USE MAIL INDEX NAMES when you add new data, the index file will automatically be updated, therefore the data will always be sorted. So to add one record to the MAIL database with 999 records in it, you would type the commands: USE MAIL INDEX NAMES APPEND There is no need to go through the INDEX ON procedure again, because the NAMES.NDX file was active during the appending procedure. The time required to automatically re-sort the index file using this method is a scant 3 seconds. Add to that the original two seconds for the INDEX ON command to create the initial index file for a total of 5 seconds. Table 1 compares the processing times for the four methods (using a 16-bit computer with 256K RAM, floppies, and dBase II Version 2.4): ------------------------------------------------------- Method Commands Used Time Required 1 USE, APPEND and SORT 940 seconds 2 USE, APPEND and INDEX ON 530 seconds 3 USE, INSERT 129 seconds 4 USE file1 INDEX file2 APPEND 5 seconds ------------------------------------------------------- Table 1: Sorting Times; Four Different Approaches Remember, you need to first create an index file based upon the field(s) you wish the database to be sorted by. Use the INDEX ON command to create the index (.NDX) file. For example, to maintain an alphabetical listing of people on the MAIL database by last name, type in the commands: USE MAIL INDEX ON LNAME TO NAMES This creates and stores an index file called NAMES.NDX on disk. To make the index file active, specify its name in the USE command, as below: USE MAIL INDEX NAMES Once the file is made active in this way, any changes to the database, whether they be through APPEND, EDIT, BROWSE, REPLACE, PACK or READ will automatically re-sort and adjust the index file accordingly. Avoiding the re-sorting process on only one way in which index files can greatly improve the speed of a dBase II software system. Most types of processing that involve searching can also be accelerated. ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 3 FASTER SORTS In some cases, a database should be physically sorted rather than indexed. For example, the UPDATE command works best if the file you are updating FROM is physically pre-sorted. You could use the SORT command to create a sorted database called TEMP from the MAIL database using the commands: USE MAIL SORT ON LNAME TO TEMP This approach takes about 940 seconds, or about 15 minutes of processing time. If the NAMES index file already exists, you can achieve the same result using the commands: USE MAIL INDEX NAMES COPY TO TEMP Copying the contents of the indexed file only required about 326 seconds of processing time; about one-third the time. When you COPY an indexed file to another database, the records on the database you copy to will be physically sorted. FASTER SEARCHING Let's assume that the MAIL.DBF database already has 1,000 records on it. Ten individuals on this database have the last name (LNAME field) "Miller". The question is: Just how long does it take to LIST, COUNT, or COPY all the "Miller"'s to another database; or how long does it take to print a formatted REPORT with only "Miller"'s, or to SUM the amounts for the "Miller"'s? The answer, of course, is: It depends on how you do it. For our benchmark comparisons, we'll assume that the database has already been indexed on the LNAME field, using the commands: USE MAIL INDEX ON LNAME TO NAMES Let's begin by comparing processing times using two different command files and approaches. The first method will use the standard LIST FOR approach to fish out all the "Miller"'s. The command file looks like this: ********** Method 1: LIST FOR approach ERASE USE MAIL INDEX NAMES ACCEPT " List all with what last name? " to SEARCH LIST FOR LNAME=SEARCH When you DO this command file, it clears the screen and displays the prompt: List all with what last name? ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 4 Suppose you type in the name "Miller" and press the RETURN key. The program will then display the records of all ten "Miller"'s on the screen. The time required for the command file to display the "Miller"'s and return to the dot prompt is about 148 seconds on a floppy disk system. Almost two and a half minutes. A second approach to solve this problem is to use the FIND command to look up the first "Miller" in the NAMES index, then use the WHILE option to display the remaining "Miller"'s in the database, as in the program below: ********** Method 2: FIND and LIST WHILE approach ERASE ACCEPT " List all with what last name? " to SEARCH FIND &SEARCH LIST WHILE LNAME = SEARCH Processing time for the second method to display all 10 "Miller"'s then redisplay the dot prompt on the screen, is less than 9 seconds. Table 2 compares processing times for the two different methods. Both methods perform exactly the same task, but the processing times vary dramatically. ------------------------------------------------------- Method Commands Used Time Required 1 LIST FOR 148.75 seconds 2 FIND and LIST WHILE 8.94 seconds -------------------------------------------------------- Table 2: Comparison of Processing Times; Two Methods FASTER MATH The FIND and WHILE approach with indexed databases can offer significant time savings with dBase commands other than LIST. For example, if you want dBase to COUNT how many "Miller"'s are in the database, you can use the commands: USE MAIL COUNT FOR LNAME = "Miller" This approach requires about 55.5 seconds to display the fact that there are 10 "Miller"'s in the database, then redisplay the dot prompt. You can cut this time down significantly using the commands: USE MAIL INDEX NAMES FIND Miller COUNT WHILE LNAME = "Miller" This approach performs the same task in about 3.20 seconds; quite a significant time savings. ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 5 Suppose you wish to SUM the AMOUNT field for just the "Miller"'s? You could use these commands: USE MAIL SUM AMOUNT FOR LNAME = "Miller" Processing time using this method is about 58 seconds. The alternative approach uses these commands: USE MAIL INDEX NAMES FIND "Miller" SUM AMOUNT WHILE LNAME = "Miller" This approach takes 5.28 seconds; less than one-tenth the time. FASTER REPORTS You can use the WHILE command with the REPORT command, also. For example, suppose you've already created a formatted report called MAILIST using the REPORT command. To display all the "Miller"'s on the formatted report, you could use the commands: USE MAIL REPORT FORM MAILIST FOR LNAME = "Miller" This approach requires about 135.3 seconds to display all the "Miller"'s on the report, then redisplay the dot prompt. The faster approach uses these commands: USE MAIL INDEX NAMES FIND Miller REPORT FORM MAILIST WHILE LNAME = "Miller" These commands perform the same job in a slim 14.03 seconds. FASTER COPYING For copying portions of the MAIL database to a database called TEMP, the commands: USE MAIL INDEX NAMES COPY TO TEMP FOR LNAME = "Miller" require a hefty 200.6 seconds; more than three minutes. You can perform the same job using these commands: USE MAIL INDEX NAMES FIND Miller COPY TO TEMP WHILE LNAME = "Miller" These commands trim the copying time down to a comfortable 18.7 seconds. ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 6 FASTER EDITS Indexed files can also speed up the editing process. For example, suppose you want to BROWSE through the database to edit data for one of the "Miller"'s. One approach would be to type in the commands: USE MAIL BROWSE These commands will set up the dBase BROWSE screen with the names and addresses in their original order, as displayed in Figure 1: LNAME--------------------FNAME------------------ADDRESS Bond James 007 Spy St. Kenney Dave 123 Clark St. Newell Jeff 341 Lou Drive Mohr Richard 350 W. Leadora St. Rosiello Rick 999 Buddy Way Wallace Doug 345 Killer St. Miller Mike 601 Lemon Dr. Figure 1: An unindexed BROWSE screen. You will need to use lots of CTRL-C commands to scroll through the database to find the "Miller" you're looking for. There's no telling how long it might take you to find the particular "Miller" you wish to edit, because the "Miller"'s will be placed randomly throughout the database. If, on the other hand, you use these commands to BROWSE: USE MAIL INDEX NAMES FIND Miller BROWSE The BROWSE screen will display the first "Miller" on the database, and all the remaining "Miller"'s immediately beneath, as shown in Figure 2: LNAME--------------------FNAME------------------ADDRESS Miller Mollie 601 Mission Blvd. Miller Shiela 1234 Genessee Miller Ms. Stephanie S. 734 Rainbow Dr. Miller Patti 626 Mazda Way Miller George P.O. Box 2802 Miller Julie 999 Love St. Miller Caron 123 Princess Way Miller Ms. Chrissie 321 Hynde St. Miller Mrs. Sally S. 325 Seco Ct. Miller Dr. James T. 701 Newport Dr. Figure 2: A BROWSE screen with an indexed database ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 7 No need to go scrolling through pages and pages of BROWSE screens to find the "Miller" you wish to edit, because all ten "Miller"'s are displayed immediately and simultaneously on one BROWSE screen. Similarly, if you wish to use the EDIT command to change the data for a particular "Miller", you can use the commands: USE MAIL INDEX NAMES FIND Miller EDIT WHILE LNAME = "Miller" These commands will quickly display the first "Miller" in the database on the EDIT screen. Each subsequent CTRL-C command will immediately position you to the next "Miller" in the database. So once again, you can save a great deal of time by not having to scroll around and search for individual "Miller"'s. SEARCHING FOR RANGES The FIND and WHILE approach can also be used for searching for ranges of data. For example, if a database had a DATE field with dates stored in MM/DD/YY format, and the database was indexed on the date field, the following command file would list all records between the requested starting and ending dates: USE file INDEX datefield STORE " " TO START, FINISH @ 2,2 SAY "Enter Start Date " GET START PICTURE "99/99/99" @ 4,2 SAY "Enter End Date " GET FINISH PICTURE "99/99/99" READ FIND &START LIST WHILE DATE <= FINISH (This command file assumes that all dates have the same year. It's a little trickier when the data is spread across several years ESTIMATING PERFORMANCE Performance can be an important issue in developing a custom dBase II software system. Generally, processing times tend to increase linearly with the size of a database. Therefore, on a database with 5,000 records in it, displaying all the "Miller"'s could take as long as 740 seconds, a little over 12 minutes, with the LIST FOR approach. Using an indexed file with the FIND and LIST WHILE approach will perform the same task in about 45 seconds, less than a minute. Double those times for a database with 10,000 record in it: A whopping 24 minutes (almost a half hour) with the FOR approach, vs. 88 seconds (under a minute and a half) with the FIND and WHILE approach. ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 8 Keep in mind that any dBase command that allows the FOR option will allow the WHILE option instead (e.g. LIST, DISPLAY, COPY, REPORT, SUM, COUNT, REPLACE, DELETE, TOTAL). Therefore, any of these processes can be greatly expedited with an index file and the FIND and WHILE commands. MANAGING MULTIPLE INDEX FILES In the previous examples, we compared processing times using an index file called NAMES. This index file contains only the LNAME field. Realistically, a mailing list will probably require two separate sort orders; 1) a sort by last name and first name for printing a directory listing, and 2) a sort by zip code for bulk mailings. In that case, you need to create two index files to manage the two different sort orders. One index file, which we'll call NAMES, will keep the mailing list data in last name and first name order. A second index file, which we'll call ZIPS, will maintain a zip code order for handling bulk mailings. To create both of these index files, you'll need to first CREATE the MAIL.DBF database with the CREATE command, then immediately create the two index files using the commands: USE MAIL INDEX ON LNAME + FNAME TO NAMES INDEX ON ZIP TO ZIPS The MAIL.DBF database now has two index files associated with it; NAMES.NDX and ZIPS.NDX. You can keep both index files active by using the command: USE MAIL INDEX NAMES,ZIPS By specifying two index files in this fashion, all future modifications to the database with the APPEND, EDIT, BROWSE, READ, or REPLACE commands will * * AUTOMATICALLY * * update both index files. If you LIST or DISPLAY ALL the records in the database, they will be displayed in last name order, since NAMES is the first- listed index file in the INDEX portion of the command line. Furthermore, you can only use the FIND command with the first listed index, NAMES. Therefore, to display all the records in the database in zip code order, you need not go through the INDEX ON ZIP TO ZIPS procedure again. Instead, you can simply type in the command: SET INDEX TO ZIPS This eliminates the time required to sort the file again. Before adding new records or editing the database, be sure to use the commands: ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 9 USE MAIL INDEX NAMES,ZIPS or USE MAIL INDEX ZIPS,NAMES This is required to make both index files active again. Otherwise, you'll be likely to get a RECORD OUT OF RANGE error sometime in the not too distant future. TRADE-OFFS There are some trade-offs to contend with when using multiple index files. Generally, the more index files you have active at any given moment, the longer it takes to perform an APPEND, EDIT or REPLACE procedure. For example, if you want to add records to the MAIL database without any active index files, you can just use the commands: USE MAIL APPEND As you type in each new individual's data, the screen will immediately accept each new record and re-display the next APPEND screen. However, if you decide to get carried away and create four index files, and keep them all active as: USE MAIL INDEX NAMES, ZIPS, CITIES, STATE You will notice a definite delay between the time you fill one APPEND screen and the appearance of the next blank APPEND screen. On a large data file with over 1,000 records in it, the delay could be as much as 20 seconds, depending on how many fields are in each index file and the RAM capacity of your computer. In general, one or two active index files are sufficient for most databases. The delays caused by one or two active index files are relatively insignificant, and are more than compensated for by the time savings that the FIND and WHILE commands offer, as well as by the time savings gained by avoiding re-indexing. Another disadvantage to indexed files occurs during global replaces. For example, if for some reason you wished to set all the AMOUNT fields back to zero in your hypothetical MAIL database, you could use the commands: USE MAIL INDEX NAMES,ZIPS REPLACE ALL AMOUNT WITH 0 On a database with 1000 records in it, this process could easily take 45 minutes. However, the commands above waste processing time by updating the index files when it is not necessary to do so. ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 10 Recall that the NAMES index contains the LNAME and FNAME fields, and the ZIPS index contains the ZIP field. The AMOUNT field is not used in either index file. Therefore you can use the NOUPDATE option (supplied in dBase Versions 2.4 and higher) to perform the replace. These commands: USE MAIL INDEX NAMES,ZIPS REPLACE NOUPDATE ALL AMOUNT WITH 0 perform the update in about six minutes. The NOUPDATE option informs dBase that, even though there are two active index files, there is no need to update them while performing this REPLACE command. There are some important points to keep in mind about the INDEX and FIND commands. First, the FIND command only works on an indexed field. If a database is in use with multiple index files, the FIND command only works with the first listed index file. For example, if you open the MAIL data base with the two index files as below: USE MAIL INDEX ZIPS, NAMES the FIND command can only be used to locate a zip code. If the data to look up in a database is stored in a variable, then the variable name must be "macro-ized" to be used with the FIND command, as below: ACCEPT "Look up whom? " to SEARCH FIND &SEARCH Also, FIND does not support functions or operators. That is, you cannot use the commands: FIND Miller .OR. Smith or FIND !(&SEARCH) nor FIND LNAME > &SEARCH. If you create two index files, but later add, edit, or delete data with only one or neither of the index files active, the index files will be corrupted, and you will most likely get a RECORD OUT OF RANGE error at a later time. In this case, both index files must be re-created by typing in the commands: USE MAIL INDEX ON LNAME + FNAME TO NAMES INDEX ON ZIP TO ZIPS or . . . USE MAIL INDEX NAMES,ZIPS REINDEX Again, you can avoid the re-indexing by always keeping both index files active with the command USE MAIL INDEX NAMES, ZIPS when working with the database. ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 11 TECHNICAL RATIONALE From a technical standpoint, the reason that the FIND and WHILE approach always dramatically outperforms the FOR approach is quite simple. Whenever you use the FOR option to perform a search, dBase always starts accessing the records from record number 1 and reads every single record directly from disk. Therefore, if you had a database with 10,000 names in it, ten of which were "Miller", dBase would perform 10,000 disk accesses to display the 10 "Miller"'s. Ten thousand disk accesses takes a very, very long time. Many unnecessary disk accesses can be avoided by the fact that dBase always stores an active index file in RAM (at least, as much of it as will fit in RAM). Furthermore, the index file is always in sorted order in RAM. When you use the FIND command with an index file, dBase finds the first "Miller" in the index file in RAM, which requires no disk accesses. Furthermore, the WHILE option only searches WHILE (as long as) the search condition is true. Therefore, dBase will stop searching as soon as it encounters the first non-"Miller" in the index file. Since the index is already in sorted order, "Miller"'s are all clumped together and disk accesses will stop as soon as all the "Miller"'s have been displayed. So the FOR command requires 10,000 disk accesses to display the ten "Miller"'s, while the FIND and WHILE approach only requires 10 disk accesses; thereby eliminating 9,990 unnecessary disk accesses and about 20 or 30 minutes of time depending on the size of the database, the amount of RAM your computer has, and the speed of your disk drives. - EOF - ---------------------------------------------------------------- Maximizing dBase II and III Performance - by Alan Simpson Page 12 û