Turbo Pascal for DOS Tutorial
by Glenn Grotzinger
Part 10 -- binary files; units, overlays, and include files.
All parts copyright 1995-6 (c) by Glenn Grotzinger.

There was no prior problem, so let's get started...

Typed binary files
==================
We know that files can be of type text.  We can also make them type
"file of <type>", which lets us read and write binary data types to disk.
Here's an example.  Keep in mind that with a typed binary file, you can
only read and write the one type you declare the file to hold.  For the
example below, we can only deal with integers with this file.  The type
may be anything that we have covered up to this point.  Reading and
writing a typed binary file is no different from working with a text
file, except that we cannot use readln and writeln (those are for text
files only).

program integers2disk;
  { writing integers 1 thru 10 to a disk data file, then reading 'em back }
  var
    datafile: file of integer;
    i: integer;
  begin
    assign(datafile, 'INTEGERS.DAT');
    rewrite(datafile);
    for i := 1 to 10 do
      write(datafile, i);
    close(datafile);  { done with write }
    reset(datafile);  { now let's start reading }
    while not eof(datafile) do  { we can use the same concept }
      begin
        read(datafile, i);
        writeln(i);
      end;
    close(datafile);
  end.

You will notice the numbers 1 through 10 come up.  Look for the file named
INTEGERS.DAT, and then load it up in a text editor.  You will notice that
the file is essentially garbage to the human eye.  That, as you see, is
how the computer sees integers.  In part 11, I will explain the storage
methods of many different variables, and introduce a few new types of
things we can define.  We can use records, integers, characters, strings,
whatever...with a typed file, as long as we stick to the specific type we
declare the file to be in the var line.

Untyped Binary Files
====================
We can also open binary files as an untyped (essentially unstructured) file.
There we simply use the declaration "file".  (This is not specific to
version 7; earlier versions of Turbo Pascal support untyped files as well.)
In addition to this, we have to learn a few new commands in order to use
untyped files.

BLOCKREAD(filevar, varlocation, count, totalread);
  filevar is the untyped file variable.
  varlocation is the variable we read the data into.
  count is how many records to read.  With a record size of 1, this is
    simply the size of varlocation in bytes.
  totalread receives how many records were actually read. (optional)

BLOCKWRITE(filevar, varlocation, count, totalwritten);
  filevar is the untyped file variable.
  varlocation is the variable we write the data from.
  count is how many records to write (often the totalread value that
    came back from a blockread).
  totalwritten receives how many records were actually written. (optional)

SizeOf(varlocation)
  Function that gives the total size of a variable in bytes.

Maximum readable by one BlockRead: 64KB.

Reset and Rewrite take a record size as a second parameter if we deal with
an untyped file.  If we leave it out, the record size defaults to 128
bytes, so we pass 1 when we want to count in bytes.

Probably, the best thing to make things clearer is to give an example.
This program does the same thing as the last one does, but with an untyped
file.  See the differences in processing...

program int2untypedfile;
  uses crt;  { for clrscr }
  var
    datafile: file;
    i: integer;
    numread: word;
  begin
    clrscr;
    assign(datafile, 'INTEGERS.DAT');
    rewrite(datafile, 1);
    for i := 1 to 10 do
      blockwrite(datafile, i, sizeof(i));
    close(datafile);
    reset(datafile, 1);
    blockread(datafile, i, sizeof(i), numread);
    while numread <> 0 do
      begin
        writeln(i);
        blockread(datafile, i, sizeof(i), numread);
      end;
    close(datafile);
  end.

This program performs essentially the same function as the first example
program, but we are using an untyped file.  Blockread and blockwrite are
used in very limited manners here.  It's *VERY GOOD* for you to experiment
with their use!  As for the end-of-file test, blockread reports how many
records it actually read.  A count of 0 means we have hit the end of the
file, so we use that in place of EOF.
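As one possible experiment along those lines, here is a minimal sketch that moves a whole array through an untyped file with a single blockwrite and a single blockread.  The file name SQUARES.DAT, the constant maxnums, and the program name are made up for this illustration; the record size is set to sizeof(integer) so that the counts are in integers rather than bytes.

```pascal
program blockbuf;
  { sketch: one blockwrite and one blockread for a whole array }
  const
    maxnums = 10;                       { hypothetical buffer size }
  var
    datafile: file;
    nums: array[1..maxnums] of integer;
    i: integer;
    numread, numwritten: word;
  begin
    for i := 1 to maxnums do
      nums[i] := i * i;

    assign(datafile, 'SQUARES.DAT');
    rewrite(datafile, sizeof(integer)); { one record = one integer }
    blockwrite(datafile, nums, maxnums, numwritten);
    close(datafile);
    if numwritten <> maxnums then
      writeln('Only wrote ', numwritten, ' records.');

    for i := 1 to maxnums do            { wipe the buffer to prove the read }
      nums[i] := 0;

    reset(datafile, sizeof(integer));
    blockread(datafile, nums, maxnums, numread);
    close(datafile);

    for i := 1 to numread do            { numread counts records, not bytes }
      writeln(nums[i]);
  end.
```

Because the record size here is sizeof(integer), a count of 10 moves 10 integers.  With a record size of 1, the same transfer would need a count of sizeof(nums).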

The 2 missing DOS file functions
================================
We now have the tools to perform the 2 DOS file functions that you probably
noticed were missing from part 8: copying files and moving files.

Copying a file is essentially repeated blockreads and blockwrites until all
of the input file is read and all of the output file is written.  We can do
it with either typed or untyped files.  An untyped file example may be
found on page 14 of the Turbo Pascal 7.0 Programmer's Reference.  For those
who do not have this reference...Snippet of my own...untested...

  { infile and outfile are untyped files opened with a record size of 1 }
  repeat
    blockread(infile, rarray, sizeof(rarray), bytesr);
    blockwrite(outfile, rarray, bytesr, bytesw);
  until (bytesr = 0) or (bytesw <> bytesr);

The loop stops when nothing more could be read, or when fewer bytes were
written than were read (the disk is full).

Moving a file is a copy of the input file to a new location, followed by
erasure of the input file.

Units
=====
A unit is what you probably see on your drive in the TP/units directory.
Compiled units are TPU files.  They are accessed via USES clauses at the
start of a program.  CRT, DOS, and WinDos are some of the provided units we
have already encountered.  Nothing is stopping us from writing our own,
though.

The actual coding of the procedures/functions that we place into units is
no different.  The format of the unit, though, is something we need to
think about.  An example is the best thing for that.  This is a simple
implementation of a unit, to give you some idea of a skeleton to place
procedures and functions into.

unit myunit;

interface

  { all global const, type, and var declarations go here }

  { procedure and function headers are listed here }
  procedure writehello(str: string);

implementation

  { actual procedural code goes here, as it would in a regular program }
  procedure writehello(str: string);  { must be same as above }
    begin
      writeln(str);
    end;

  { any code we want to run as initialization for the unit goes in a
    begin..end block here, in place of the plain end below }
end.

The unit up above can be compiled to disk/memory, but cannot be run.
Essentially, it is a library of procedures/functions that we may use in
other programs.  Let's look at an example of how to use one of our own
units.

program tmyunit;
  uses myunit;  { must match ID at beginning of the unit }
  var
    str: string;
  begin
    str := 'Hello! Courtesy of myunit!';
    writehello(str);
  end.

Though this program/unit combination is ludicrous, it does illustrate
exactly how to incorporate your own unit with MANY functions into your
programming, if your project gets too big, or for portability's sake on
some of your frequently used procedures.

Overlays
========
This will describe how to use TP's overlay facility.  It must be used with
units.  My view is that overlays only pay off once a project is large
enough to need them.  We can use 'em on projects of any size, but on
smaller projects the overlay manager itself takes up more memory than
overlaying saves, so there is no advantage in doing this habitually.  We
will use the overlay facility with the unit/program set above for example
purposes.

ONLY CODE IN UNITS HAS AN OPPORTUNITY TO BE OVERLAID!  System, CRT, Graph,
and Overlay (if I remember right) are non-overlayable.

{$O+} is a compiler directive for UNITS only which designates a unit as OK
to overlay.  {$O-} is the default, which says it's not OK to overlay a
unit.

To get to the overlay manager, we must use the overlay unit.  After the
uses clause that names the overlay unit, we use the {$O unitname} compiler
directive to specify each unit that we want to compile as an overlay.

WARNING: It is good to try your conversion to overlays on a copy of your
source code.  If you alter a program with overlays in mind and it doesn't
work (it's known to happen -- a procedure works ONLY when it's not
overlaid...), you won't have to go through the work of altering it back.

NOTE: You must compile to disk, then run, when you work with overlays.

Results come back in the OvrResult variable.  Here's a list...

   0  Success
  -1  Overlay manager error.
  -2  Overlay file not found.
  -3  Not enough memory for overlay buffer.
  -4  Overlay I/O error.
  -5  No EMS driver installed.
  -6  Not enough EMS memory.

As for examples, let's look at the unit set up to overlay.  The only real
difference is the compiler directive at the top (which is a good policy to
make): {$O+} marks the unit as OK to overlay, and {$F+} forces the far
calls that the overlay manager expects.

{$O+,F+}
unit myunit;

interface

  { all global const, type, and var declarations go here }

  { procedure and function headers are listed here }
  procedure writehello(str: string);

implementation

  { actual procedural code goes here, as it would in a regular program }
  procedure writehello(str: string);  { must be same as above }
    begin
      writeln(str);
    end;

end.

Now let's look at the program itself.  Its error reporting from the
overlay manager isn't great.  It stops the program if the overlay won't
load, but does nothing more than print a message if the EMS step fails.

{$F+}  { programs that use overlays should use far calls too }
program tmyunit;
  uses overlay, myunit;  { overlay goes before any overlaid unit }
  {$O myunit}  { include myunit in the overlay }
  var
    str: string;
  begin
    ovrinit('TMYUNIT.OVR');  { final overlay file name/init for program. }
    if OvrResult <> 0 then
      begin
        writeln('There is a problem');
        halt(1);
      end
    else
      write('Overlay installed ');

    ovrinitems;  { init overlay for EMS. Usable after ovrinit }
    if OvrResult <> 0 then
      writeln('There was a problem putting the overlay in EMS')
    else
      writeln('in EMS memory.');

    str := 'Hello! Courtesy of myunit!';
    writeln;
    writehello(str);
  end.

EXE Overlays
============
Here's how to set up EXE overlays.  The DOS copy command features the /B
(binary) switch.  For example, to take the program above and attach the
overlay to the end of the EXE (be sure you run any exe packers/encryptors
before you do this!), use the following:

  COPY /B TMYUNIT.EXE+TMYUNIT.OVR

Then the change that needs to be made in the source for the program is to
change the ovrinit line to read TMYUNIT.EXE instead of TMYUNIT.OVR.
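One way this might be arranged so that the same source works whether or not the overlay has been appended is sketched below.  It assumes DOS 3.0 or later, where ParamStr(0) returns the full path of the running program, and it assumes the myunit unit from the example above.  The procedure name openoverlay is made up for this illustration; the ovrXXXX constants are the named equivalents of the OvrResult codes listed earlier.

```pascal
{$F+}
program tmyunit;
  uses overlay, myunit;
  {$O myunit}

  { openoverlay is a hypothetical helper, not part of the Overlay unit }
  procedure openoverlay;
    begin
      ovrinit(paramstr(0));        { look for the overlay inside the EXE }
      if OvrResult <> ovrOk then
        ovrinit('TMYUNIT.OVR');    { fall back to the separate file }
      case OvrResult of
        ovrOk: ;                   { nothing to report }
        ovrNotFound: writeln('Overlay file not found.');
        ovrNoMemory: writeln('Not enough memory for overlay buffer.');
        ovrIOError: writeln('Overlay I/O error.');
      else
        writeln('Overlay manager error ', OvrResult);
      end;
      if OvrResult <> ovrOk then
        halt(1);                   { never call overlaid code after a failure }
    end;

  begin
    openoverlay;
    writehello('Hello! Courtesy of myunit!');
  end.
```

Using ParamStr(0) instead of a hard-coded TMYUNIT.EXE also keeps the program working if somebody renames the EXE or runs it from another directory.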
You should be able to handle doing this and understand what is going on.

Include Files
=============
Use the {$I filename} compiler directive at the position where the include
file is to be placed.  An include file is code that lives in another file,
but is compiled as "part of the program" at the position of the {$I ..}
compiler directive.

Copy function
=============
You can use the copy function to extract a portion of a string.  It takes
the string, the starting position, and the number of characters to take.
For example...

  str := copy('TurboPascal', 5, 3);
  writeln(str);   { writes oPa }

Programming Practice for Part #10
=================================
We opened ourselves a business selling computer equipment in 1993.  Since
we have occupied ourselves with working on computers and not with
bookkeeping (we wanted to save the funds instead of hiring someone), and
would rather not use the cash registers, we have done everything on paper
over the last two years.  It's the beginning of 1996, and keeping accurate
records of sales progression, as well as records of our customers, has
become almost impossible, since our records are a closet full of paper.

So, we have finally decided to get things into the computer.  To do the
typing, we have temporarily hired interns from a nearby business college.
Unfortunately, with our limited funds, we could not draw in people who had
sufficient typing skill and accuracy, but we took what we could get.  We
now have things typed in as text files with 80 columns a line.
Unfortunately, the interns' attention to detail has been as bad as their
typing skill, and nothing makes sense in their work.

Our purpose is to get our money's worth out of hiring these interns by
locating the badly entered records, while writing the good records to a
solid binary data file by the name of COMPHVN.DAT.
For the bad records, on EACH AND EVERY error we encounter, we should write
a message to a text file named ERRORS.LOG.  Each message holds the first 20
characters of the problem line and a description of what is wrong with the
data set for that particular error, so we may go back through and make the
interns redo what they did wrong.

The data format for the output file COMPHVN.DAT is as follows.  In the
interest of efficiency, we shall write this program using COMPHVN.DAT as an
untyped file.  As the person posing this problem, I realize that some of
the data types in this record will not be recognizable at this point, but
with the variable description, you will know how to handle them, and in
part 11, you will see exactly what they are.  In creating a binary file, we
must always be concerned with using the least amount of space as
effectively as possible.  Uses of the variables will be explained later.
To save typing on your part, I am asking that you cut and paste this record
description out of this text and save it as a text file named COMPHVN.INC,
which may be used as an include file in our compilation.

comphvndata = record
  datacode: string[7];
  acct_classification: char;
  phone_area: integer;  {area+prefix+exchange = phone number}
  phone_prefix: integer;
  phone_exchange: integer;
  work_area: integer;
  work_prefix: integer;
  work_exchange: integer;
  other_area: integer;
  other_prefix: integer;
  other_exchange: integer;
  cnct1_lname: string[16];
  cnct1_fname: string[11];
  cnct1_minit: char;
  cnct1_pobox: integer;
  cnct1_sname: string[8];
  cnct1_stype: string[4];
  cnct1_apt: integer;
  cnct1_city: string[10];
  cnct1_state: string[2];
  cnct1_zip: longint;
  cnct1_birthm: byte;
  cnct1_birthd: byte;
  cnct1_birthy: integer;
  accept_check: boolean;
  accept_credt: boolean;
  balnce_credt: real;
  total_sold: real;
  cnct1_emp_code: string[4];
  total_sales: integer;
  emp_name: string[10];
  emp_stnum: integer;
  emp_sttype: string[4];
  emp_city: string[10];
  emp_state: string[2];
  emp_zip: longint;
  emp_area: integer;
  emp_prefix: integer;
  emp_exchange: integer;
  emp_yrs: byte;
  compu: boolean;
  compu_type: string[9];
  compu_mon: char;
  compu_cdr: boolean;
  compu_cdt: char;
  compu_mem: byte;
  minor: boolean;
end;

The format for our INPUT file, which will be named INDATA.TXT, will be as
follows (80 characters).  Since we had 15 interns doing the typing at once,
we also had them merge their work.  They were careless, and may not have
accomplished it properly.  There will be three lines for each customer that
we have encountered.

Line 1                                 Line 2
--------------------------------------------------------------------
datacode            columns 1-7        datacode        columns 1-7
acct_classification column 8           accept_check    column 8
sequence number     column 9           sequence number column 9
phone_area          columns 10-12      cnct1_stype     columns 10-13
phone_prefix        columns 13-15      cnct1_apt       columns 14-17
phone_exchange      columns 16-19      cnct1_city      columns 18-27
work_area           columns 20-22      cnct1_state     columns 28-29
work_prefix         columns 23-25      cnct1_zip       columns 30-38
work_exchange       columns 26-29      cnct1_birthm    columns 39-40
other_area          columns 30-32      cnct1_birthd    columns 41-42
other_prefix        columns 33-35      cnct1_birthy    columns 43-46
other_exchange      columns 36-39      balnce_credt    columns 47-55
cnct1_lname         columns 40-55      total_sold      columns 56-63
cnct1_fname         columns 56-66      cnct1_emp_code  columns 64-67
cnct1_minit         column 67          total_sales     columns 68-70
cnct1_pobox         columns 68-72      emp_name        columns 71-80
cnct1_sname         columns 73-80

Line 3
--------------------------------------------------------------------
datacode            columns 1-7
accept_credt        column 8
sequence number     column 9
emp_stnum           columns 10-13
emp_sttype          columns 14-17
emp_city            columns 18-27
emp_state           columns 28-29
emp_zip             columns 30-38
emp_area            columns 39-41
emp_prefix          columns 42-44
emp_exchange        columns 45-48
emp_yrs             columns 49-50
compu               column 51
compu_type          columns 52-60
compu_mon           column 61
compu_cdr           column 62
compu_cdt           column 63
compu_mem           columns 64-65
minor               column 66
spaces              columns 67-80

Now, a description as to what is defined as a correct set that we should
write to COMPHVN.DAT.

1) Each 3 lines that are read are considered for errors.  Check the
   sequence numbers.  The first line's sequence number should be 1, for
   example.  A successful read of 3 lines should say 1, 2 and 3 in that
   order.  For example, in our error reporting, if you have a read of
   1,2,2, you should not write the group to the binary file, and should
   report a duplicate line #2 and a missing line #3.
   There will never be a circumstance where these sequence numbers are all
   the same...The cases covered in this paragraph are the only cases that
   would ever forestall processing of the error checks listed in points
   2-14.
2) Datacode on lines 1, 2 and 3 should MATCH exactly and be checked for the
   following: it has the format, for example with my name, of GROTZ*G (the
   first five letters of the last name, an asterisk, and the first
   initial), and should be verified using the cnct1_?name fields...
3) phone_area, phone_prefix, phone_exchange, work_area, work_prefix,
   work_exchange, other_area, other_prefix, other_exchange, cnct1_pobox,
   emp_zip, emp_area, emp_prefix, emp_exchange, emp_yrs, cnct1_zip,
   cnct1_birthm, cnct1_birthd, cnct1_birthy, balnce_credt, total_sold,
   total_sales, compu_mem all should be checked to verify that they are
   numeric.
4) phone_prefix, work_prefix, other_prefix, emp_prefix all should not start
   with a 1 or a 0.
5) cnct1_birthy should be in this century, 1900-1999.
6) acct_classification should be B, C, G, P, or O.
7) accept_check, accept_credt, compu, compu_cdr, and minor should be Y or
   N.
8) emp_yrs (employed how many years?) should be checked against
   cnct1_birthy for sanity (a person who was born in 1980 can't have worked
   20 years).
9) If compu is N, then compu_type, compu_mon, compu_cdr, compu_cdt, and
   compu_mem should be either blank or 0 depending upon the type of field.
10) cnct1_emp_code should be GOVT, RET, STUD, or BUS.  If this field is
    RET, then emp_* should either be blank or 0 depending on the type of
    field.
11) compu_mon should be S, V, E, C, H, or I.
12) compu_cdt should be 1, 2, 4, 6, or 8.
13) emp_sttype and cnct1_stype should be BLVD, LANE, ST, AVE, CT, LOOP, DR,
    CIRC, or RR.
14) minor should be Y if the person listed in cnct1_?name is < 21 years old
    and N otherwise.  Check to be sure that this field is correct in being
    Y or N.
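
Several of the points above come down to small yes/no tests on a string or a character.  Here is a rough sketch of how points 3, 4 and 6 might be coded.  Every name in it (rtrim, isnumeric, validprefix, validclass) is made up for this illustration, and it treats a blank field as not numeric, which is an assumption you may want to change for fields that are allowed to be empty.

```pascal
program checks;
  { sketch of helpers for points 3, 4 and 6; all names are hypothetical }

  function rtrim(s: string): string;
    { strip the trailing spaces that pad a fixed-width field }
    begin
      while (length(s) > 0) and (s[length(s)] = ' ') do
        delete(s, length(s), 1);
      rtrim := s;
    end;

  function isnumeric(s: string; allowpoint: boolean): boolean;
    { digits only, plus at most one decimal point if allowpoint is set }
    var
      i, points: integer;
      ok: boolean;
    begin
      s := rtrim(s);
      ok := length(s) > 0;           { assumption: blank is not numeric }
      points := 0;
      for i := 1 to length(s) do
        if s[i] = '.' then
          inc(points)
        else if not (s[i] in ['0'..'9']) then
          ok := false;
      isnumeric := ok and ((points = 0) or (allowpoint and (points = 1)));
    end;

  function validprefix(s: string): boolean;
    { point 4: numeric, and not starting with 0 or 1 }
    begin
      validprefix := isnumeric(s, false) and not (s[1] in ['0', '1']);
    end;

  function validclass(c: char): boolean;
    { point 6 }
    begin
      validclass := c in ['B', 'C', 'G', 'P', 'O'];
    end;

  begin
    writeln(isnumeric('0000', false));       { TRUE }
    writeln(isnumeric('000A', false));       { FALSE }
    writeln(isnumeric('2.34     ', true));   { TRUE }
    writeln(validprefix('129'));             { FALSE }
    writeln(validprefix('429'));             { TRUE }
    writeln(validclass('I'));                { FALSE }
  end.
```

Checking character by character, instead of leaning on val, keeps the test strict: val would happily accept something like 2E2 as a real number.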

Format of ERRORS.LOG (also solution to the INDATA.TXT posted below)
--------------------
Error Report -- INDATA.TXT
--------------------------
First 20 characters of line   Problem
---------------------------   --------------------------
GROA2*GN334 ST WAR            Datacode does not agree with name.
GROT2*GP181612932918          Work-exchange is not numeric.
GROT2*GP181612932918          phone-prefix started with a 0 or 1.
GROT2*GT2ST 314 SED           accept-check is invalid.
GROT3*GP181642932918          Duplicate line #1
GROT3*GN234 ST WAR            Missing line #3
GROT4*GI181642932918          Datacode does not agree with name.
GROT4*GY2ST 314 SED           Datacode does not agree with name.
GROT4*GN334 ST WAR            Datacode does not agree with name.
GROT4*GY2ST 314 SED           cnct1-birthy is not in this century.
GROT4*GI181642932918          acct-classification is invalid.
GROT7*GN334 ST WAR            emp-zip is not numeric.
GROT7*GN334 ST WAR            compu-cdr is invalid.
GROT7*GN334 ST WAR            The emp-yrs doesn't make sense.
GROT7*GN334 ST WAR            There were fields present when compu was N.
GROT7*GN334 ST WAR            compu-mon is invalid.
GROT7*GN334 ST WAR            compu-cdt is invalid.
GROT8*GN334 ST WAR            empcodes are present when RET is true.
GROT8*GN334 ST WAR            compu-mon is invalid.
GROT0*GN334 STR WAR           compu-cdt is invalid.
GROT0*GN334 STR WAR           emp-sttype is invalid.

Remember to be as general as possible in your error messages.  Use the
example listed above as a guide.  Your program cannot predict everything.

Also, in the interest of finding out your programming skill, we ask that
you code this program using the pascal overlay system with EMS load
capability, with all error codes and status statements active and visible
to the user, for at least one procedure or function.

Also note that many of the separate integer fields are run together in the
input file, so we cannot just plain read the input file.
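
A minimal sketch of what that means in practice: read each line into a string, then slice the fields out by column with copy and convert them with val.  The sample line below is just the start of a line in the layout of Line 1, and the variable names are made up for this illustration.

```pascal
program slicer;
  { sketch: pull fixed-column fields out of one input line }
  var
    line: string[80];
    seq: char;
    area: integer;
    code: integer;
  begin
    line := 'GROT1*GP1816429329181674700008163475753';
    seq := line[9];                       { sequence number, column 9 }
    val(copy(line, 10, 3), area, code);   { phone_area, columns 10-12 }
    if code = 0 then
      writeln('line ', seq, ', phone_area = ', area)
    else
      writeln('phone_area is not numeric');
  end.
```

For this sample it prints "line 1, phone_area = 816".  val sets code to 0 on success and to the position of the first bad character otherwise, so the same call can double as a numeric check if you are happy with what val accepts.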

Here is a copy of the current input file, INDATA.TXT (keep in mind it's 80
characters per line, and the character positions MATTER)
----------------------------------------------------
GROT1*GP1816429329181674700008163475753GROT1INGER GLENN K232 34th
GROT1*GY2ST 314 SEDALIA MO64093 062519742.34 3245.23 STUD32 CMSU
GROT1*GN334 ST WARRENSBURMO65337 81654341114 YHOMEBUILTVY18 N
GROT2*GP18161293291816747000A8163475753GROT2INGER GLENN K232 34th
GROT2*GT2ST 314 SEDALIA MO64093 062519742.34 3245.23 STUD32 CMSU
GROA2*GN334 ST WARRENSBURMO65337 81654341114 YHOMEBUILTVY18 N
GROT3*GP1816429329181674700008163475753GROT3INGER GLENN K232 34th
GROT3*GY1ST 314 SEDALIA MO64093 062519742.34 3245.23 STUD32 CMSU
GROT3*GN234 ST WARRENSBURMO65337 81654341114 YHOMEBUILTVY18 N
GROT4*GI1816429329181674700008163475753BROT4INGER GLENN K2E2 34th
GROT4*GY2ST 314 SEDALIA MO64093 062518742.34 3245.23 STUD32 CMSU
GROT4*GN334 ST WARRENSBURMO65337 81654341114 YHOMEBUILTVY18 N
GROT5*GP1816429329181674700008163475753GROT5INGER GLENN K232 34th
GROT5*GY2ST 314 SEDALIA MO64093 062519742.34 3245.23 STUD32 CMSU
GROT5*GN334 ST WARRENSBURMO65337 81654341114 YHOMEBUILTVY18 N
GROT6*GP1816429329181674700008163475753GROT6INGER GLENN K232 34th
GROT6*GY2ST 314 SEDALIA MO64093 062519742.34 3245.23 STUD32 CMSU
GROT6*GN334 ST WARRENSBURMO65337 81654341114 YHOMEBUILTVY18 N
GROT7*GP1816429329181674700008163475753GROT7INGER GLENN K232 34th
GROT7*GY2ST 314 SEDALIA MO64093 062519742.34 3245.23 STUD32 CMSU
GROT7*GN334 ST WARRENSBURMO65W37 816543411134NHOMEBUILT 00 N
GROT8*GP1816429329181674700008163475753GROT8INGER GLENN K232 34th
GROT8*GY2ST 314 SEDALIA MO64093 062519742.34 3245.23 RET 32 CMSU
GROT8*GN334 ST WARRENSBURMO65337 81654341114 YHOMEBUILTZY18 N
GROT9*GP1816429329181674700008163475753GROT9INGER GLENN K232 34th
GROT9*GY2ST 314 SEDALIA MO64093 062519742.34 3245.23 STUD32 CMSU
GROT9*GN334 ST WARRENSBURMO65337 81654341114 YHOMEBUILTVY18 N
GROT0*GP1816429329181674700008163475753GROT0INGER GLENN K232 34th
GROT0*GY2ST 314 SEDALIA MO64093 062519742.34 3245.23 STUD32 CMSU
GROT0*GN334 STR WARRENSBURMO65337 81654341114 YHOMEBUILTVYA8 N

Notes
-----
1) You may use a for loop to read each set of 3 lines.  I will not throw an
   error of omission of lines into the data file.  There will always be a
   multiple of 3 lines to work with.
2) The data file included in this text file includes errors from all 14
   points listed above.  The data file I use for the contest will be
   different, but will cover all 14 points listed above as well...
3) Be sure to get good use of your debugger, as you will NEED it...Also, be
   sure to plan the program -- this is an easy one, yet it's complex
   because of the amount of planning it requires...plan well, it's easy.
   Don't plan well, it's a bugger...:>
4) ONE hint: remember string addressing, and use of the copy function.
5) Another hint.  You can have what is referred to as "next sentence" IF
   THEN ELSE statements.  It is very good in this program to be able to use
   them.  (if condition then else) is essentially a do-nothing-if-the-
   condition-is-true situation.  I suggest it because the pascal operator
   NOT is easy to get wrong: it binds more tightly than the comparison
   operators and AND/OR, so without parentheses it often applies to less of
   the condition than you meant. :<

Also, keep in mind that this is the part 10 practice, too, so be sure to at
least attempt it!

Next Time
=========
Interfacing with a common format; how data types are stored in memory and
on disk.  You may wish to obtain a hex viewer for this next part.  Send
comments to ggrotz@2sprint.net.