//////// // // // // // /// // // // // //// // // //////// // // // // // // //// // // // /// // // // // /////// Pascal NewsLetter Issue #3 June, 1990 Editor: Pete Davis The Programmer's Forum BBS is the home of PNL. It can be reached in Washington, DC at (202)966-3647. Information is available through the following locations: FidoNet: Pete Davis@1:109/138 GEnie: P.DAVIS5 BitNet: HJ647C@GWUVM & UE356C@GWUVM InterNet: HJ647C@GWUVM.GWU.EDU UE356C@GWUVM.GWU.EDU and Pete.Davis@f138.n109.z1.fidonet.org 2 Table of Contents Introduction ..................................... Page 3 (P.D.) Letter from Charles Falconer ..................... Page 4 (C.F.) A Diatribe on Structured Programming and Pascal .. Page 8 (C.F.) Pascal I/O Semantics ............................. Page 18 (C.F.) All Sorts of Sorts ............................... Page 24 (P.D.) For Beginners .................................... Page 29 (B.G.) The Object of Objects ............................ Page 35 (J.R.) Doing More With Strings .......................... Page 39 (G.H.) Bits & Pieces .................................... Page 42 (****) Conclusion ....................................... Page 44 (P.D.) P.D. -> Pete Davis (Editor, Writer) C.F. -> Charles Falconer (Contributing Writer) J.R. -> Jim Reid (Contributing Writer) B.G. -> Bob Gowans (Staff Writer) G.H. -> Gerhard Hoogterp (Contributing Writer) Bits & Pieces is contibuted by various readers and contributing writers. Credit will be given in that Bits & Pieces area. Turbo Pascal is a registered trademark of Borland International 3 Introduction Well, I'm starting on issue #3 of the PNL for an unexpected reason. I, just minutes ago, received what must have been the toughest criticisms I have received yet. Anyone who frequents the Pascal echo in FidoNet knows who Charles Falconer is. I received a nice-sized packet of files from Charles regarding the PNL. It was loaded with constructive criticism. Some of it is already noticable. Charles pointed out that most people will go through the PNL on the screen and would prefer single-spaced text. Since I print it out, I was making it look good for printouts. I think Charles is right, though, so I am taking that advice. If there are enough people who want the double- spaced version, I will either release them together, or I will provide the double-spaced version to those who request it. Anyway, on with what Charles sent to me. His article was full of advice and comments. I really found it helpful and as Charles wrote: 'If I agreed with everything I read there would be no point in reading it!' I agree with that, but not everything you said.I did agree with most of what you said, though. Hopefully the users will approve of the changes. Anyway, along with the helpful comments, Charles included two articles that he had written. How could I possibly refuse to include them? Charles, thanks a lot. I really appreciate you advice and encouragement. I also appreciate your articles. They'll help beef-up the third issue a bit. Because Charles included so many good tips and because some of them are regarding articles I wrote in the first issue, I have included a copy of his letter so others can see what he has to say about Generic Structures and Optimization, two articles that were written in the first issue. I meant to get to my sorting article in the last issue. Obviously that didn't happen, so I'm putting it in this issue. I have been blessed with 4 articles so far. I can't tell you how much these articles help out. I think it is going to be necessary to make some small rules for submissions, though. The rules are pretty simple: 1> Single-Spaced 2> No justification 3> No page numbers 4> carriage returns only at end of paragraphs and where necessary (i.e. creating diagrams) I'm not going to refuse an article if it doesn't follow these specifications, it just makes things easier for me when including it in the newsletter. Well, I hope you enjoy what follows. 4 Letter from Charles Falconer I am attaching a couple of contributions - feel free to edit them within reason. These have been around in various forms for a while. Also feel free to grab anything I put on the Pascal Echo, if you wish, (unless I have done something foolish). It is all public domain. If you publish anything of mine include my Fidonet address please. On formatting of PNL: (all opinion) These are only suggestions, after all it is your project, and I am thankful for it. As an aside, the results are smaller and take less transmission time and disk space. I find the double spacing to be virtually unreadable on the CRT. This sort of thing is virtually never printed, and when it does the user doesn't want to use a lot of paper. Who knows, he may even have to feed individual sheets. I just found these on Hippocampus, and have not really read anything yet. I suggest a formatting of no more than 72, preferably 66, chars per line, with no indentation, single spaced, with a maximum of 60 lines/page. If lines are not right justified (minor point) the user can easily reformat to taste. Pages should be separated with form-feeds, not white space, as the white space makes it very hard to follow anything on the CRT. For the same reason pagenumbers and the ilk should be at page top. I attach PNL001 reformatted to roughly these specifications, for your consideration. I passed it through two filters, changed all into , and did some minor hand editing afterwards. The original contained some naked s, which didn't even overprint, but were removed by my stripovr filter (the filters are available in DETAB12.LZH on 141/205) If you feel indentation for printing is critical, I will be happy to write a small filter to add it in. It can also expand to a constant 66 (or other) lines per page if needed. (Europeans have 12 inch paper standard). Form-feeds also serve to trigger page printing on Laser Printers. I also have RNF, in Pascal, available with source, for generalized formatting. It can do quite a pleasant job. RNF13AT.LZH (110k), also on 141/205. RNF13SOR.LZH has ISO standard source, 13AT is for Turbo and includes the documentation and Turbo source. The docs are a deliberate mess, and get cleaned up by running RNF on them. With your permission only, I will mount the reformatted versions on Hippocampus to replace yours. No sweat either way. {Editors Note: Feel free to do it.} 5 Net mail to 141/209.1 will reach me, addressed to Charles Falconer. 1990/5/18 Chuck. ------ ps: I read over pnl001 and have a few minor comments: 1. Optimizing for size often DOES improve speed {ed. note: Yes, in most cases, but not all! }. The data structure and algorithm is usually more important than the code. Clarity is at the top of my wants 99% of the time. Do nothing without measuring the program first - otherwise the rewards are not going to be noticeable, but the (clarity) penalties are. Smaller code means more other resource available. I have a system where I can find sharp performance improvements for every 1k less code space used - sometimes only 128 bytes is important. In this case the difference is availability of swap space in memory, and the increment suddenly means something doesn't get swapped. 2. Generic procs - If the body of the system deals with the two pointer linkage item, rather than the data body, much more can be made generic. Only a few rare i/o procedures will need to know about the actual data item. Thus 'push' would push such a linkage record, pop would return one, and could be a function. Nothing need even know they are headers after the lowest level. 3. Fetish of mine - Pascal is ALWAYS capitalized. A proper name. 4. Optimizing - I virtually never turn off run time checks. The exception is in the heart of loops, once well debugged. Such loops should have loop invariants stated and checked. A whole program is never debugged. 5. I have a different view than most about formatting, some of which is in the attached files. 6. Generally excellent. If I agreed with everything I read there would be no point in reading it! You seem to have pretty sound teachers and a good mind. pps: Here is a useful tidbit for linked lists. This only assumes the list pointers contain a 'next' field. Takes a single pass. Just pass it the root pointer. PROCEDURE reverse(VAR dirlist : adirptr); (* Do a list walk and reverse the list order *) VAR prev, pprev : adirptr; 6 BEGIN (* reverse *) prev := NIL; WHILE dirlist <> NIL DO BEGIN pprev := prev; prev := dirlist; dirlist := dirlist^.next; prev^.next := pprev; END; dirlist := prev; END; (* reverse *) This allows you to create a list with simple code, i.e. new(p); p^.next := root; root := p; (* then set any data fields *) and later access in the order of input. One example of use is in scanning parameter lists when compiling a procedure header. Then the final list is used to check procedure calls correctly as they come in from the source. and here is the performance of MRGSORT on my system, sorting 12 char. random alphabetic strings (packed into pksize = 4 integers, 3 per integer). The whole of this was published on Pascal Echo, and is available as MRGSORT.LZH on 1:141/205. The MRGSORT unit operates on singly linked lists, only requiring that 'next' be the first field. (* On my 8mhz V20 XT system, executes as follows: *) (* items creation time sorting time *) (* ----- ------------- ------------ *) (* 10 0.013 Sec. 0.010 Sec. *) (* 100 0.117 Sec. 0.164 Sec. *) (* 500 0.582 Sec. 1.050 Sec. *) (* 2500 2.903 Sec. 6.407 Sec. *) (* 12500 14.502 Sec. 38.028 Sec. *) (* (FULL) 33874 38.028 Sec. 113.692 Sec. *) (* which shows the n*log(n) behaviour of the algorithm. *) and using the following comparison procedure: {$f+} (* passed functions MUST be far *) FUNCTION greater(thing, than : pointer) : boolean; (* This is the time bind - make assy language. This *) (* will later be passed in as a param to mrgsort *) LABEL 9, 10; VAR k : pkindex; (* These gyrations bypass type checking, and describe *) (* the actual pointer type that mrgsort will call with *) {} a : alfaptr ABSOLUTE thing; {} b : alfaptr ABSOLUTE than; 7 {$r-,s-} BEGIN (* greater *) greater := true; FOR k := pksize DOWNTO 1 DO (* Check most sig. first *) IF a^.s[k] > b^.s[k] THEN GOTO 10 (* decided *) ELSE IF a^.s[k] < b^.s[k] THEN GOTO 9; (* decided *) 9: greater := false; 10: END; (* greater *) {$r+,s+,f-} (* put the options back *) This being the major time bind, it is the only place that range checks are turned off. As an aside, I would have used your 'generic' linkage method, except that it would involve an extra dereference right here in the sensitive heart of the algorithm. Changing to such is easy. The {} in the lh margins mark non-standard usage. {Editors Note: 8 lines removed: Not of interest to readers.} 8 A diatribe on Structured Programming and Pascal by C.B. Falconer, 680 Hartford Tpk, Hamden, Ct 06517. 281-1438 Structured programming is NOT about GOTO less code, or languages, but about the design methodology. I use Pascal, (ISO standard version, not Turbo, but the remarks generally apply). You design a program from the top down, i.e.: Step 1. Write down what it is going to do, and specify the data used. Keep the data to an absolute minimum, because the less you have to remember the fewer mistakes there will be: (I will write a simple line copying program, assuming two extensions to standard Pascal, i.e. read(..) can read a string, and length(..) can return the length of that string. Most systems have these extensions, but if not the procedures are easily created in STANDARD Pascal) (For Turbo users, some additional changes have to be made. These are shown in a final addendum to this file. The problems are: 1. Turbo REQUIRES the assign operation 2. Turbo fails to automatically close files 3. Turbo fails to truncate string writes to the field value. 4. Turbo uses the predefined "string" type to enable the read(string). all are fairly easily handled, as can be seen in the modifications) ------- PROGRAM something(infile, outfile, output); (* using output for error messages, processing infile to outfile *) (* Define all data types that will be used GLOBALLY or as parameters *) CONSTANT maxbuff = 80; TYPE buff = ARRAY [1..maxbuff] OF char; VAR infile, outfile : text; buffer : buff; BEGIN (* something *) initialize; WHILE NOT eof(infile) DO BEGIN getdata(infile, buffer); putdata(outfile, buffer); END; 9 cleanup; END. (* something *) ------- This models what you want to do, with no detail. Once you have written the other procedures (and whatever they require) you are done. ALL the data that is really needed is already specified. You can add dummy procedures and check the result by compiling at this point. Note that, as a matter of style, all data is passed via parameters. This makes future changes much easier, and avoids looking around for "what is this thing". Step 2: Put in dummy procedures to resolve the program. The result can be compiled. ------- PROGRAM something(infile, outfile, output); (* using output for error messages, processing infile to outfile *) (* Define all data types that will be used GLOBALLY or as parameters *) CONSTANT maxbuff = 80; TYPE buff = ARRAY [1..maxbuff] OF char; VAR infile, outfile : text; buffer : buff; (* 1---------1 *) PROCEDURE initialize; BEGIN (* initialize *) END; (* initialize *) (* 1---------1 *) PROCEDURE getdata(VAR infile : text; VAR buffer : buff); BEGIN (* getdata *) END; (* getdata *) (* 1---------1 *) PROCEDURE putdata(VAR outfile : text, buffer : buff); 10 BEGIN (* putdata *) END; (* putdata *) (* 1---------1 *) PROCEDURE cleanup; BEGIN (* cleanup *) END; (* cleanup *) (* 1---------1 *) BEGIN (* something *) initialize; WHILE NOT eof(infile) DO BEGIN getdata(infile, buffer); putdata(outfile, buffer); END; cleanup; END. (* something *) ------- Notice that each procedure is indented to show its control level, and the BEGIN/ENDS are commented. This will be useful when fully expanded, and if inserted at this stage as a habit will always be present to help. Notice also that each parameter is VAR only when the procedure must be able to alter it. In some cases this may cause a minor slowdown, but the advantage is the GUARANTEE that errors in one procedure do not propagate into the rest of the program. This also allows the parameter to be supplied by an expression (for integers, reals, booleans, etc.), which cannot be done for a VAR parameter. Think of non-VAR parameters as pre-initialized data areas for the procedure, dynamically assigned. Sometimes obscure bugs arise in Pascal because of failure to specify the VAR when the parameter is to be modified. This seems to be mentally easy to overlook. All file specification paramters must always be VAR. You can read this program (even though it does nothing yet) and trivially satisfy yourself that it is correct. I.E. it will perform the function, and will terminate. FROM HERE ON you will NEVER change the global part. EVERYTHING added will be in the 4 procedures that the global program calls. You may want to tweak the type definition of buffer (that is why the constant MAXBUFF exists). Keeping everything simple, we will leave the cleanup procedure empty (but another application may need it), and fill in the details for a 11 complete program. Step 3: If complex, put in simple code to check that the outer system runs the way you want it (starts, stops, etc). Then repeat the design process on each individual procedure if needed. This example is so simple that no repeat will be needed. ------- EXCEPT for the "readln" in GETDATA, and the "length" in PUTDATA, this will compile and run on ANY Pascal system meeting the ISO or ANSI standards. Correction will involve the definition of buff, and writing two custom procedures, at most. PROGRAM something(infile, outfile, output); (* using output for error messages, processing infile to outfile *) (* Define all data types that will be used GLOBALLY or as parameters *) CONSTANT maxbuff = 80; TYPE buff = ARRAY [1..maxbuff] OF char; VAR infile, outfile : text; buffer : buff; (* 1---------1 *) PROCEDURE initialize; (* all that is needed to open the input and output files in standard *) (* Pascal. Non-Standard systems (e.g. Turbo) may need more. *) BEGIN (* initialize *) reset(infile); rewrite(outfile); END; (* initialize *) (* 1---------1 *) PROCEDURE getdata(VAR infile : text; VAR buffer : buff); (* We might want this to read one word, as an example of a different *) (* program using the same basic model, or to ignore non-numeric input *) (* For numeric input, we could send an error message if none found. *) 12 BEGIN (* getdata *) readln(infile, buffer); END; (* getdata *) (* 1---------1 *) PROCEDURE putdata(VAR outfile : text, buffer : buff); (* And this might want to upshift everything, or insert tabs, etc. *) BEGIN (* putdata *) writeln(outfile, buffer : length(buffer)); END; (* putdata *) (* 1---------1 *) PROCEDURE cleanup; BEGIN (* cleanup *) END; (* cleanup *) (* 1---------1 *) BEGIN (* something *) initialize; WHILE NOT eof(infile) DO BEGIN getdata(infile, buffer); putdata(outfile, buffer); END; cleanup; END. (* something *) ------- And the program is written. Each component is so simple, and deals with so few items, that you can be sure it is correct. Almost all errors will be syntax errors (omitted semi-colons, mis-spellings, etc) which the compiler will catch for you. If you are writing in a non-structured language, such as assembly or Fortran, use an outline similar to the above, and translate it into the language you are using. GOTO's can then be used, BUT THE FLOW OF CONTROL is strictly structured, and thus clear. (But there is a place for GOTOs, even in Pascal - that is another diatribe). Similarly the use of data items is clear, even if you have to declare other globals to replace the local variables, because of the end language. Notice that, in Pascal, you are usually backing up to somewhere between the procedure (or program) heading, and the executable code for that block (marked by "BEGIN (* blockname *)" ) to insert the next level. In Pascal you never need to worry about names, because inserted procedures are local, and the only name conflicts that can occur are with those names shown in the header (and any local 13 declarations). You can see all these immediately. This all builds on the well known fact that human brains can normally handle up to about 7 items simultaneously. Once you have to worry about more things you will miss something. Thus, KEEP IT SIMPLE. Build a complex system out of SIMPLE building blocks. At any level, you can understand what is going on. Pascal makes such coding very easy, but forces you to follow a standard format (basically declare everything before using it). The use of indentation to depict levels of control eases understanding, i.e. to see what controls whether a statement is executed, just follow up to the outer indentation level (this is a convention, not a language feature). Everything looks more or less like IF condition THEN BEGIN dothis; andthat; andsoforth; END ELSE BEGIN dosomethingelse andsoon; END; and you can tell at a glance what conditions will allow each group to be executed. If you really must use Basic, Fortran, Cobol, and other antiques, try writing pseudo code in a structured form, keep it as comments, and let it define the real code you write. When you make a change, revise the pseudo-code (for a real change, not just a coding error), and keep it current. It is not hard to write 500 or 1000 line programs that work first shot in this manner (excepting the typos, which the compiler usually will catch for you). Summary (and additions) - KEEP EACH module SIMPLE. Let the outer code define what each module must do. Keep each module on one CRT screen or less. Use parameters, even if not really needed. (You can change to globals for efficiency later if you have to for efficiency). AVOID proliferation of GLOBAL variables. Keep it SIMPLE. ------------------------------------------------------------------ (For Turbo users, some additional changes have to be made. These are shown in a final addendum to this file. The problems are: 1. Turbo REQUIRES the assign operation 2. Turbo fails to automatically close files 3. Turbo fails to truncate string writes to the field value. 14 4. Turbo uses the predefined "string" type to enable the read(string). all are fairly easily handled, as can be seen in the modifications) HERE IS THE MODIFIED PROGRAM, with changes inserted commented. look for the "<<<" string. ------- THIS Turbo VERSION will not run on an ANSI/ISO standard system. PROGRAM something(infile, outfile, output); (* using output for error messages, processing infile to outfile *) (* Define all data types that will be used GLOBALLY or as parameters *) CONSTANT maxbuff = 80; TYPE buff = string[maxbuff]; (* <<< CHANGED *) (* ^ *) (* ^-- one of the worst choices in Turbo is *) (* that "string" is a reserved word (rather than *) (* a predefined type. This causes all sorts of *) (* nuisances in porting standard Pascal to Turbo.*) VAR infile, outfile : text; buffer : buff; (* 1---------1 *) PROCEDURE initialize; (* all that is needed to open the input and output files in standard *) (* Pascal. Non-Standard systems (e.g. Turbo) may need more. *) (* 2---------2 *) PROCEDURE attachfile(VAR f : text; position : integer); (* << ADD *) (* attach the Pascal file to the commandline parameter *) (* in position "position" of the run-time command *) (* A style note: An IF THEN .. ELSE .. clause should *) (* normally be written with the longer clause in the ELSE *) (* part. This makes it easy to see the flow of control. A *) (* short ELSE clause 28 pages down is easily missed, not to *) (* mention the nuisance of deciding which IF it goes with. *) 15 BEGIN (* attachfile *) IF position <= paramcount THEN assign(f, paramstr(position)) ELSE BEGIN writeln('Need file #', position : 1, ' specification'); halt; END; END; (* attachfile *) (* 2---------2 *) BEGIN (* initialize *) attachfile(infile, 1); attachfile(outfile, 2); (* << ADDED *) reset(infile); rewrite(outfile); END; (* initialize *) (* 1---------1 *) PROCEDURE getdata(VAR infile : text; VAR buffer : buff); (* We might want this to read one word, as an example of a different *) (* program using the same basic model, or to ignore non-numeric input *) (* For numeric input, we could send an error message if none found. *) (* Instead of using the Turbo string type, we could have *)(* << NOTE *) (* made a different buff definition, such as: *) (* buff = RECORD *) (* lgh : integer; *) (* data : ARRAY[1..maxbuff] OF char; *) (* END; *) (* and altered this procedure to read into it, setting *) (* the lgh value. Then the putdata would have needed *) (* only minor modification, such as: *) (* writeln(buffer.data : buffer.lgh); *) (* except that Turbo fails to use the field correctly. *) BEGIN (* getdata *) readln(infile, buffer); END; (* getdata *) (* 1---------1 *) PROCEDURE putdata(VAR outfile : text, buffer : buff); (* And this might want to upshift everything, or insert tabs, etc. *) BEGIN (* putdata *) writeln(outfile, buffer : length(buffer)); (* << NOTE *) (* For Turbo strings we can omit the " : length(buffer)" *) 16 (* clause above. If the buff type was still really an *) (* array of char we would need something like : *) (* FOR i := 1 TO length(buffer) DO *) (* write(f, buffer[i]; *) (* to compensate for the Turbo failure to truncate. *) END; (* putdata *) (* 1---------1 *) PROCEDURE cleanup; BEGIN (* cleanup *) close(outfile); (* << ADDED - Turbo doesnt automate *) END; (* cleanup *) (* 1---------1 *) BEGIN (* something *) (* << The outer block doesnt change *) initialize; WHILE NOT eof(infile) DO BEGIN getdata(infile, buffer); putdata(outfile, buffer); END; cleanup; END. (* something *) ------- More style notes: BEGIN and END are used to delimit compound statements in Pascal, and not to show control flow. Modula uses END to show the end of controlled statements, and implies the BEGIN after IF, WHILE, etc. Thus I feel extra indentations for BEGINs are pure confusion (and lead to programs that run off the right of the page), and subordinate them. I also usually attach the END to the end of the controlled code, as shown, and use a separate line only when multiple ENDS must appear. This helps to keep code vertically compact, so that the whole control flow can be seen at a glance. In Pascal BEGIN END really delimit blocks only when used at the start and finish of PROCEDURE or FUNCTION code. Similarly, CASE labels are really LABELS, and belong at the left of the source listing. They are not controlled statements, but reference points to which control flows, eg: CASE ch : char OF 'A', 'a': dothingswitha; 'B', 'b': dothingswithb; 'C': BEGIN makeanoise; dothingswithbigC; END; 17 'c': dothingswithlittlec; END; (* case *) and, on reading, it is quite obvious that only one path will be followed, and how to find that path. 18 PASCAL I/O semantics, File PASCALIO.DOC C.B. Falconer 85/9/11, 87/2/12, 88/11/20, 89/12/12 This is an explanation of the action of Pascal I/O, as applied to text files. A system meeting the ISO and ANSI standards is assumed. This does not apply to Turbo Pascal exactly, because Turbo omits some of the standard abilities and functions, especially for console input (but Turbo 4 up is closer). UCSD Pascal fails in console i/o, but other operations are implemented. PascalP functions exactly as described below. REMEMBER - THIS IS ANSI/ISO STANDARD PASCAL. The standard has been published and in effect since 1983. The standard does NOT forbid extensions, but does insist on compliance. In addition a system meeting the standard MUST have a way of disabling any and all extensions, so that source can be checked for portability. Any Pascal file is conceptually a single stream, with a file buffer variable. If we always refer to the file variable itself as "f", the buffer variable is "f^". If f is declared as "f : FILE OF thing", then f^ is of type "thing", and may be used as such a variable at any time the file is open (i.e. after the file has been reset or rewritten). A Pascal text file is equivalent to "PACKED FILE OF char", and additionally specifies that the eoln, readln, writeln procedures may be used. THESE MAY NOT BE USED ON A NON-TEXT FILE. Reading ------- For reading, a file at any time consists of two ordered arrays of items. The first is the portion that has already been input, and the second is the portion that has not been input yet. The buffer variable f^ always contains the last single item input (consisting of characters, an eof mark, and an eoln mark for text files). The eoln mark always appears as a space in f^, and may only be detected by the eoln procedure. The eof mark in any non- empty text file must immediately follow an eoln mark (specified by the standard). (Thus any good system will automatically append an eoln on closing a file, if and only if it is not already present.) The second portion of the file is unlimited, and unknown as yet to the Pascal program. When a file is "reset" the file is actually opened, and the first char is placed in f^ (this may be the eof or eoln mark, checked by eof/eoln functions). This first char is removed from the second portion. If reset is applied to an already open file, the effect is always to rewind it (if possible). Obviously some files cannot be reset, e.g. it is hard to make the console repeat its previous sequence, 19 and thus "reset(input)" may be an error on some systems. From here on, the action of the "get(f)" procedure is to advance one further character in the source file, discarding the old f^ value, and replacing it with the next char. It should always be an error to do this when eof is true. Note that nothing has yet affected any variable in the Pascal program, except the f^ buffer. These are the underlying functions of the input system. The program may use the file by such actions as "ch := f^" at any time. The syntax of "read(f, ch)" is STRICTLY defined as "ch := f^; get(f)", and the eoln and eof functions examine the non-visible characteristics of the last input character. If "f" is omitted, as in "read(ch)" the standard file "input" is assumed, and the buffer variable is "input^". For most CPM or MSDOS systems the file actually contains a to mark eoln, and a <^Z> to mark eof. The value of f^ when eof is true is not defined by the standards, but when eoln is true it should be a space. Thus the character can not appear (unless the system defines eoln as the pair. Some systems always discard any , so that the file action remains the same when input from a keyboard as when input from a disk file. The syntax of "read(f, ch1, ch2, ..)" is defined as "read(f,ch1); read(f,ch2); .... ", and is simply a shorthand. If the object read-into is an integer, or a real, then automatic conversion is performed from a text string, and at completion f^ holds the terminating character (space, non-numeric, etc). Such a read causes a run-time error when no valid integer etc. is found before a terminator, but leading blanks (and eolns) are skipped over. Notice that nothing so far controls any flushing of input lines, to ensure that a read starts on the next physical line. This is performed by "readln(f)", which is defined as "WHILE NOT eoln(f) DO get(f); get(f)". NOTE the final get. This always leave f^ holding the first character of the next line (which is a space if the next line is empty, i.e. consists of eoln alone), or possibly an eof mark. Again, an omitted "f" implies input. The syntax of "readln(f, item1, item2, .. itemn)" is defined as "read(f,item1); read(f,item2); ... read(f,itemn); readln(f)", and is again just a convenient shorthand. Interactive files ----------------- This brings up the great bugaboo of Pascal text i/o: When a file is reset it MUST place the first character in f^. If that file is 20 interactive (i.e. the keyboard) the first character must be typed at that time. Thus the natural sequence "reset(f); write('prompt message'); read(f, ch)" to get a reply to a prompt requires that the answer be typed before the prompt is made. The problem also reappears after any readln, because the first "get" from the next line is performed. (see below for why f^ is filled at all) This is normally cured by a special driver for text files. Whenever the "get" is executed it simply sets a flag somehere (totally invisible to the application program) which says "a get is pending". (If get finds the flag set it must perform the pending get, and then again set the flag). Note that the "get" may be implied by a reset, read, or readln operation. Now the system must again intercept any use of eoln, eof, or the f^ variable and, before actually executing them, check the "get_pending" flag. If set the actual get must be performed, the flag reset, and then the eoln, eof, f^ references may be made. This prevents the early physical read, and allows natural programming. However the programmer should always remember that any reference to eof, eoln, or f^ will cause the physical read. Thus the sequence "reset(f); IF eof(f) THEN something; write('prompt'); read(f,ch)" will cause the physical read to be too early. Some systems (e.g. UCSD) do not follow the ANSI/ISO standard, and define a special interactive file type where read(f, ch) is defined as "get(f); ch := f^". This causes all sorts of problems, because the programmer must always know that this file is interactive, and programs cannot use the standard input and disk files interchangably. The "get" is normally executed on reset (or readln) so that the value of eoln and eof is available after using a character (by read), and so that the program can look ahead to the next character. This allows decisions to be made, i.e. is the following character numeric.. then read a number; or is it alpha .. then read a char; or is it a special .. then read a user command etc. Thus a file copy program such as: WHILE NOT eof DO BEGIN WHILE NOT eoln DO BEGIN read(ch); write(ch); END; readln; writeln; END; works naturally. The read/write line can be replaced by write(input^); get(input); END or by some sort of filter such as IF input^ <> ' ' THEN write(input^); get(input); END; 21 to strip out all blanks ith the same action and no auxiliary variable. Such a fragment can copy the standard input to standard output, and works correctly with any i/o redirection applied. NOTE that "reset(input)" is always automatically performed when a program begins running, and similarly "rewrite(output)". Thus such statements should normally not appear in a program. Think of readln as a line-flushing procedure, but bear in mind that "readln(item)" is always equivalent to "read(item); readln". Writing ------- Again, all files consist of 2 parts, that which has been written (and is in the external file), and that which has not. The next item to be written is ALWAYS in f^, and "put(f)" is defined to do the writing, and leave f^ contents undefined (in practice, most systems leave f^ unchanged). Similar to read, "write(f, item)" is STRICTLY defined to be the sequence "f^ := item; put(f)". For text files ONLY, writeln can put an eoln mark in the external file. Most systems, including PascalP, can in practice create an eoln mark by writing suitable characters, BUT THIS IS NOT portable. Write(f, item1, item2, .. itemn) is STRICTLY defined as being "write(f,item1); write(f, item2); ... write(f, itemn)", and "writeln(f, item)" is defined as "write(f, item); writeln(f)". Both of these are again shorthand. The writeln procedure alone (i.e. writeln(f) ) simply puts an eoln mark into the file being written. If the "f" specification is omitted the write is shipped to "output" file by default. Again, the fundamental writing procedure is "put(f)", which causes the content of f^ to be appended to the end of the file f. "write(f, item) is STRICTLY defined as "f^ := item; put(f)", and should be unable to create the eoln mark in a text file (reserved for writeln). The action of "rewrite(f)" is to empty any old version of f, and leave f^ undefined. f^ is also undefined after any write operation. Thus doing nothing except "rewrite(f)" in a program should leave f as an empty file, but existing. All Pascal files should be automatically closed when the defining program (or procedure for a local file) is exited. Some systems provide a "close" procedure to force an early close for one reason or another (e.g. to release a locked file to another user in a multi-process environment). If a file was open for write (via rewrite), and is later "reset", an automatic close is done. These closings of a written file append the eof mark, and force any system buffers to be flushed. Some systems are incomplete, and actually require that a specific call to "close" be made. 22 This procedure is non-standard, and such programs will not be portable. Text file writes ---------------- The standard specifies the formatting abilities for various types of objects. The use of fields for integers and reals is complied with in general, however write(f, astring : field) is defined to RIGHT justify astring in field positions, AND TO TRUNCATE astring (on the right) when it is longer than field. An item of type astring is compatible with PACKED ARRAY[1..] OF char. Failure to do this truncation is a major impediment to portablity in Turbo Pascal. To harp on the action of writeln, this marks the end of a text line and usually flushes any buffers. The actual implementation of the file system should not matter, provided it meets the abstract requirements of Pascal. Thus systems that never put an eoln marker on the disk, but simply retain pointers to the line endings, should function correctly. Fixes ----- Again, this is how it should work according to international (and ANSI) standards. Some systems do not meet the standards - beware. The so-called string type in Turbo and UCSD Pascal is not really necessary, although it is often a convenience. It is possible to create systems that are fully Standard compatible, and still have string reads, concatenation, dynamic length, extraction, etc. All such operations are available in PascalP as extended system pro- cedures, and thus porting to another system only requires writing such procedures. When not provided as part of the system package efficiency may suffer, but no logical change in the program will be needed. For this reason I suggest that Turbo users should ALWAYS use the concat function for strings, rather than the "+" operator. When this is done the program can be converted fairly easily to an ISO standard system. For Turbo Pascal users, I have written a set of includable procedures or units (see TURBOFIX.LBR or TURBOFIX.ARC for TP3, TXTFInnn.LZH for TP4 up, with nnn 121 at present) which make Turbo meet these standards, although you will have to use non-standard procedure names. These are available without charge 23 for non-commercial use. I hope this clears up some confusion. 24 All Sorts of Sorts By Pete Davis Every programming student comes across sorting at one time or another. Occasionally, one comes across sorting in their own work. Picking the right sort for the job isn't always an easy process. Many things need to be taken into consideration. First of all, what will the typical size for the data be? Some sorts work better for a lot of data and others work better with small amounts. Other factors include knowing what kind of data-structure you're working with. If you're using a linked list, certain sorts might not be worth the trouble, but if you're using an array, it might be ok to use those sorts. I want to offer a range of sorts here. Each one will include a short algorithm and some code examples will be included with this issue of PNL. With each sort, I'll give a description of how it works, what it's good points are and what it's problems are. I'll explain what kind of applications it is best for and when to avoid it. There are many programming books that go into a lot of depth on sorting, so if you really want to get into the meat of sorting, I suggest you check out your public library or local bookstore. There are several ways to measure the quality of a sorts. Average number of comparisons and average number of exchanges are very common to get an idea of how much work the sort has to do for a particular amount of data. i.e. average number of comparisons for 100 items in a bubble sort is about 5000, depending on exactly how the algorithm is implemented. Other important factors are the actual time it took to execute the sort and memory requirments for the sort. A. Bubble Sort -------------- Talk about basic, the bubble sort is about the most basic and easy to understand of the sorts. It's simplicity gives way to slow execution and many comparisons. For small amounts of data (i.e. 100 or fewer data items) the bubble sort will usually suffice. The exact amount of data really depends on the size of the data-items and the algorithm used. The only time where the bubble sort is the best sort, is if the data is already in order. Otherwise, chances are, most other sorts will beat the bubble sort, especially if the lowest value is near the end of the list or the highest value is near the beginning of the list. The bubble sort uses the simple concept of going through a list and checking each item against the item next to it. If they are out of order, then put them in order. It keeps repeating this process until it reaches the end of the list. Then it starts over at the beginning of the list again, and keeps repeating this whole process until no more items are switched. You can either check to see if no more items are switched or you can go through N-1 times where N is the number of items in the list. The prefered method is to check for whether or not anything is switched. This means that if the list is already in order, 25 you won't have redundant passes through the list. The following will show a group of characters out of order. I will then show what would happen with each pass of the bubble sort. START -> F E R M T A P STEPS: { E&F SWITCH : F&R IN ORDER : R&M SWITCH R&T IN ORDER : T&A SWITCH : P&T SWITCH } END OF PASS 1 -> E F M R A P T { # exchanges: 4 } END OF PASS 2 -> E F M A P R T { # exchanges: 2 } END OF PASS 3 -> E F A M P R T { # exchanges: 1 } END OF PASS 4 -> E A F M P R T { # exchanges: 1 } END OF PASS 5 -> A E F M P R T { # exchanges: 1 } I only showed all of the steps for the first pass. I believe it is straight-foward enough that it was all that was necessary. You can examine the code for more examples. Notice how much work it took to sort only 7 items, though. there were 5 passes, and each pass consisted of 6 comparisons and at most, 4 exchanges and at best, 1 exchange with a total of 9 exchanges and 30 comparisons. This isn't a worse case scenario, but it is what I will compare several of the sorts against. After that, we'll take some actual results from our test program. B. Selection Sort ----------------- The selection sort is kind of like putting a hand of playing cards in order. First thing you do is look at your whole list. You take the lowest value and put it in the beginning. Then you take the next lowest value and put it right after the first. (Or you could take the highest and then the next highest and do it backwards, it really doesn't matter.) I don't think any sort of diagrams or algorithms need to be explained for this. I think everyone is familiar with this technique. Although the selection sort isn't incredibly fast, it is very simple and does give a marginal improvement over the bubble sort, in general. C. Insertion Sort ----------------- The insertion sort can also use the idea of a hand of playing cards, but instead of working with all of your cards at once, you take a card one at a time and insert it where it belong as you get it. The insertion sort is fairly simple, but has some problems. The insertion sort is ideal for linked lists. If an insetion needs to be made, all you have to do is re-direct some pointers. The insertion sort is much less powerful when used with arrays, or contiguous lists. The problem is that when you insert a new value, chances are you'll average moving about half of the data in the list over one space. When your list gets long, that can cause major problems. Each time something is inserted, everything in the list would have to shift over. 26 The insertion sort, like the selection sort is very straight- foward and simple, so I'll just let you examine the code if you have any more questions. D. Shell Sort ------------- The shell sort uses the idea of sorting over an increment. Let's use a new list of data: F E R M T A P G N Y C and we'll take an increment of three. In our first step, we will compare item #1 with item #4 (1+increment of 3) then 2 and 5 and make exchanges over the increment. Let's see the results after 1 pass: F E A M G N P C R Y T We continue sorting with the increment of 3 until everything for that increment is in order. So, for the next pass we get: F E A M C N P G R Y T and the third step: F C A M E N P G R Y T now, everything is in order over the increment of 3, so we drop our increment to 2: A C E M F G P N R Y T and one more pass: A C E G F M P N R Y T now that that's in order, we drop to the final level with an increment of 1. At this point, a single pass will put everything in order: A C E F G M N P R T Y For larger amounts of data the results are a bit more dramatic. The choice of a starting increment of 3 and a decrement of 1 for each pass was completely ambiguous on my part. You can use any starting increment and any decrement per pass that you want. The final pass must be an increment of 1, however. With larger amounts of data, a starting increment of 5 or higher will be more useful, and keeping the space as an odd number for each pass is more efficient. The shell sort is a bit more complicated than the sorts shown previously but the average gain in speed is tens of times better than any of the previous sorts. 27 E. Merge Sort ------------- Although the concept behind the Merge Sort is fairly simple, the actual coding of the sort is a bit more complex than one would expect. In simple terms, the Merge Sort breaks your data in half and then sorts each half. After the halves are sorted it puts them back together. It's a bit more complex in that the routine calls itself again and breaks each of the half-piles into halves, and so on and so on until each pile has either 1 or no items. You might be wondering why this is so quick, but think of it this way, the Bubble Sort is called a quadratic sort. That means that if you double the data to be sorted, it takes more than double the amount of time to execute the sort. So, it would be quicker to break a big set of data into two sets of data and sort them. The only extra time you'd need is to put them back together in order. The results are a bit more dramatic than that, though, as the piles keep getting broken down more and more making each individual sort incredibly quick. The Merge Sort works well with either contiguous data (arrays) or linked lists. It's a great general purpose sort, but for small amounts (less than 100 items) of data, it probably isn't worth the effort. The only real drawbacks to the Merge Sort is that it requires double the amount of space for data to make copies of the data. Another is that, because it's recursive, it incurs a bit of overhead on speed and space. This, however, isn't incredibly significant, however, when comparing speeds with the previous sorts. F. Quick Sort ------------- The Quick Sort is generally recognized as the best general- purpose sorting algorithm. It is much like Merge Sort in that the data is recursively broken into two pieces. The first step in the Quick Sort is to come up with a mid-value for your data. With numbers this is pretty easy but strings and characters might require a bit more effort. The way you come up with the mid-value can vary. The method I'm going to use in the sample provided is to take the mean of the first and last value in the list. It doesn't have to be the exact middle, but the more accurate it is, the faster the sort. If you have some sort of pre-knowledge of what your data will be like, you might be able to come up with a more accurate method to finding the mid- point. The next step is to move all values lower than the mid-point to the left (or lower end of your list) of the mid-point and all the higher values to the right. Then you keep recursively splitting the left and right in half, moving the values to the left and right side of the mid-point, appropriately. You keep doing this until you are down to 1 or 0 elements in the left and right. At this point, your data is in order. 28 Summary ------- The main point of this wasn't to go completely in-depth explaining the subtelties of each individual sort, but more to give an overview of each sort. I wanted the readers to be able to get a basic idea of how each sort works, find its advantages and dis-advantages, and be able to decide which sort will work best for your particular application. I'll leave most of the more technical explanations in the inline documentation of the sort unit I am providing. The test program gives you some stats on each sort that I have explained above. These routines that I am providing are not necessarily the fastest or most efficient algorithms. Everyone has their own minor variations on certain sorts and you may want to change them to suit your purposes. 29 For Beginners By Bob Gowans In the last issue of PNL (no.2) the beginner's column discussed a simple but complete Pascal program in which the use of the Pascal standard procedures writeln and write were highlighted. The rules governing identifiers were also put to you. Continuing on this theme I would like to show how you can build on this basic layout, and produce a more substantial program. Who has not heard of data? In this day and age it seems that there is an overwhelming amount of data available on all sorts of subjects. Our computer systems manipulate data - they take data in, process it and churn it out and there you have it, your electricity bill! As programmers we are mainly concerned with the manipulation of data, we write programs that can do all sorts of things with data and that produce data as results. But what exactly is data and how can we, as prospective programmers, make use of such data that is suitable for computers. Data comes in many forms, the actual word data means 'things given', an item of data represents some fact, observation or idea. We humans can recognize all sorts of data easily - text, numbers, pictures and speech to mention but a few. However, the poor computer can only, in most cases, accept data in the form of symbols such as characters and numbers, they are picking up mind you and there are devices that enable the input of sound and pictures to computers. Even with this restricted set of characters and numbers a huge range of data is available as input to the computer. For example, two characters can represent true and false, digits allow the computer to use integer numbers, a decimal point with digits means the computer can use real numbers,characters will allow us humans to input languages ( such as Pascal ) into the computer. Once the data is in the computer it can be processed and this is done according to sets of instructions or algorithms. This is where we come in, we want to be able to write algorithms that can manipulate the data that is available. 'What data is available?', might be a question you would be justified in asking. Well, Pascal has several different inbuilt types of data and also there is a facility for creating your own data types so you have plenty to go by. In Pascal the values True and False make up the BOOLEAN data type, whole numbers constitute the type INTEGER, decimal numbers are represented by REAL data types and characters are represented by the CHAR type. If this seems a little fuzzy to you it will soon become clear once you learn to implement the various data types in a Pascal program. Lets do that, sticking to the implementation of the integer type for now. Briefly, the store manager wants you to write a program that will calculate the total number of customers that use his store each day and output to the screen the total number of customers and the sub totals of female and male customers. You are given the respective male and female customer subtotals at the end of each day. In the following 30 program note how the lines are indented, if you copy this layout your programs will be more readable, but more about that later. Also note that we have introduced a new Pascal reserved word VAR, it is after this statement that variables are declared ie. the data types of the variables are declared. Do not worry about variables for the time being as we will again deal with these later. However, note that we can assign values to variables using the symbol := this is a very important and fundamental feature of Pascal. program add_up; { Program heading } var female : integer; male : integer; total : integer; begin female := 250; male := 101; writeln('The total number of customers today was ',total); writeln('The number of male customers was ',male); writeln('The number of female customers was ',female) end. In the program we have three variables which are declared to be of type integer, that is to say that the values we are going to give to these variables must be of an integer data type. Variables are declared following the Pascal reserved word VAR and are given a data type as follows variable : datatype; We could have written the variable declarations in our program as follows female,male,total : integer Each variable is separated by a comma then comes a colon before the data type. As the variables are of type integer, the computer now knows what type of data to store in them and will reserve space at a memory location for three integer data types, and this is crucial - it cannot store any other type of data in the locations it has reserved for our three variables. If, for example, we tried to give female a value of 1.5 then the computer would go haywire ( at least the computer knows there is no such a thing as half a female ). Seriously what would happen is that the computer would issue an error message and the program would probably abort, which would not be to good if you had just spent the last half hour entering data. At this point it would be as well to define the data type integer. INTEGER - the positive and negative whole numbers. 31 Incidentally the range of integers that you will be able to use depends on your computer/compiler, the range for Turbo Pascal using a 640K RAM IBM Compatible is 32767 to -32768 for those who are interested. The situation with our program is that we have declared three variables of type integer and now we are going to write some statements in our program that can play around with these variables - make them do some work for us! We can think of a variable as a memory location, with an attached name or identifier which the computer uses to access the contents of the variable. The rules for naming variables are the same that apply to naming identifiers which were discussed in the last issue of PNL. Why are they called variables? because as the name variable infers they can be updated or given new values. If we look at the program add_up we can see that female has been given a value of 250, male a value of 101 and total a value of the sum of male and female but these values are not fixed values as the program is updated every day so the values of male, female and total will change. In the program we say we have assigned a value of 250 to female and assigned a value of 101 to male. Finally we have assigned the value of the sum of male and female to total and written out the results. With each program that you write it is a good idea to plan a data table. Choosing the right data structures for variables and deciding which is the right data type to use is sometimes the programmers most difficult task, a data table can assist you greatly in helping to make things more clear. Lets write a data table for our program - in later issues you will be made to realise that it is better to plan and design the program first before writing the actual code. Identifier ( variable) Brief description Data type male holds current total of male customers. integer female holds current total of female customers integer total holds the current total of male + female integer You will have no doubt noted from the program that the value of a variable can be output to the screen, or to a printer or a file for that matter, and that this is done by using the writeln statement typically as follows writeln(variablename); This will have the effect of outputting to the screen the current value stored in the variable. The screen output from our program would thus be: 32 The total number of customers was 351 The number of male customers was 101 The number of female customers was 250 At the same time as outputting our variables we also took the opportunity to output a relevant textual comment. The string that we want to output must be in quotes and separated from the variable name by a comma (,) and the whole writeln statement is enclosed in brackets. I wonder if you noticed that there is space inserted between the last character in the textline and the final quotation mark in the writeln statement, the space is placed there so that the output from the variable is not written directly at the end of the text line, such as : The number of female customers was205 - resulting in messy screen output. A small point but well formatted text output is a must for today's user friendly programs. The assignment statement can do more than just give a single value to a variable, the assignment statement in Pascal has the following form : variable := expression When the computer executes such a statement, the expression on the right-hand side of the assignment operator (:=) is evaluated and the value is given to the variable on the left-hand side. If it helps you try reading the assignment operator as 'becomes' ie variable becomes expression. In this article we have learnt how to develop our programming skills by realising that the declaration and correct use of data types is an essential an important part of program development and that if a data type is declared the it can only be used for the purpose ie. data for which it was intended. We also discovered how to make assignment statements and how to perform certain operations on variables including writing them out to the standard output ( screen ). In the next article we will further expand on these basic concepts and hopefully introduce to you how to read in values to variables, by user input. The answer to the exercise given in the last article of PNL is : writeln(''How many times have I told you, don''t do that;she said''); This shows you that if you want to output text in quotation marks then you must double up on quotation marks in the writeln statement. 33 Self Test ---- ---- (a) The following sequence of statements appears in a pascal program. All variables are declared to be of type integer. apples := 10; oranges := 12; bananas := 6; fruit := oranges + bananas - apples; apples := fruit - apples; Write out the final value of fruit, apples, oranges and bananas. (b) The variable number is declared to be of type integer. Which of the following Pascal statements are valid : (1) number := 10; (2) number := number + 1; (3) 100 := number; (4) number = 100; (5) number := 23.5; (6) number := 100 + 15 - 205; (C) You are asked to write a program that counts the number of automobiles that are sold from a showroom. What data type would you choose to represent the total number of cars sold? How would you implement this problem in Pascal? Solutions --------- (a) fruit = 28, apples = 18, oranges = 12, bananas = 6. (b) (1) valid. (2) valid. (3) invalid ( you can only assign a value to a variable if they are on the left of the assignment operator). (4) invalid ( the assignment operator is := ). (5) invalid ( number is of type integer and you cannot assign to it a real value ). (6) valid. (c) integer as automobiles are best represented by whole numbers. program car_add; var 34 total,autos : integer; begin total := total + autos; writeln('The total automobiles sold are ',total) end. Obviously this program is incomplete as we require some method of inputting the automobile data into the variable autos. This would be done by writing prompts to the screen so that the salesman could enter information, the program would read in such information. We will cover this in the next article. For fun ------- The following program is intended for user's of Turbo Pascal only and makes use of the Turbo Pascal crt unit to produce sound on your PC's internal speaker. The main program is incorporated in a loop. Sound is used to get the attention or make the user aware of errors or as a security measure to deter unwanted users operating the system. As we become more proficient we will be able to put these advanced features to better use in our programs. program alarm; uses crt; var i : integer; begin for i := 1 to 100 do { loop start } begin sound(1500); delay(10); sound(500); delay(500); nosound; delay(1500) end { loop finish } end. Bob Gowans 35 The Object of Objects By Jim Reid With version 5.5 of Turbo Pascal, Borland added one of the newest features in modern programming languages--the concept of objects. The good news is that object-oriented Pascal programming is a major step forward for programmers. The bad news is that if you don't already know something about object- oriented programming (OOP for short), Borland's version 5.5 OOP Guide won't help you very much. The first page of Chapter 1, for example, drops on you such household concepts as "encapsulation", "inheritance", and "polymorphism." Ugh! I hope this short article can help a bit by talking about objects more plainly. So what is an object anyway? An object is a group of Pascal variables (of any type) and a group of functions and procedures that operate on them. In short, it is a programming technique that lets you create new conceptual data types (like CIRCLES) and new conceptual operators (like DRAW or SHRINK) for them. Here's a simple example. One way to define a window area on your display screen is by specifying two coordinates--the upper left (X,Y) position and the lower right (X,Y) position. So, you might think of a WINDOW_AREA as an object with four variables. Here's how you would start to define this as an object. Type Window_Area = Object Upper_X , Upper_Y , Lower_X , Lower_Y : byte; {...more to follow here later...} End; As you can see, an object looks a little like a RECORD TYPE. But there's nothing strange or unfamiliar about how you define the variables of an object. You use the same type of definitions you would use in the VAR statement. If this were all that objects had to offer, you'd wonder what all the fuss was over them. But defining the data variables of an object is only the first part of the job. Next, you define the procedures and functions that can operate on these variables. Using our example, suppose you wanted to have one procedure that sets the window coordinates, another procedure that defines and clears the windowed area on the screen, and a third procedure to disable it. Let's expand the object definition a bit further to do this: 36 Type Window_Area = Object Upper_X , Upper_Y , Lower_X , Lower_Y : byte; Procedure Init (byte X1, Y1, X2, Y2); Procedure Enable; Procedure Done; End; Procedure Window_Area.Init (byte X1, Y1, X2, Y2); Begin Upper_X := X1; Upper_Y := Y1; Lower_X := X2; Lower_Y := Y2; End; Procedure Window_Area.Enable; Begin Window(Upper_X,Upper_Y,Lower_X,Lower_Y); ClrScr; End; Procedure Window_Area.Done; Begin Upper_X := 1; Upper_Y := 1; Lower_X := 80; Lower_Y := 25; Window(1,1,80,25); End; Notice several things about the object's definition now. First, the object TYPE defines not only the variables but also a set of procedures. The PROCEDURE definitions look just the way you would define them in the INTERFACE portion of a Turbo Pascal UNIT--that is, you define the name and parameters of the procedure, but not the statements in it. Later in your program, below the object definition, you include the body of each of these procedures. Notice that each of the procedure names is prefixed with the name of the object type ("Window_Area" in this case). The procedures (or functions) defined for an object automatically have access to the variables of that object. Thus, these procedures can refer to "Upper_X" without having to qualify where that variable was defined. Within the body of an object's procedures, things look the same as usual. These procedures can be passed variables, or not. They can do 37 anything that any typical Pascal procedure can do. They are simply linked to their object's variables by reference. There's nothing magical about the names INIT and DONE, but Turbo Pascal's programming convention suggests that you name the procedure which initially defines an object's variables as INIT, and the procedure which finally terminates the object as DONE. If you don't like those names, you are free to use whatever names you prefer. Now that you've defined the object TYPE and its procedures, to actually define an instance of the object (like our Window_Area) you simply define it as a VAR. For example: Var My_Window : Window_Area; To use your defined object, you execute one of the object's defined procedures or functions. Here's a short example: My_Window.Init(20,5,60,20); My_Window.Enable; WriteLn('See. It worked!'); Delay(1000); My_Window.Done; This example would define the window as the area between (20,5) and (60,20), enable the window and clear it, write a short message in the window display area, pause briefly, then reset the window as the entire screen. Notice that you invoke an object's procedures just like any other Pascal procedures, but you prefix the procedure name with the name of a specific object variable ("My_Window" in this case). This may seem a little odd at first, but the concept is fairly simple. Think of an object as a RECORD variable that is tightly linked to a set of procedures and functions. Think of invoking an object's procedure as if you had simply called the procedure normally, passing it all the defined parameters plus the object's record variable. If you think of it in that way, you can see that this much of Turbo Pascal's object oriented programming extension is simply a shorthand method for passing record variables to procedures. There is much, much more to objects than this, however. Objects can be constructed out of the variables and procedures of other objects. For example, if you were writing a graphics program, you might define a POINT as an object. The point has a single (X,Y) coordinate. You might define a CIRCLE object as a POINT plus a radius. And you might define a SPHERE object as a CIRCLE plus a 3-Dimension boolean flag. The advantage which OOP offers you is that each object can be built on the foundation laid by the predecessor objects. That is, higher level objects inherit the data variables and procedures of lower level 38 objects on which they are defined. Objects also let you define to the procedure at execution time which objects they will act on. (This feature is called dynamic methods and polymorphism.) I'll try to write more on these concepts in a later article. The bottom line is that object-oriented programming can be used to make your programs more flexible and easier to test. By "encapsulating" the variables of an object and the procedures that operate on them, you simplify the debugging process. By building layers of objects, you simplify the coding process by reusing functions and procedures of predecessor objects. Eventually, after enough use of objects, this technique will change the way you structure and design your programs. You will learn to think first about the structure of your data, and second about the functions you perform on the data. If you'd like to see a longer example of using objects, I have included TP unit named TEXTLIST.ZIP. This unit can be used to build and manage a linked list of textual values. You might use this to build a simple editor, for example. Enjoy, and start becoming familiar with Turbo Pascal's object-oriented programming! 39 Doing More With Strings By Gerhard Hoogterp Fidonet 2:283/1.2 BitNet HOOGTERP@HENUT5 One of the things most programmers who switch from Basic to Pascal notice first is the limited number of procedures and functions Turbo Pascal has available to work with strings. Standard Pascal, not having strings at all is even worse. Actually it's been a long time since I programmed in Basic, but the need for some more advanced string routines was there when I was working on a text adventure. My string toolbox grew and became more advanced over the years, and so it evolved into what I present here. For my purposes, I defined a string as a list of tokens separated by a delimiter. A sentence like: This is a sentence, with seven words. ^ ^ ^ ^^ ^ ^ ^^end of string Contains seven tokens and 4 delimiters. At some places the delimiters are double, like the ', ' combination. The fourth delimiter is the end of string Which, although not explicitly used by the routines, is there anyhow. As the toolbox was intended for use with a natural language parser I couldn't put restrictions on the delimiters, or the way they should be used. So the first routine to write was Procedure SimplifyDelimiters( Var TokenList : String; Delimiters : String); It takes a string and changes double delimiters into single delimiters. So the ', ' combination would be replaced with only space. To use this function with the above example would be: Sentence:='This is a sentence, with seven words.'; SimplifyDelimiters(Sentence,' ,.'); 40 The second function needed was a way to separate the string into a head and a tail. To find the first token. This token was of course delimited by a delimiter again. Function FindToken(Var TokenList : String; Delimiters : String):String; When this function is used on the example sentence like Sentence:='This is a sentence, with seven words.' Token:=FindToken(Sentence,' ,.'); it will return the string 'This'. The delimiter is removed and the sentence becomes 'is a sentence, with seven words.' These two routines provide the user with a lot of power when, for example, reading configuration files. SimplifyDelimiters accepts a string in a user readable way and turns it into something more suitable for a computer, the FindToken takes the first token (the keyword in a CFG file) which can be converted to a number or a subrange type to use in a CASE Statement. To make life a little easier there are some functions in the toolbox that alter the complete string too. Of course there are the UpStr(Var TokenList : String) and DownStr(Var TokenList : String) functions to turn a string in all uppercase/lowercase, a SkipLeadingSpaces(Var TokenList : String) Which deletes the leading spaces (Spaces are almost always used as delimiters, so leaving them there would, even after a SimplifyDelimiters, always let the string start with an empty token) DeleteTrailingSpaces(Var TokenList : String) Which deletes the trailing spaces of a string for the same reason as the SkipLeadingSpaces function. That leaves only two procedures in the toolbox, which are useful in the original set up of the unit. DeleteNoise(Var TokenList : String; Noise : String); Deletes all the characters in the NOISE string from the TokenList. When the user is free to type what he likes (as in an 41 adventure game) there are always characters which shouldn't be there. DeleteNoise deletes those characters. It's even more useful when used together with the other left procedure ReplaceToken( Var TokenList : String; Tok1 : String; Tok2 : String); which is used to change a certain token to an other token or character.. For example, in a typical adventure sentence one could type: 'Take the blue sword and go to the north' In this sentence the token 'AND' is a delimiter. It separated two commands. So it would be useful to change the 'AND' in for example '&' which can be used by the FindToken to break the sentence in two pieces. Sentence:='TAKE THE BLUE SWORD AND GO TO THE NORTH'; ReplaceToken(Sentence, 'AND', '&'); would do the trick. There are also some words in the sentence that don't add any information to the command. Their only used because it's english. The words 'THE' and 'TO' can be removed from the sentence. So with Sentence:='Take the blue sword and go to the north'; ReplaceToken(Sentence,'THE',' '); ReplaceToken(Sentence,'TO',' '); ReplaceToken(Sentence,'AND','&'); SimplifyDelimiters(Sentence,' '); Results in: Sentence = 'Take blue sword & go north' which is more suitable for a computer than the original sentence. An other approach is to change all the "noise" words to a noise character and delete that character with the delete noise procedure. 42 Bits & Pieces Bits & Pieces is going to be a sort of gathering ground for small routines and things that readers or maybe even me will contribute to. It's basically quick and dirty routines that can't really make up a whole article. Individual authors and programmers will be given credit with their routines. This issues Bits & Pieces is going to be a bit scarce, having only one item, but I think it will give readers an idea of the kinds of things I'm looking for in it and maybe get them to contribute to it. Bits & Pieces #1 : Nathan S. Phillips +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Here is a procedure to display a number with commas (12345 becomes 12,345). I wrote this procedure some time ago for my "DSIZ - Directory Size Utility" program. The program displays the size of a directory (among many other things), and a display like "C:\FOO - 13245336 bytes" would be a little confusing - is that 1, 13, or 132 megs? But display the sizes with commas, such as "C:\FOO - 13,245,336 bytes", and the user can easily see how big the directory is. But enough history - on to how the procedure does what it does. The procedure takes the number you want to display with commas in a value parameter of type real. It uses the real type so that large values may be passed - only the integer portion of the number is used. Basically, it takes a number and either: 1. Displays the number if it doesn't need any commas (0-999) 2. Calls itself, passing the number divided by 1000, if the number needs a comma (greater than 999) When the procedure calls itself, the remaining instructions are "stacked", and executed when the call is complete. Therefore, the procedure keeps calling itself (without displaying anything) until it gets to the leftmost digit(s) that are displayed before the first comma. It displays that portion, and returns. Then the stacked instructions start executing, printing the commas and zero-padded numbers. When all of the stacked instructions have been executed, the entire number will have been printed. Like many recursive procedures, it is an elegant, compact solution to the problem. And, like many recursive procedures, it is difficult to follow, especially if you don't use recursion often. (I should say difficult for Pascal programmers to follow; Lisp programmers *think* recursively.) 43 procedure DisplayWithCommas(r : real); { displays 1024 as 1,024 } var t : real; begin { NON-RECURSIVE CASE if r is less than 1000 } if r < 1000 then { if this part is under a thousand, it doesn't } write(r:1:0) { need zero padding or commas, so just print it } { (This would be the 1 in 1,024) } else begin { RECURSIVE CASE - will need at least one comma } { call this procedure again } DisplayWithCommas(int(r/1000)); { with the same number, but } { without the last 3 digits } write(','); { display the comma } t := r - int(r/1000)*1000;{get the value of the last 3 digits} { (will be 24 for 1024) } if t < 100 then write('0');{pad with 0s so the routine will} if t < 10 then write('0'); { print 1,024 rather than 1,24 } write(t:1:0); {now just write the value of the last 3 digits} end; end; { procedure DisplayWithCommas } 44 Conclusion I like to do things in order. I wrote the introduction first thing, and I'm finishing off with the Conclusion, after everything else has been put togather. At this point, I'm speechless. I would like to thank all of the wonderful people who contributed to this issue and previous issues. We had a lot of article submitted this month. If people can keep sending them in, the Pascal NewsLetter might really gain some momentum. I hope you've enjoyed this issue as much as I have. I've enjoyed reading all of the articles and I've enjoyed putting them together for you. I've been blown away by all the contibuted articles. It's really been exciting to see the response. I am constantly getting suggestions, comments, and good criticism for the newsletter. I'm glad there are so many people out there who enjoy it. It makes it worth my time. I'm not really sure what I'm going to do for the fourth issue. I do have some ideas written down here and there, but nothing firm yet. I'm sure users will keep sending in the articles which will keep us alive. If I had to rely on myself for each issue, I can assure you they would only come out once every two months, and wouldn't be nearly as good. I think I'm going to be able to put new issues out about once every 3 or 4 weeks, which to me seems reasonable. If you'd like to see it come out more often, then pitch in and send in some articles. I'm still open to contributed articles. Please send them in. I ask only one thing more: Please do not send the documents in formatted. It is almost impossible to work with the documents which are justified, double-spaced, etc... Please send them in single- spaced and hard returns only at paragraph breaks. I'll take care of the rest once it's integrated into the newsletter. Again, I would like to thank everyone, and please keep the comments and suggestions coming, but more importantly, keep the article submissions comming!!! _Pete Davis Editor 45 The Pascal NewsLetter is Copyrighted by Pete Davis. Articles submitted by others are the property of those authors and are used with permission. They may not be treated seperately from this newsletter without the author's permission and thus maintain all distribution rules of the newsletter as a whole. It may be freely distributed in un-modified form, but no charge whatsoever may be incurred on the recipient. All code is provided 'as-is' with no guarantees whatsoever. The Pascal NewsLetter can be obtained from the following locations: GEnie : General Electric Network for Information Exchange. It is located in the IBMPC filelist. Simtel: Internet address: 26.2.0.74 or Simtel20.ARPA It is located in the directory. Programmer's Forum : This is the home of the PNL. Each issue can be found in the MAG directory from the main area. The number is on the title page. If you would like to receive back issues of PNL directly from me, send a diskette and $2.00 for shipping. Don't forget to include your address. Send your order to: Peter Davis 4851 Brandywine St. NW Washington, DC 20016 If you are a SysOp that will regularly carry PNL and would like to have your bulletin board listed as such, here, send me a message either by postal mail or at one of the electronic addresses given on the title page, with your bulletin board's name, phone number, and your name. 46 Distribution List The following is the phone numbers to bulletin boards known to carry PNL. If you would like your bulletin board's name and number added to or deleted from this list, please send me a message at one of my many addresses. I can not guarantee whether a listed board will have any particular issue, however. The Programmer's Forum ................. Phone: (202) 966-3647 The Programmer's Forum is the home of the PNL. The Bored .............................. Phone: (512) 303-0471 Classic City ........................... Phone: (404) 548-0726 Theive's World ......................... Phone: (713) 463-8053 Hippocampus ............................ Phone: (203) 484-4621 Rogers State College BBS ............... Phone: (918) 341-8982 The Foxtrot BBS ........................ Re-Locating Turbo City BBS ......................... Phone: (209) 599-7435 Austin IEMUG/MIDI BBS .................. Phone: (512) 528-0626 Laser Publishers ....................... Phone: (918) 438-2749 Fargo RBBS-PC .......................... Phone: (701) 293-5973