================================================== INDUCE.DOC ================================================== INDUCE Copyright 1986 - MicroExpert Systems Box 430 R.D. 2 Nassau, NY 12123 INDUCE implements the ID3 algorithm for the generation of rules from a data set as described in the article "Finding Knowledge in Data" in the November 1986 issue of BYTE. The program has been tested using Turbo Version 3.01A on an IBM PC. It has been run under both DOS 2.1 and Concurrent 4.1 . The source for this program is contained in two files, INDUCE.PAS and INDUCE.INC. The program produces one overlay file INDUCE.000 . INDUCE produces a knowledge base which can be used with MicroExpert. MicroExpert is an expert system shell written in Turbo Pascal for the IBM PC and Apple II. It is available for $49.95 and comes with complete source code. It can be order by writing to : McGraw-Hill Book Company P.O. Box 400 Hightstown, NJ 08520 Or calling 1-800-628-004 or in New York state 212/512-2999. We would be pleased to hear your comments, good or bad, or any applications and modifications of the program. Contact us at the above address or on BIX. Our id is bbt and we may be contacted via BIXmail or by leaving comments in the MicroExpert conference. - Bill and Bev Thompson Operation To start the program simply switch to the directory containing INDUCE.COM and INDUCE.000 and at the DOS prompt type INDUCE and press the ENTER key. The screen will clear and the message Example File (Press to quit.) : will appear. Type in the name of your example file and press the enter key. The file name should include the drive and path name if necessary. The default extension for example files is ".EX". The program will now read the example file. Error messages will be displayed on the screen. The program does not do very extensive error checking, so be sure to examine the example files and knowledge base to be sure that they make sense. Once the file has been read, the program will attempt to classify the example set. Each time an attempt is made to classify a partition of the example set, a "." is printed on the screen. The program is not particularly fast, so you will see the "."s crawl across the screen. You may see a "*" appear on the screen from time to time and then disappear. This indicates that garbage collection is in process. The program is attempting to reclaim memory which has been used, but is no longer accessible. When the classification process has been completed, the message Output the tree to what file (Press for screen) ? will appear. You may save the tree to a file or press to print it on the screen. The format of the tree is described in the BYTE article. If the size of the tree is such that its width exceeds 80 columns, it may not print properly. After displaying the tree on the screen, a message telling you to press any key to continue will be displayed. To print the tree on the printer enter "lst:" as the file name. Next, the program will display Output the rules to what file (Press for screen) ? Enter the name of the file which is to contain the rules. If this file is to be a MicroExpert knowledge base, be sure to include the extension ".KB" to the file name. The program will also write a series of prompts for the attributes. Finally the program will clear the screen and request a new example file. At this point you can enter a new example file or press to exit the program. Example Files Example files are simply Ascii text files which are created with a text editor. The program ignores blank lines and comments in the files. Comments begin with "(*" and end with "*)". A comment may extend over several lines. The first line in the file which is not a comment or a blank line must contain the attribute names. The format of this line is class name,attribute1,attribute2,..... The class name must come first, followed by the names of the attributes separated by commas. Leading and trailing blanks in attribute are not significant. Internal spaces are. Therefore, "dog and cat" is not the same as "dogandcat". The program is also case sensitive, so "Dog" is considered different from "dog". The program does not check for duplicate attributes, but of course, any knowledge base produced using duplicate attribute names is likely to be incorrect. Following the line containing the attribute names are one or more lines containing examples. Each example line contains a class value followed by a series of attribute values separated by commas. Each example must fit on one line. The general format is class value,value of attribute1,value of attribute2,..... The attributes must be in the same order as they are listed in the first line, although there is no way for the program to check on this. As with attribute and class names, internal spaces are significant, leading and trailing spaces are not. "don't care" values are indicated by a "*". The following is the contents of the file for the example set in the BYTE article: (* Example file for Byte Article 21-May-86 *) (* Copyright [c] 1986 MicroExpert Systems Box 430 RD 2 Nassau, NY 12123 *) (* Attributes *) profit ,age ,competition ,type (* Examples *) down ,old ,no ,software down ,midlife ,yes ,software up ,midlife ,no ,hardware down ,old ,no ,hardware up ,new ,no ,hardware up ,new ,no ,software up ,midlife ,no ,software up ,new ,yes ,software down ,midlife ,yes ,hardware down ,old ,yes ,software Numerical Attributes Numerical attributes are handled in the name manner as symbolic (non-numerical) attributes, except that ":number" is appended to the attribute name. ":number" is removed from the attribute name before printing and will not appear in either the tree or the knowledge base. Values for numeric attributes must be with in the range +/- 1.0E+37 to +/- 1.0E-37. The numbers may be entered in integer, real or floating point format. The following example set demonstrates the use of numerical attributes. There is a "don't care" value in the second example. (* Numerical Attribute Example file *) (* Copyright [c] 1986 MicroExpert Systems Box 430 RD 2 Nassau, NY 12123 *) (* Attributes *) profit ,age:number ,competition ,type (* Examples *) down ,5.0 ,no ,software down ,2.5 ,* ,software up ,2.5 ,no ,hardware down ,5 ,no ,hardware up ,1 ,no ,hardware up ,1 ,no ,software up ,2.5 ,no ,software up ,1 ,yes ,software down ,2 ,yes ,hardware down ,5 ,yes ,software ================================================== INDUCE.PAS ================================================== {.IN+} {.PW132} (*$V-,R+,B- *) PROGRAM induce ; (* Copyright 1986 - MicroExpert Systems Box 430 R.D. 2 Nassau, NY 12123 *) (* Induce implements the ID3 algorithm for the generation of rules from a data set as described in the BYTE article "Finding Knowledge in Data". This program has been tested using Turbo ver 3.01A on an IBM PC. It has been run under both DOS 2.1 and Concurrent 4.1 . The source for this program is contained in two files, INDUCE.PAS and INDUCE.INC. The program produces one overlay file INDUCE.000 . INDUCE produces a knowledge base which can be used with MicroExpert. MicroExpert is an expert system shell written in Turbo Pascal for the IBM PC and Apple II. It is available for $49.95 and comes with complete source code. It can be order by writing to : McGraw-Hill Book Company P.O. Box 400 Hightstown, NJ 08520 Or calling 1-800-628-004 or in New York state 212/512-2999. We would be pleased to hear your comments, good or bad, or any applications and modifications of the program. Contact us at the above address or on BIX. Our id is bbt and we may be contacted via BIXmail or by leaving comments in the MicroExpert conference. Bill and Bev Thompson *) CONST ln2 = 0.69314718 ; limit = 1.0E-20 ; debug = false ; back_space = ^H ; tab = ^I ; eof_mark = ^Z ; esc = #27 ; quote_char = #39 ; left_arrow = #75 ; end_key = #79 ; del_line = ^X ; return = ^M ; bell = ^G ; TYPE counter = 0 .. maxint ; string80 = string[80] ; string132 = string[132] ; string255 = string[255] ; text_file = text ; char_set = SET OF char ; node_type = (cons_node,symbol,number,free_node) ; node_ptr = ^node ; node = RECORD in_use : boolean ; CASE tag : node_type OF cons_node : (tail_ptr : node_ptr ; head_ptr : node_ptr) ; symbol : (string_data : string80) ; number : (num_data : real) ; free_node : (next_free : node_ptr ; block_cnt : counter) ; END ; (* node is the basic allocation unit for lists. The fields are used as follows: in_use - in_use = false tells the garbage collector that this node is available for re-use. tag - which kind of node this is. cons_node - cons_nodes consist of two pointers. one to the head (first item) the other to the rest of the list. They are the "glue" which holds the list together. The list (A B C) would be stored as ------- -------- -------- | .| . |-----> | .| . |------> | .| . |---> NIL --|----- --|------ --|----- | | | V V V A B C The boxes are the cons nodes, the first part of the box holds the head pointer, then second contains the tail. symbol - holds string values, we don't actually use the entire 80 characters in most cases. number - contains a real number. free_node - the garbage collector gathers all unused nodes and puts them on a free list. It also compacts the free space into contiguous blocks. next_free points to the next free block. block_cnt contains a count of the number of contiguous 8 byte free blocks which follow this one. *) VAR example_file : text_file ; line : string255 ; c_list,examples,attrib_list,saved_list,initial_heap,free : node_ptr ; total_free : real ; no_of_cols : counter ; (* The important globals are: example_file - text file containing the original example set. See the documentation file for its format. line - line buffer for reading in the text file c_list - the classification tree examples - the list of examples attrib_list - list of attribute names and their values saved_list - list of all items that absolutely must be saved if garbage collection occurs. Usually has at least the examples and attrib_list attcahed to it. initial_heap - the value of the heap pointer at the start of the program. used by the garbage collector free - the list of free nodes. total_free - total number of free blocks on the free list. no_of_cols - the total number of attributes + the class attribute in the example set. *) (*$I induce.inc *) PROCEDURE read_from_file(VAR f : text_file) ; (* Read a line form file f and store it in the global variable, line. It ignores blank lines and comments. When the end of file is reached eof_mark is returned. *) CONST in_comment : boolean = false ; (* static *) PROCEDURE read_a_line ; BEGIN (*$I- *) readln(f,line) ; (*$I+ *) IF ioresult <> 0 THEN line := eof_mark ELSE IF pos('(*',line) > 0 THEN IF pos('*)',line) > 0 THEN delete(line,pos('(*',line),pos('*)',line) - pos('(*',line) + 2) ELSE BEGIN in_comment := true ; line := '' ; END ; END ; (* read_a_line *) BEGIN line := '' ; IF eof(f) THEN line := eof_mark ELSE BEGIN read_a_line ; IF in_comment THEN IF pos('*)',line) > 0 THEN BEGIN delete(line,1,pos('*)',line) + 1) ; in_comment := false ; END ELSE read_from_file(f) ; END ; strip_leading_blanks(line) ; strip_trailing_blanks(line) ; IF line = '' THEN read_from_file(f) ; END ; (* read_from_file *) OVERLAY PROCEDURE expand(example_list : node_ptr ; VAR new_example_list : node_ptr) ; (* Expand "don't care" values into values from attrib_list. example_list - unexpanded example set new_example_list - expanded set *) PROCEDURE dup_and_copy(list : node_ptr) ; (* This routine creates a new version of the current row, pointed to by list. If it finds a regular attribute value, it just appends the value to the row it is constructing. If it finds a '*', indicating a "don't care" value, it call copy_to_new_list to expand the value and attach the new rows to new_example_list. Notice that we attach anything we don't want to be trashed by the garbage collector to the head of saved_list and remove it at the end of the routine. copy_to_new_list saves new_list because it calls dup_and_copy and that routine might initiate garbage collection. *) VAR new_list,attr_ptr : node_ptr ; copied : boolean ; PROCEDURE copy_to_new_list ; (* This routine does the actual expansion. It attaches a value for the attribute, pointed to by p, to the row that has been constructed so far and attaches the rest of the list to the end of the row. It calls dup_and_copy to expand any more *'s in the row and finally attach the row to the new_example_list *) VAR p,new_row : node_ptr ; BEGIN saved_list := cons(new_list,saved_list) ; copied := true ; p := tail(head(attr_ptr)) ; WHILE p <> NIL DO BEGIN new_row := append_list(new_list,cons(head(p),tail(list))) ; dup_and_copy(new_row) ; p := tail(p) ; END ; saved_list := tail(saved_list) ; END ; (* copy_to_new_list *) BEGIN saved_list := cons(list,saved_list) ; test_memory ; new_list := NIL ; attr_ptr := attrib_list ; copied := false ; WHILE (list <> NIL) AND (NOT copied) DO IF string_val(head(list)) = '*' THEN copy_to_new_list ELSE BEGIN new_list := append_list(new_list,cons(head(list),NIL)) ; list := tail(list) ; attr_ptr := tail(attr_ptr) ; END ; IF NOT copied THEN new_example_list := append_list(new_example_list,cons(new_list,NIL)) ; saved_list := cons(new_example_list,tail(saved_list)) ; END ; (* dup_and_copy *) BEGIN new_example_list := NIL ; WHILE example_list <> NIL DO BEGIN dup_and_copy(head(example_list)) ; example_list := tail(example_list) ; END ; END ; (* expand *) OVERLAY FUNCTION conflicts(example_list : node_ptr) : boolean ; (* Search for conflicts by using match_list to compare each row against the rows which follow it in the example list. conflicts returns true if a match is found. *) VAR p : node_ptr ; found_match : boolean ; PROCEDURE conflict_message ; BEGIN writeln ; writeln('A conflict exists between rows:') ; writeln ; print_list(head(example_list)) ; writeln ; print_list(head(p)) ; writeln ; writeln('Processing cannot continue.') ; END ; (* conflict_message *) BEGIN found_match := false ; WHILE (example_list <> NIL) AND (NOT found_match) DO BEGIN p := tail(example_list) ; WHILE (p <> NIL) AND (NOT found_match) DO IF match_lists(tail(head(example_list)),tail(head(p))) THEN found_match := true ELSE p := tail(p) ; IF NOT found_match THEN example_list := tail(example_list) ; END ; IF found_match THEN conflict_message ; conflicts := found_match ; END ; (* conflicts *) OVERLAY PROCEDURE build_table ; (* Read the example file and build the attrib_list and examples. The format of these two lists is described in the BYTE article mentioned at the beginning of the program. This routine doesn't do much error checking, so be careful with your example files. *) VAR new_row : node_ptr ; token : string80 ; PROCEDURE scan ; (* Get a single token from the input line. This procedure strips leading and trailing blanks and tabs, but interior spaces are sigificant. A token is any string between the first non-space character and a comma or end of line. Case is significant in tokens, 'Cat' and 'cat' will be treated as different values by the program. *) VAR comma_pos : byte ; BEGIN strip_leading_blanks(line) ; IF line = '' THEN token := '' ELSE BEGIN comma_pos := pos(',',line) ; IF comma_pos > 0 THEN BEGIN token := copy(line,1,comma_pos - 1) ; delete(line,1,comma_pos) ; END ELSE BEGIN token := line ; line := '' ; END ; IF token = '' THEN token := '*' ; strip_trailing_blanks(token) ; END ; END ; (* scan *) PROCEDURE build_a_row ; (* Builds an example row. Symbolic and numerical attributes are handled differently. Input lines are read one token at a time and storage is allocated for the new token. The attrib_list is examined to see if the new value appears on the list of values for that attribute. If it does not, the value is added to the list. Symbolic values are added to the end of the list of values for the attribute, numerical values are stored in order. Once the new row is constructed it is appended to the example set. *) VAR at_list,row_list,token_ptr : node_ptr ; PROCEDURE length_error ; (* Signal an error, probably a missing value. The row in question will not be included in the example set, but the attribute list may be damaged, so don't trust results after and error. *) BEGIN writeln ; writeln('Missing attribute in row:') ; print_list(row_list) ; writeln ; writeln ; wait ; END ; (* length_error *) PROCEDURE add_value ; (* Add a new value to the attribue list. The variable attrib_list keeps track of the current column as the row is scanned. If token was found to already be on the attrib_list, head(attrib_list) is appended to at_list. If the token is a new value, it is added to the list at the head of attrib_list, and then head(attrib_list) is appended to at_list. After reading the entire row from the file, attrib_list is set to point to at_list. This way attrib_list is reconstructed for each row. *) PROCEDURE insert_number ; (* Insert a number into the attribute list. The list of values for numerical attributes is maintained in order. This is done by comparing the value of the token against the other items on the list. As the comparison is done, the values are copied to new_list. When a value is found that is greater than the token value or the end of the list is reached, the token is appened to new_list and then the reaming values on the old list are appended to new_list. Finally new_list is appended to at_list. All of this appending produces lots of garbage. *) VAR new_list,p : node_ptr ; r : real ; inserted : boolean ; PROCEDURE build_new_list ; (* This routine does the actual insetion described above. *) BEGIN WHILE (p <> NIL) AND (NOT inserted) DO BEGIN IF abs(r - num_val(head(p))) < limit THEN BEGIN inserted := true ; new_list := append_list(new_list,p) ; END ELSE IF r > num_val(head(p)) THEN BEGIN new_list := append_list(new_list,cons(head(p),NIL)) ; p := tail(p) ; END ELSE BEGIN new_list := append_list(new_list,append_list( cons(token_ptr,NIL),p)) ; inserted := true ; END ; END ; END ; (* build_new_list *) BEGIN r := num_val(token_ptr) ; inserted := false ; new_list := cons(head(head(attrib_list)),NIL) ; p := tail(head(attrib_list)) ; build_new_list ; IF (p = NIL) AND (NOT inserted) THEN new_list := append_list(new_list,cons(token_ptr,NIL)) ; at_list := append_list(at_list,cons(new_list,NIL)) ; END ; (* insert_number *) BEGIN IF tag_value(token_ptr) = number THEN insert_number ELSE at_list := append_list(at_list, cons(append_list(head(attrib_list), cons(token_ptr,NIL)), NIL)) ; END ; (* add_value *) BEGIN saved_list := cons(examples,attrib_list) ; test_memory ; at_list := NIL ; row_list := NIL ; scan ; WHILE token <> '' DO BEGIN IF pos(':NUMBER',toupper(string_val(head(head(attrib_list))))) > 0 THEN token_ptr := alloc_num(toreal(token)) ELSE token_ptr := alloc_str(token) ; IF (NOT on_list(token,head(attrib_list))) AND (token <> '*') THEN add_value ELSE at_list := append_list(at_list,cons(head(attrib_list),NIL)) ; row_list := append_list(row_list,cons(token_ptr,NIL)) ; attrib_list := tail(attrib_list) ; scan ; END ; attrib_list := at_list ; IF list_length(row_list) = no_of_cols THEN examples := append_list(examples,cons(row_list,NIL)) ELSE length_error ; END ; (* build_a_row *) PROCEDURE build_attrib_list ; (* constructs the initial attrib_list from the first row in the file. Initially the attrib_list is simply a list of the attribute names, build_a_row adds the values to it. This routine also counts the number of columns (attributes) in the table. *) BEGIN attrib_list := NIL ; no_of_cols := 0 ; scan ; WHILE token <> '' DO BEGIN attrib_list := append_list(attrib_list,cons(cons(alloc_str(token),NIL), NIL)) ; no_of_cols := no_of_cols + 1 ; scan ; END ; END ; (* build_attrib_list *) BEGIN examples := NIL ; line := '' ; read_from_file(example_file) ; IF line <> eof_mark THEN build_attrib_list ; read_from_file(example_file) ; WHILE line <> eof_mark DO BEGIN build_a_row ; read_from_file(example_file) ; END ; END ; (* build_table *) OVERLAY FUNCTION classify_it : node_ptr ; (* is an overlay function which calls classify. We do it this way to avoid swapping due to recursion. *) FUNCTION classify(example_list,chosen_list : node_ptr) : node_ptr ; (* This is the main processing routine of the program. It is passed two lists, a list of rows, example_list and a list of attributes already chosen. The second list is simply for convenience. That way we don't have to calculate the entropy for attribute which can no longer contribute to splitting example_set. classify returns a pointer to the classification tree built from the example_set. If the example_list passed to it contains only a single class value, classify returns the class_name (attribute name of the first column) and the class value. Variables: split_elem - the column (attribute number) to split on classify_list - a temporary list to hold the tree split_value - for numerical attributes. It contains the value which produces the best numerical split. classify prints a dot on the screen each time it is entered to show you that the program really hasn't died. *) VAR split_elem : counter ; classify_list : node_ptr ; split_value : real ; PROCEDURE find_split(VAR split_elem : counter ; VAR min_split_value : real) ; (* finds the best attribute to split on. It returns the column number on which to split and for numerical attribute, the value which produces the best split. For each active attribute it constructs a class_list. the class_list has the following format: ( (attribute value #1 (class1 count) (class2 count) ....) (attribute value #2 (class1 count) (class2 count) ....) .....) The counts are the number of times each class appears in a row with a particular value of the attribute. This list is used to calculate the entropy of the attribute. *) VAR i : counter ; attrib : node_ptr ; ent,min_entropy,split_value : real ; FUNCTION entropy(list : node_ptr ; cases : counter) : real ; (* list is a class list. cases is the number of examples under consideration. This routine calculates the entropy H(C|A) from the class list. *) VAR sum,sum1,sum2,r : real ; p : node_ptr ; FUNCTION log2(x : real) : real ; BEGIN IF abs(x) < limit THEN log2 := 0.0 ELSE log2 := ln(x) / ln2 ; END ; (* log2 *) BEGIN sum := 0.0 ; WHILE list <> NIL DO BEGIN sum1 := 0.0 ; sum2 := 0.0 ; p := tail(head(list)) ; WHILE p <> NIL DO BEGIN r := num_val(head(tail(head(p)))) ; sum1 := sum1 + r * log2(r) ; sum2 := sum2 + r ; p := tail(p) ; END ; sum := sum + (sum2 * log2(sum2)) - sum1 ; list := tail(list) ; END ; entropy := sum / cases ; END ; (* entropy *) PROCEDURE numeric_entropy(elem_no : counter ; VAR num_entropy,num_split_value : real) ; (* Find the best split for a numeric attribute. elem_no is the column we are working on. num_entropy is the best entropy for this attribute and num_split_value is the split which gives that value. In addition to the class list, this routine produces an ordered list of the values for this attribute, called num_list. This list is used in making the splits. Each split is half way between successive values on the num_list. The entropy is calculated for each split. *) VAR class_list,sp,num_list : node_ptr ; sp_val,num_ent : real ; total_cases : counter ; PROCEDURE make_num_list ; (* constructs num_list. This is essentially the same routine as insert_number in build_table *) VAR new_list,p,q : node_ptr ; r : real ; inserted : boolean ; PROCEDURE add_to_new_list ; BEGIN WHILE (p <> NIL) AND (NOT inserted) DO BEGIN IF abs(r - num_val(head(p))) < limit THEN BEGIN inserted := true ; new_list := append_list(new_list,p) ; END ELSE IF r > num_val(head(p)) THEN BEGIN new_list := append_list(new_list,cons(head(p),NIL)) ; p := tail(p) ; END ELSE BEGIN new_list := append_list(new_list,append_list( cons(alloc_num(r),NIL),p)) ; inserted := true ; END ; END ; END ; (* add_to_new_list *) BEGIN test_memory ; num_list := NIL ; q := example_list ; WHILE q <> NIL DO BEGIN r := num_val(head(element(head(q),elem_no))) ; new_list := NIL ; p := num_list ; inserted := false ; add_to_new_list ; IF (p = NIL) AND (NOT inserted) THEN new_list := append_list(new_list,cons(alloc_num(r),NIL)) ; num_list := new_list ; q := tail(q) ; END ; END ; (* make_num_list *) PROCEDURE make_numeric_class_list(v : real) ; (* builds the class list. v is the value to split on. The class_list contains lists for two ranges < v and >= v. The list has the format: ( (< v (class1 count) (class2 count) .....) (>= v (class1 count) (class2 count) .....)) *) VAR temp_list,p : node_ptr ; v_str : string80 ; BEGIN str(v,v_str) ; temp_list := NIL ; p := tail(head(attrib_list)) ; WHILE p <> NIL DO BEGIN temp_list := append_list(temp_list,cons(cons(head(p), cons(alloc_num(0.0),NIL)),NIL)) ; p := tail(p) ; END ; class_list := cons(cons(alloc_str(concat('< ',v_str)),temp_list), cons(cons(alloc_str(concat('>= ',v_str)), copy_list(temp_list)),NIL)) ; END ; (* make_numeric_class_list *) PROCEDURE count_numeric_classes(v : real ; elem_no : counter) ; (* count the classes for each range. It reads the example list, extracts the value for the attribute, searches the class list and increments the appropriate class value in the list. v is the split value. *) VAR px,py : node_ptr ; PROCEDURE numeric_increment(list : node_ptr ; attr_v,atv : string80) ; (* search list (class_list) and compare attr_v to v, the split_value. atv is the class_value. Once we find the sub-list with the proper range we search its tail for atv to increment the class count. *) PROCEDURE do_increment(v_list : node_ptr) ; VAR p,q : node_ptr ; BEGIN q := tail(v_list) ; WHILE q <> NIL DO IF string_val(head(head(q))) = atv THEN BEGIN p := head(tail(head(q))) ; IF tag_value(p) = number THEN p^.num_data := p^.num_data + 1.0 ; total_cases := total_cases + 1 ; q := NIL ; END ELSE q := tail(q) ; END ; (* do_increment *) BEGIN IF toreal(attr_v) < v THEN do_increment(head(list)) ELSE do_increment(head(tail(list))) ; END ; (* numeric_increment *) BEGIN total_cases := 0 ; px := example_list ; WHILE px <> NIL DO BEGIN py := head(px) ; numeric_increment(class_list,string_val(head(element(py,elem_no))), string_val(head(py))) ; px := tail(px) ; END ; END ; (* count_numeric_classes *) BEGIN num_entropy := 1.0E+37 ; make_num_list ; sp := tail(num_list) ; saved_list := cons(num_list,saved_list) ; WHILE sp <> NIL DO BEGIN test_memory ; sp_val := num_val(head(num_list)) + ((num_val(head(sp)) - num_val(head(num_list))) / 2.0) ; make_numeric_class_list(sp_val) ; count_numeric_classes(sp_val,elem_no) ; num_ent := entropy(class_list,total_cases) ; IF num_ent < num_entropy THEN BEGIN num_entropy := num_ent ; num_split_value := sp_val ; END ; num_list := sp ; sp := tail(sp) ; END ; saved_list := tail(saved_list) ; END ; (* numeric_entropy *) PROCEDURE symbol_entropy(val_list : node_ptr ; elem_no : counter ; VAR sym_ent,sym_split_val : real) ; (* Find the entropy for a symbolic attribute. val_list is the list of possible values for this attribute from the attrib_list. elem_no is the column number, sym_ent is the entropy for this attribute. sym_split_value is always 0. This routine constructs a class list as described above and counts the classes for each value of the attribute as in the numeric case, only there is no range splitting. Symbolic attributes can result in mult-way partitions of the example_list, numerical attributes always produce binary splits. *) VAR class_list : node_ptr ; total_cases : counter ; PROCEDURE make_class_list(a_list : node_ptr) ; (* builds the initial class list. See above comments for format. *) VAR temp_list,p : node_ptr ; BEGIN WHILE a_list <> NIL DO BEGIN temp_list := cons(head(a_list),NIL) ; p := tail(head(attrib_list)) ; WHILE p <> NIL DO BEGIN temp_list := append_list(temp_list,cons(cons(head(p), cons(alloc_num(0.0),NIL)),NIL)) ; p := tail(p) ; END ; class_list := append_list(class_list,cons(temp_list,NIL)) ; a_list := tail(a_list) ; END ; END ; (* make_class_list *) PROCEDURE count_classes(elem_no : counter) ; (* traverses the example_list and counts class values. *) VAR px,py : node_ptr ; PROCEDURE increment(list : node_ptr ; attr,v : string80) ; (* search list (class_list) and compare attr to the head of each sub-list. v is the class_value. Once we find the sub-list with the proper range we search its tail for v to increment the class count. *) VAR p,q : node_ptr ; BEGIN WHILE list <> NIL DO IF string_val(head(head(list))) = attr THEN BEGIN q := tail(head(list)) ; WHILE q <> NIL DO IF string_val(head(head(q))) = v THEN BEGIN p := head(tail(head(q))) ; IF tag_value(p) = number THEN p^.num_data := p^.num_data + 1.0 ; total_cases := total_cases + 1 ; list := NIL ; q := NIL ; END ELSE q := tail(q) ; END ELSE list := tail(list) ; END ; (* increment *) BEGIN total_cases := 0 ; px := example_list ; WHILE px <> NIL DO BEGIN py := head(px) ; increment(class_list,string_val(head(element(py,elem_no))), string_val(head(py))) ; px := tail(px) ; END ; END ; (* count_classes *) BEGIN class_list := NIL ; make_class_list(val_list) ; count_classes(elem_no) ; sym_ent := entropy(class_list,total_cases) ; sym_split_val := 0.0 END ; (* symbol_entropy *) BEGIN min_entropy := 1.0E+37 ; FOR i := 2 TO no_of_cols DO BEGIN test_memory ; attrib := head(element(attrib_list,i)) ; IF NOT on_list(string_val(head(attrib)),chosen_list) THEN BEGIN IF pos(':NUMBER',toupper(string_val(head(attrib)))) > 0 THEN numeric_entropy(i,ent,split_value) ELSE symbol_entropy(tail(attrib),i,ent,split_value) ; IF ent < min_entropy THEN BEGIN min_entropy := ent ; split_elem := i ; min_split_value := split_value ; END ; END ; END ; END ; (* find_split *) FUNCTION split(elem_no : counter ; split_val : real) : node_ptr ; (* This routine splits the example_list into sets which contain a single value of the split attribute. elem_no is the column on which to split. split_val is the split value for numerical attributes. split_item points to the attribute's entry in the attribute list. split_list is the tree which is returned by split. Its format is (attribute_name (value1 classify(partition with attribute = value1) (value2 classify(partition with attribute = value2) ....... ) *) VAR split_list,split_item,new_chosen : node_ptr ; PROCEDURE numeric_split ; (* Splitting on a numerical attribute splits the examples into two groups, those with values < split_value and those with values >= split_value. new_list1 and new_list2 are the example sets for the two categories. It returns a split_list as follows: (attribute_name ('< split_val' classify(all examples with attribute value < split_val) ('>= split_val' classify(all examples with attribute value >= split_val)) Notice all the lists placed on the saved_list. These are the items that we must retain should any of the calls to classify invoke garbage collection *) VAR new_list1,new_list2,q,valu : node_ptr ; split_str : string80 ; BEGIN str(split_val,split_str) ; valu := cons(alloc_str(concat('< ',split_str)), cons(alloc_str(concat('>= ',split_str)),NIL)) ; q := example_list ; new_list1 := NIL ; new_list2 := NIL ; WHILE q <> NIL DO BEGIN IF num_val(head(element(head(q),elem_no))) < split_val THEN new_list1 := append_list(new_list1,cons(head(q),NIL)) ELSE new_list2 := append_list(new_list2,cons(head(q),NIL)) ; q := tail(q) ; END ; saved_list := cons(split_list,cons(valu,cons(new_list2,saved_list))) ; split_list := append_list(split_list, cons(cons(head(valu), cons(classify(new_list1,chosen_list),NIL)),NIL)) ; saved_list := cons(split_list,tail(saved_list)) ; split_list := append_list(split_list, cons(cons(head(tail(valu)), cons(classify(new_list2,chosen_list),NIL)),NIL)) ; saved_list := tail(tail(tail(saved_list))) ; END ; (* numeric_split *) PROCEDURE symbol_split ; (* performs the split for symbolic attributes. For each value of the attribute, it searches the example list for matches and attaches examples with a match to new_example_list. If it finds any matches it appends the value and the result of classifying the new_example_list to split_list. This is a very inefficient way of doing this. It would be better to sort the example_list using column elem_no as a key. *) VAR valu,q,new_example_list : node_ptr ; BEGIN valu := tail(split_item) ; WHILE valu <> NIL DO BEGIN q := example_list ; new_example_list := NIL ; WHILE q <> NIL DO BEGIN IF string_val(head(valu)) = string_val(head(element(head(q),elem_no))) THEN new_example_list := append_list(new_example_list,cons(head(q),NIL)) ; q := tail(q) ; END ; IF new_example_list <> NIL THEN BEGIN saved_list := cons(split_list,saved_list) ; split_list := append_list(split_list, cons(cons(head(valu), cons(classify(new_example_list,new_chosen),NIL)), NIL)) ; saved_list := tail(saved_list) ; END ; valu := tail(valu) ; END ; END ; (* symbol_split *) BEGIN split_item := head(element(attrib_list,elem_no)) ; new_chosen := cons(head(split_item),chosen_list) ; split_list := cons(head(split_item),NIL) ; IF pos(':NUMBER',toupper(string_val(head(split_item)))) > 0 THEN numeric_split ELSE symbol_split ; split := split_list ; END ; (* split *) FUNCTION single_class : boolean ; (* returns true if the example_list contains only a single class value. *) VAR first_val : string80 ; p : node_ptr ; more_than_one : boolean ; BEGIN first_val := string_val(head(head(example_list))) ; more_than_one := false ; p := tail(example_list) ; WHILE (p <> NIL) AND (NOT more_than_one) DO IF string_val(head(head(p))) <> first_val THEN more_than_one := true ELSE p := tail(p) ; single_class := NOT more_than_one ; END ; (* single_class *) BEGIN write('.') ; split_elem := 0 ; saved_list := cons(chosen_list,cons(example_list,saved_list)) ; IF NOT single_class THEN find_split(split_elem,split_value) ; IF split_elem = 0 THEN classify_list := cons(head(head(attrib_list)), cons(cons(head(head(example_list)),NIL),NIL)) ELSE classify_list := split(split_elem,split_value) ; saved_list := append_list(tail(tail(saved_list)),cons(classify_list,NIL)) ; classify := classify_list ; END ; (* classify *) BEGIN classify_it := classify(examples,NIL) ; END ; (* classify_it *) OVERLAY PROCEDURE print_rule_list(list : node_ptr) ; (* This routine transforms the tree into a set of IF/THEN statements and writes them to a file. It produces a knowledge base for MicroExpert [c], if you want to produces rules for another shell, this routine will have to be modified. *) VAR rule_count : counter ; rule_file : text_file ; file_name : string80 ; used_attribs : node_ptr ; PROCEDURE print_rule(tree,rule_list : node_ptr) ; (* Do a depth first traversal of tree. On entry rule_list contains a list of attribute value pairs. When tree is finally NIL, i.e. a terminal node of the tree has been encountered, the rule_list is printed. If entered with a non-NIL tree, the routine creates a new attribute value pair, attaches it to rule_list and explores further down the tree. It also attaches the attribute names to used_attribs so that they can be used to generate prompts. *) VAR p : node_ptr ; PROCEDURE print_the_rule(list : node_ptr) ; (* Prints the rule_list, with rules formatted for MicroExpert. *) VAR s : string80 ; PROCEDURE write_compare ; VAR comp_str : string[2] ; FUNCTION quote(w : string80) : string80 ; BEGIN quote := '''' + w + '''' ; END ; (* quote *) BEGIN comp_str := '' ; WHILE s[1] <> ' ' DO BEGIN comp_str := comp_str + s[1] ; delete(s,1,1) ; END ; strip_leading_blanks(s) ; strip_trailing_blanks(s) ; writeln(rule_file,'function compare(',attrib_value(head(head(list))), ',',quote(comp_str),',',quote(s),')') ; END ; (* write_compare *) BEGIN writeln(rule_file,rule_count) ; rule_count := rule_count + 1 ; write(rule_file,'If ') ; WHILE list <> NIL DO BEGIN s := string_val(head(tail(head(list)))) ; IF s[1] IN ['<','>'] THEN write_compare ELSE writeln(rule_file,attrib_value(head(head(list))),' is ',s) ; list := tail(list) ; IF list <> NIL THEN IF tail(list) = NIL THEN write(rule_file,'then ') ELSE write(rule_file,'and ') ; END ; writeln(rule_file,'.') ; writeln(rule_file) ; END ; (* print_the_rule *) BEGIN IF tree = NIL THEN print_the_rule(rule_list) ELSE BEGIN IF head(tree) <> head(head(attrib_list)) THEN IF NOT on_list(string_val(head(tree)),used_attribs) THEN used_attribs := cons(head(tree),used_attribs) ; p := tail(tree) ; WHILE p <> NIL DO BEGIN print_rule(head(tail(head(p))), append_list(rule_list,cons(cons(head(tree), cons(head(head(p)),NIL)), NIL))) ; p := tail(p) ; END ; END ; END ; (* print_rule *) PROCEDURE print_prompts ; (* This routine traverses the attribute list and writes a prompt for each attribute on the list. MicroExpert does not automatically generate prompts, so this is necessary. The format of the questions may seem dumb. For a working knowledge base, you will want to edit the prompts and add translations. *) VAR q : node_ptr ; BEGIN q := used_attribs ; WHILE q <> NIL DO BEGIN writeln(rule_file) ; IF pos(':NUMBER',toupper(string_val(head(q)))) > 0 THEN writeln(rule_file,'Numeric prompt ',attrib_value(head(q))) ELSE writeln(rule_file,'Prompt ',attrib_value(head(q))) ; writeln(rule_file,'What is the value of ',attrib_value(head(q)),' ?') ; writeln(rule_file,'.') ; q := tail(q) ; END ; END ; (* print_prompts *) BEGIN writeln ; write('Output the rules to what file (Press for screen.) ? ') ; readln(file_name) ; strip_leading_blanks(file_name) ; IF file_name = '' THEN file_name := 'con:' ; assign(rule_file,file_name) ; rewrite(rule_file) ; writeln(rule_file) ; rule_count := 1 ; used_attribs := NIL ; print_rule(list,NIL) ; print_prompts ; IF is_console(rule_file) THEN wait ; close(rule_file) ; END ; (* print_rule_list *) OVERLAY PROCEDURE print_tree(list : node_ptr) ; (* Print the tree. This is really just a pretty print routine, which indents each sub_list. *) VAR indent_level : counter ; tree_file : text_file ; file_name : string80 ; PROCEDURE print_the_tree(tree : node_ptr ; VAR indent : counter) ; VAR p : node_ptr ; BEGIN IF tree <> NIL THEN CASE tree^.tag OF number, symbol : BEGIN write(tree_file,attrib_value(tree),' ') ; indent := indent + length(attrib_value(tree)) + 1 ; END ; cons_node : BEGIN write(tree_file,'(') ; indent := indent + 1 ; print_the_tree(head(tree),indent) ; p := tail(tree) ; WHILE p <> NIL DO BEGIN print_the_tree(head(p),indent) ; IF list_length(p) > 1 THEN BEGIN writeln(tree_file) ; write(tree_file,' ' : indent) ; END ; p := tail(p) ; END ; indent := indent - length(attrib_value(head(tree))) - 2 ; write(tree_file,') ') ; END ; END ; END ; (* print_the_tree *) BEGIN writeln ; write('Output the tree to what file (Press for screen.) ? ') ; readln(file_name) ; strip_leading_blanks(file_name) ; IF file_name = '' THEN file_name := 'con:' ; assign(tree_file,file_name) ; rewrite(tree_file) ; writeln(tree_file) ; indent_level := 0 ; print_the_tree(list,indent_level) ; writeln(tree_file) ; writeln(tree_file) ; IF is_console(tree_file) THEN wait ; close(tree_file) ; END ; (* print_tree *) OVERLAY FUNCTION got_file : boolean ; (* asks for an example file name and tries to open it. If it can't open the file, it complains and asks for a new file *) VAR example_name : string80 ; BEGIN writeln ; write('Example File (Press to quit.) : ') ; readln(example_name) ; IF example_name = '' THEN got_file := false ELSE BEGIN IF pos('.',example_name) = 0 THEN example_name := concat(example_name,'.EX') ; IF open(example_file,example_name) THEN got_file := true ELSE BEGIN writeln ; writeln(toupper(example_name),' could not be found.') ; writeln ; got_file := got_file ; END ; END ; END ; (* got_file *) BEGIN free := NIL ; initial_heap := HeapPtr ; total_free := 0.0 ; clrscr ; WHILE got_file DO BEGIN build_table ; close(example_file) ; IF NOT conflicts(examples) THEN BEGIN saved_list := cons(attrib_list,examples) ; expand(examples,examples) ; writeln ; saved_list := cons(attrib_list,examples) ; c_list := classify_it ; saved_list := cons(c_list,attrib_list) ; writeln ; test_memory ; print_tree(c_list) ; writeln ; test_memory ; writeln ; print_rule_list(c_list) ; clrscr ; END ; END ; END. ================================================== INDUCE.INC ================================================== (* ---------------------------------------------------------------------- Utility Routines ---------------------------------------------------------------------- *) FUNCTION open(VAR f : text_file ; f_name : string80) : boolean ; (* open a file - returns true if the file exists and was opened properly f - file pointer f_name - external name of the file *) BEGIN assign(f,f_name) ; (*$I- *) reset(f) ; (*$I+ *) open := (ioresult = 0) ; END ; (* open *) FUNCTION is_console(VAR f : text_file) : boolean ; (* return true if f is open on the system console for details of fibs and fib_ptrs see the Turbo Pascal ver 3.0 reference manual chapter 20. This should work under CP/M-86 or 80, but we haven't tried it. *) TYPE fib = ARRAY [0 .. 75] OF byte ; VAR fib_ptr : ^fib ; dev_type : byte ; BEGIN fib_ptr := addr(f) ; dev_type := fib_ptr^[2] AND $07 ; is_console := (dev_type = 1) OR (dev_type = 2) ; END ; (* is_console *) PROCEDURE strip_leading_blanks(VAR s : string80) ; BEGIN IF length(s) > 0 THEN IF (s[1] = ' ') OR (s[1] = tab) THEN BEGIN delete(s,1,1) ; strip_leading_blanks(s) ; END ; END ; (* strip_leading_blanks *) PROCEDURE strip_trailing_blanks(VAR s : string80) ; BEGIN IF length(s) > 0 THEN IF (s[length(s)] = ' ') OR (s[length(s)] = tab) THEN BEGIN delete(s,length(s),1) ; strip_trailing_blanks(s) ; END ; END ; (* strip_trailing_blanks *) FUNCTION toupper(s : string80) : string80 ; (* returns s converted to upper case *) VAR i : byte ; BEGIN IF length(s) > 0 THEN FOR i := 1 TO length(s) DO s[i] := upcase(s[i]) ; toupper := s ; END ; (* toupper *) FUNCTION toreal(s : string80) : real ; (* converts s to a real number This routine uses the Turbo intrinsic val to do the conversion. If s does not contain a legal representation of a number, it returns 0.0 *) VAR num : real ; code : integer ; BEGIN strip_trailing_blanks(s) ; strip_leading_blanks(s) ; val(s,num,code) ; IF code = 0 THEN toreal := num ELSE toreal := 0 ; END ; (* toreal *) FUNCTION is_number(s : string80) : boolean ; (* checks to see if s contains a legitimate numerical string. It ignores leading and trailing blanks *) VAR num : real ; code : integer ; BEGIN strip_trailing_blanks(s) ; strip_leading_blanks(s) ; IF s <> '' THEN val(s,num,code) ELSE code := -1 ; is_number := (code = 0) ; END ; (* is_number *) FUNCTION head(list : node_ptr) : node_ptr ; (* returns a pointer to the first item in the list. If the list is empty, it returns NIL. *) BEGIN IF list = NIL THEN head := NIL ELSE head := list^.head_ptr ; END ; (* head *) FUNCTION tail(list : node_ptr) : node_ptr ; (* returns a pointer to a list starting at the second item in the list. Note - tail( (a b c) ) points to the list (b c), but tail( ((a b) c d) ) points to the list (c d) . *) BEGIN IF list = NIL THEN tail := NIL ELSE CASE list^.tag OF cons_node : tail := list^.tail_ptr ; free_node : tail := list^.next_free ; ELSE tail := NIL ; END ; END ; (* tail *) FUNCTION element(list : node_ptr ; elem_no : counter) : node_ptr ; (* returns a pointer to the element number elem_no in the list. element(list,1) points to list. element(list,2) is the same as tail(list). *) VAR i : counter ; BEGIN FOR i := 1 TO elem_no - 1 DO list := tail(list) ; element := list ; END ; (* element *) FUNCTION allocation_size(x : counter) : counter ; (* Turbo 3.0 allocates memory in 8 byte blocks, this routine calculates the actual number of bytes returned for a request of x bytes. *) BEGIN allocation_size := (((x - 1) DIV 8) + 1) * 8 ; END ; (* allocation_size *) FUNCTION node_size : counter ; (* calculates the base size of a node. Add the rest of the node to this to get the actual size of a node *) BEGIN node_size := 2 * sizeof(node_ptr) + sizeof(boolean) + sizeof(node_type) ; END ; (* node_size *) FUNCTION normalize(pt : node_ptr) : node_ptr ; (* returns a normalized pointer. Pointers are 32 bit addresses. The first 16 bits contain the segment number and the second 16 bits contain the offset within the segment. Normalized pointers have offsets in the range $0 to $F (0 .. 15) *) VAR pt_seg,pt_ofs : integer ; BEGIN pt_seg := seg(pt^) + (ofs(pt^) DIV 16) ; pt_ofs := ofs(pt^) MOD 16 ; normalize := ptr(pt_seg,pt_ofs) ; END ; (* normalize *) FUNCTION string_val(list : node_ptr) : string80 ; (* returns the string pointed to by list. If list points to a number node, it returns a string representing that number *) VAR s : string[15] ; BEGIN IF list = NIL THEN string_val := '' ELSE IF list^.tag = symbol THEN string_val := list^.string_data ELSE IF list^.tag = number THEN BEGIN str(list^.num_data : 14,s) ; string_val := s ; END ELSE string_val := '' ; END ; (* string_val *) FUNCTION num_val(list : node_ptr) : real ; (* returns the number pointed to by list. If list points to a string, it returns the numerical value of the string. *) VAR s : string80 ; code : integer ; r : real ; BEGIN IF list = NIL THEN num_val := 0.0 ELSE IF list^.tag = number THEN num_val := list^.num_data ELSE IF list^.tag = symbol THEN num_val := toreal(list^.string_data) ELSE num_val := 0.0 ; END ; (* num_val *) FUNCTION attrib_value(p : node_ptr) : string80 ; (* This routine is used by print_rule and print_tree to strip off ':number' from an attribute name. *) BEGIN IF pos(':NUMBER',toupper(string_val(p))) > 0 THEN attrib_value := copy(string_val(p), 1,pos(':NUMBER',toupper(string_val(p))) - 1) ELSE attrib_value := string_val(p) ; END ; (* attrib_value *) FUNCTION tag_value(list : node_ptr) : node_type ; (* returns the value of the tag for a node. *) BEGIN IF list = NIL THEN tag_value := free_node ELSE tag_value := list^.tag ; END ; (* tag_value *) FUNCTION match_lists(list1,list2 : node_ptr) : boolean ; (* returns true if list1 and list2 are identical. Two lists are identical if they are both NIL or if their heads match and match_lists returns true for thier tails. *) BEGIN IF (list1 = NIL) AND (list2 = NIL) THEN match_lists := true ELSE IF (list1 = NIL) OR (list2 = NIL) THEN match_lists := false ELSE IF tag_value(head(list1)) <> tag_value(head(list2)) THEN match_lists := false ELSE CASE tag_value(head(list1)) OF symbol : IF string_val(head(list1)) = string_val(head(list2)) THEN match_lists := match_lists(tail(list1),tail(list2)) ELSE match_lists := false ; number : IF num_val(head(list1)) = num_val(head(list2)) THEN match_lists := match_lists(tail(list1),tail(list2)) ELSE match_lists := false ; cons_node : IF match_lists(head(list1),head(list2)) THEN match_lists := match_lists(tail(list1),tail(list2)) ELSE match_lists := false ; END ; END ; (* match_lists *) FUNCTION on_list(s : string80 ; list : node_ptr) : boolean ; (* checks to see if s is on the list, list. s is on the list if it matches the head of the list or if on_list(tail(list)) returns true. *) BEGIN IF list = NIL THEN on_list := false ELSE IF s = string_val(head(list)) THEN on_list := true ELSE on_list := on_list(s,tail(list)) ; END ; (* on_list *) PROCEDURE print_list(list : node_ptr) ; (* recursively traverses the list and prints its elements. This is not a pretty printer, so the lists may look a bit messy. *) VAR p : node_ptr ; BEGIN IF list <> NIL THEN CASE list^.tag OF symbol : write(string_val(list),' ') ; number : write(num_val(list) : 6,' ') ; cons_node : BEGIN write('(') ; p := list ; WHILE p <> NIL DO BEGIN print_list(head(p)) ; p := tail(p) ; END ; write(') ') ; END ; END ; END ; (* print_list *) PROCEDURE get_memory(VAR p : node_ptr ; size : counter) ; (* On exit p contains a pointer to a block of allocation_size(size) bytes. If possible this routine tries to get memory from the free list before requesting it from the heap *) VAR blks : counter ; allocated : boolean ; PROCEDURE get_from_free(VAR list : node_ptr) ; (* Try and get need memory from the free list. This routine uses a first-fit algorithm to get the space. It takes the first free block it finds with enough storage. If the free block has more storage than was requested, the block is shrunk by the requested amount. *) BEGIN IF list <> NIL THEN IF list^.block_cnt >= (blks - 1) THEN BEGIN p := normalize(ptr(seg(list^),ofs(list^) + (list^.block_cnt - blks + 1) * 8)) ; IF list^.block_cnt = blks - 1 THEN list := list^.next_free ELSE list^.block_cnt := list^.block_cnt - blks ; allocated := true ; total_free := total_free - (blks * 8.0) ; END ELSE get_from_free(list^.next_free) ; END ; (* get_from_free *) BEGIN blks := ((size - 1) DIV 8) + 1 ; allocated := false ; get_from_free(free) ; IF NOT allocated THEN getmem(p,blks * 8) ; END ; (* get_memory *) FUNCTION alloc_str(s : string80) : node_ptr ; (* Allocate storage for a string and return a pointer to the new node. This routine only allocates enough storage for the actual number of characters in the string plus one for the length. Because of this, concatenating anything to the end of a string stored in a symbol node will lead to disaster. Copy the string to a new string do the concatenation and then allocate a new node. *) VAR pt : node_ptr ; BEGIN get_memory(pt,allocation_size(sizeof(node_type) + sizeof(boolean) + length(s) + 1)) ; pt^.tag := symbol ; pt^.string_data := s ; alloc_str := pt ; END ; (* alloc_str *) FUNCTION alloc_num(r : real) : node_ptr ; (* Allocate storage for a real number and return a pointer to the new node. *) VAR pt : node_ptr ; BEGIN get_memory(pt,allocation_size(sizeof(node_type) + sizeof(boolean) + sizeof(real))) ; pt^.tag := number ; pt^.num_data := r ; alloc_num := pt ; END ; (* alloc_num *) FUNCTION cons(new_node,list : node_ptr) : node_ptr ; (* Construct a list. This routine allocates storage for a new cons node. new_node points to the new head of the list. The tail pointer of the new node points to list. This routine adds the new cons node to the beginning of the list and returns a pointer to it. The list described in the comments at the beginning of the program could be constructed as cons(alloc_str('A'),cons(alloc_str('B'),cons(alloc_str('C'),NIL))). *) VAR p : node_ptr ; BEGIN get_memory(p,allocation_size(node_size)) ; p^.tag := cons_node ; p^.head_ptr := new_node ; p^.tail_ptr := list ; cons := p ; END ; (* cons *) FUNCTION append_list(list1,list2 : node_ptr) : node_ptr ; (* Append list2 to list1. This routine returns a pointer to the combined list. Appending is done by consing each item on the first list to the second list. This routine is one of the major sources of garbage so if garbage collection becomes a problem, you may want to rewrite it. *) BEGIN IF list1 = NIL THEN append_list := list2 ELSE append_list := cons(head(list1),append_list(tail(list1),list2)) ; END ; (* append_list *) FUNCTION list_length(list : node_ptr) : counter ; (* returns the length of a list. Note - both (A B C) and ( (A B) C D) have length 3. *) BEGIN IF list = NIL THEN list_length := 0 ELSE list_length := 1 + list_length(list^.tail_ptr) ; END ; (* list_length *) FUNCTION copy_list(list : node_ptr) : node_ptr ; (* Returns a pointer to a copy of list. This routine allocates new nodes for each item in the original list *) BEGIN IF list = NIL THEN copy_list := NIL ELSE CASE tag_value(list) OF cons_node : copy_list := cons(copy_list(head(list)),copy_list(tail(list))) ; number : copy_list := alloc_num(num_val(list)) ; symbol : copy_list := alloc_str(string_val(list)) ; END ; END ; (* copy_list *) PROCEDURE collect_garbage ; (* This routine is specific to Turbo Pascal Ver 3.01 It depends upon the fact that Turbo allocates memory in 8 byte blocks on the PC. If you recompile this program on another system be very careful with this routine. Garbage collection proceeds in three phases: unmark - free all memory between the initial_heap^ and the current top of the heap. mark - mark everything on the saved_list as being in ues. release - gather all unmarked blocks and put them on the free list. The collector displays a '*' on the screen to let you know it is operating. *) FUNCTION lower(p1,p2 : node_ptr) : boolean ; (* returns true if p1 points to a lower memory address than p2 *) BEGIN p1 := normalize(p1) ; p2 := normalize(p2) ; lower := (seg(p1^) < seg(p2^)) OR ((seg(p1^) = seg(p2^)) AND (ofs(p1^) < ofs(p2^))) ; END ; (* lower *) PROCEDURE mark(list : node_ptr) ; (* Mark the blocks on list as being in use. Since a node may be on several lists at one time, if it is already marked we don't continue processing the tail of the list. *) BEGIN IF list <> NIL THEN BEGIN IF NOT list^.in_use THEN BEGIN list^.in_use := true ; IF list^.tag = cons_node THEN BEGIN mark(head(list)) ; mark(tail(list)) ; END ; END ; END ; END ; (* mark *) PROCEDURE unmark_mem ; (* Go through memory from initial_heap^ to HeapPtr^ and mark each node as not in use. The tricky part here is updating the pointer p to point to the next cell. *) VAR p : node_ptr ; string_base,node_allocation : counter ; BEGIN string_base := sizeof(node_type) + sizeof(boolean) ; p := normalize(initial_heap) ; node_allocation := allocation_size(node_size) ; WHILE lower(p,HeapPtr) DO BEGIN p^.in_use := false ; CASE p^.tag OF cons_node : p := normalize(ptr(seg(p^),ofs(p^) + node_allocation)) ; free_node : p := normalize(ptr(seg(p^),ofs(p^) + (p^.block_cnt + 1) * 8)) ; number : p := normalize(ptr(seg(p^), ofs(p^) + allocation_size(string_base + sizeof(real)))) ; symbol : p := normalize(ptr(seg(p^), ofs(p^) + allocation_size(string_base + length(p^.string_data) + 1))) ; END ; END ; END ; (* unmark_mem *) PROCEDURE release_mem ; (* This procedure does the actual collection and compaction of nodes. This is the slow phase of garbage collection because of all the pointer manipulation. *) VAR heap_top : node_ptr ; string_base,node_allocation,string_allocation,block_allocation : counter ; PROCEDURE free_memory(pt : node_ptr ; size : counter) ; (* return size bytes pointed to by pt to the free list. If pt points to a block next to the head of the free list combine it with the top free node. total_free keeps track of the total number of free bytes. *) VAR blks : counter ; BEGIN blks := ((size - 1) DIV 8) + 1 ; pt^.tag := free_node ; IF normalize(ptr(seg(pt^),ofs(pt^) + 8 * blks)) = free THEN BEGIN pt^.next_free := free^.next_free ; pt^.block_cnt := free^.block_cnt + blks ; free := pt ; END ELSE IF normalize(ptr(seg(free^),ofs(free^) + 8 * (free^.block_cnt + 1))) = normalize(pt) THEN free^.block_cnt := free^.block_cnt + blks ELSE BEGIN pt^.next_free := free ; pt^.block_cnt := blks - 1 ; free := pt ; END ; total_free := total_free + (blks * 8.0) ; END ; (* free_memory *) PROCEDURE do_release ; (* This routine sweeps through memory and checks for nodes with in_use = false. *) VAR p : node_ptr ; BEGIN p := normalize(initial_heap) ; WHILE lower(p,heap_top) DO CASE p^.tag OF cons_node : BEGIN IF NOT p^.in_use THEN free_memory(p,node_size) ; p := normalize(ptr(seg(p^),ofs(p^) + node_allocation)) ; END ; free_node : BEGIN block_allocation := (p^.block_cnt + 1) * 8 ; free_memory(p,block_allocation) ; p := normalize(ptr(seg(p^),ofs(p^) + block_allocation)) ; END ; number : BEGIN block_allocation := allocation_size(string_base + sizeof(real)) ; IF NOT p^.in_use THEN free_memory(p,block_allocation) ; p := normalize(ptr(seg(p^),ofs(p^) + block_allocation)) ; END ; symbol : BEGIN string_allocation := allocation_size(string_base + length(p^.string_data) + 1) ; IF NOT p^.in_use THEN free_memory(p,string_base + length(p^.string_data) + 1) ; p := normalize(ptr(seg(p^),ofs(p^) + string_allocation)) ; END ; END ; END ; (* do_release *) BEGIN free := NIL ; total_free := 0.0 ; heap_top := HeapPtr ; string_base := sizeof(node_type) + sizeof(boolean) ; node_allocation := allocation_size(node_size) ; do_release ; END ; (* release_mem *) BEGIN write('*') ; unmark_mem ; mark(saved_list) ; release_mem ; write(back_space) ; clreol ; END ; (* collect_garbage *) PROCEDURE test_memory ; (* This routine activates the garbage collector, if the the total available memory (free_list + heap) is less than a specified amount. Lowering the minimum causes garbage collection to be called less often, but if you make it too small you may not have enough room left for recursion or any temporary lists you need. Using 10000 is probably being overly cautious. *) BEGIN IF (memavail * 16.0) + total_free < 10000 THEN collect_garbage ; END ; (* test_memory *) PROCEDURE wait ; (* Just like it says. It waits for the user to press a key before continuing. *) VAR ch : char ; BEGIN writeln ; writeln ; write('Press any key to continue. ') ; read(kbd,ch) ; write(return) ; clreol ; END ; (* wait *) (* ------------------------------------------------------------------------ End of utility routines ------------------------------------------------------------------------ *) ================================================== CONFER.TXT ================================================== @trans broad and flat @ the shape of the leaves is /not/ broad and flat @ @prompt broad and flat @ Is the shape of the leaves broad and flat ? @ @trans class @ the botanical class the tree belongs to @ @trans leaf shape @ the leaf shape @ @prompt leaf shape @ Is the leaf shape needlelike or scalelike ? @ @trans family @ the botanical family the tree belongs to @ @trans even pattern @ the needles do /not/ line up along two sides of the branch @ @prompt even pattern @ Do the needles grow in two lines along the sides of the branch ? @ @trans silvery line @ there is /not/ a silvery line underneath the needles @ @prompt silvery line @ Is there a silvery line underneath the needles ? @ @trans decurrent @ the stem of the needle does /not/ grow down along the twig @ @prompt decurrent @ Do the stems of the needles grow down along the twig ? @ @trans spray shape @ the shape of the leaf spray @ @prompt spray shape @ Is the shape of the leaf spray round or flat ? @ @trans random needles @ there are /not/ a few needles scattered along the branch @ @prompt random needles @ Are there at least a few needles scattered along the branch ? @ @trans bundle @ the needles are /not/ grouped together in bundles of 2 to 5 @ @prompt bundle @ Are the needles grouped together in bundles of 2 to 5 ? @ @trans needle scar @ the scar left when a needle is removed @ @prompt needle scar @ Pull a needle off the twig. Is the scar it makes raised or depressed ? @ @trans cross section @ the cross section of the needle @ @prompt cross section @ Pull off a needle. Is its cross section flat, triangular or 4-sided ? @ @trans genus @ the botanical genus of the tree @ 1 If broad and flat is yes then class is angiosperm . 2 If broad and flat is no then class is gymnosperm . 3 If class is gymnosperm and leaf shape is scalelike then family is cypress . 4 If class is gymnosperm and leaf shape is needlelike and even pattern is no then family is pine . 5 If class is gymnosperm and leaf shape is needlelike and even pattern is yes and silvery line is yes then family is pine . 6 If family is cypress and spray shape is round and random needles is yes then genus is juniper . 7 If class is gymnosperm and leaf shape is needlelike and even pattern is yes and silvery line is no and decurrent is no then family is bald cypress and genus is bald cypress . 8 If class is gymnosperm and leaf shape is needlelike and even pattern is yes and silvery line is no and decurrent is yes then family is yew and genus is yew . 9 If family is cypress and spray shape is round and random needles is no then genus is white cedar . 10 If family is cypress and spray shape is flat then genus is arbor vitae (thuja) . 11 If family is pine and bundle is yes then genus is pine . 12 If family is pine and bundle is no and silvery line is yes and needle scar is depressed then genus is fir . 13 If family is pine and bundle is no and silvery line is yes and needle scar is raised then genus is hemlock . 14 If family is pine and bundle is no and silvery line is no and cross section is triangular then genus is larch . 15 If family is pine and bundle is no and silvery line is no and cross section is four sided then genus is spruce . 16 If family is pine and bundle is no and silvery line is no and cross section is flat then genus is douglas fir . ================================================== CONIFERS.TXT ================================================== @trans broad and flat @ the shape of the leaves is /not/ broad and flat @ @prompt broad and flat @ Is the shape of the leaves broad and flat ? @ @trans class @ the botanical class the tree belongs to @ @trans leaf shape @ the leaf shape @ @prompt leaf shape @ Is the leaf shape needlelike or scalelike ? @ @trans family @ the botanical family the tree belongs to @ @trans even pattern @ the needles do /not/ line up along two sides of the branch @ @prompt even pattern @ Do the needles grow in two lines along the sides of the branch ? @ @trans silvery line @ there is /not/ a silvery line underneath the needles @ @prompt silvery line @ Is there a silvery line underneath the needles ? @ @trans decurrent @ the stem of the needle does /not/ grow down along the twig @ @prompt decurrent @ Do the stems of the needles grow down along the twig ? @ @trans spray shape @ the shape of the leaf spray @ @prompt spray shape @ Is the shape of the leaf spray round or flat ? @ @trans random needles @ there are /not/ a few needles scattered along the branch @ @prompt random needles @ Arå there at least a few needles scattered along the branch ? @ @trans bundle @ the needles are /not/ grouped together in bundles of 2 to 5 @ @prompt bundle @ Are the needles grouped together in bundles of 2 to 5 ? @ @trans needle scar @ the scar left when a needle is removed @ @prompt needle scar @ Pull off a needle. Is the scar that is left raised or depressed ? @ @trans cross section @ the cross section of the needle @ @prompt cross section @ Pull off a needle. Is its cross section flat, triangular or 4-sided ? @ @trans genus @ the botanical genus of the tree @ 1 If broad and flat is yes then class is angiosperm . 2 If broad and flat is no then class is gymnosperm . 3 If class is gymnosperm and leaf shape is scalelike then family is cypress . 4 If class is gymnosperm and leaf shape is needlelike and even pattern is no then family is pine . 5 If class is gymnosperm and leaf shape is needlelike and even pattern is yes and silvery line is yes then family is pine . 6 If family is cypress and spray shape is round and random needles is yes then genus is juniper . 7 If class is gymnosperm and leaf shape is needlelike and even pattern is yes and silvery line is no and decurrent is no then family is bald cypress and genus is bald cypress . 8 If class is gymnosperm and leaf shape is needlelike and even pattern is yes and silvery line is no and decurrent is yes then family is yew and genus is yew . 9 If family is cypress and spray shape is round and random needles is no then genus is white cedar . 10 If family is cypress and spray shape is flat then genus is arbor vitae (thuja) . 11 If family is pine and bundle is yes then genus is pine . 12 If family is pine and bundle is no and silvery line is yes and needle scar is depressed then genus is fir . 13 If family is pine and bundle is no and silvery line is yes and needle scar is raised then genus is hemlock . 14 If family is pine and bundle is no and silvery line is no and cross section is triangular then genus is larch . 15 If family is pine and bundle is no and silvery line is no and cross section is four sided then genus is spruce . 16 If family is pine and bundle is no and silvery line is no and cross section is flat then genus is douglas fir . ================================================== RULEXREF.PAS ================================================== (* Copyright 1984 MicroExpert Systems *) (*$V- *) PROGRAM rule_xref ; CONST word_size = 30 ; goal_size = 38 ; max_rule = 100 ; TYPE string80 = string[80] ; word = string[word_size] ; string1 = string[1] ; byte = 0 .. 255 ; counter = 0 .. maxint ; item_type = (cond,concld) ; item_ptr = ^item ; string_ptr = ^string_rec ; string_rec = RECORD info : string80 ; next_line : string_ptr ; END ; item = RECORD next : item_ptr ; attr : word ; CASE boolean OF TRUE : ( val : word ; kind : item_type ; rule_no : counter ) ; FALSE : ( prompt_ptr : string_ptr ; trans_ptr : string_ptr ; val_ptr : item_ptr) ; END ; VAR line : string80 ; rule_file : text ; token : word ; etx : string1 ; free,attr_list : item_ptr ; biggest_rule : counter ; PROCEDURE toupper(VAR s : string80) ; VAR i : byte ; BEGIN IF length(s) > 0 THEN FOR i := 1 TO length(s) DO IF s[i] IN ['a' .. 'z'] THEN s[i] := chr(ord(s[i]) - 32) ; END ; (* toupper *) PROCEDURE makestr(VAR s : string80 ; len : byte) ; VAR old_length : byte ; BEGIN old_length := length(s) ; (*$R- *) s[0] := chr(len) ; (*$R+ *) IF old_length < len THEN fillchar(s[old_length+1],len - old_length,' ') ; END ; (* makestr *) FUNCTION tointeger(s : word) : integer ; BEGIN IF length(s) = 0 THEN tointeger := 0 ELSE IF s[1] = '-' THEN BEGIN delete(s,1,1) ; tointeger := - tointeger(s) ; END ELSE IF NOT (s[1] IN ['0' .. '9']) THEN BEGIN delete(s,1,1) ; tointeger := tointeger(s) ; END ELSE IF length(s) = 1 THEN tointeger := ord(s[1]) - ord('0') ELSE tointeger := tointeger(copy(s,length(s),1)) + 10 * tointeger(copy(s,1,length(s)-1)) ; END ; (* tointeger *) PROCEDURE strip_leading_blanks(VAR s : string80) ; BEGIN IF length(s) > 0 THEN IF s[1] = ' ' THEN BEGIN delete(s,1,1) ; strip_leading_blanks(s) ; END ; END ; (* strip_leading_blanks *) PROCEDURE strip_trailing_blanks(VAR s : string80) ; BEGIN IF length(s) > 0 THEN IF s[length(s)] = ' ' THEN BEGIN delete(s,length(s),1) ; strip_trailing_blanks(s) ; END ; END ; (* strip_leading_blanks *) FUNCTION on_list(s : word ; list : item_ptr ; VAR at : item_ptr) : boolean ; FUNCTION find_it(list : item_ptr) : boolean ; BEGIN IF list = NIL THEN find_it := FALSE ELSE IF s = list^.attr THEN BEGIN at := list ; find_it := TRUE ; END ELSE find_it := find_it(list^.next) ; END ; (* on_list *) BEGIN at := NIL ; toupper(s) ; makestr(s,word_size) ; on_list := find_it(list) ; END ; (* on_list *) FUNCTION alloc : item_ptr ; VAR p : item_ptr ; BEGIN IF free = NIL THEN new(p) ELSE BEGIN p := free ; free := free^.next ; END ; alloc := p ; END ; (* alloc *) PROCEDURE dispose_item(p : item_ptr) ; BEGIN p^.next := free ; free := p ; END ; (* dispose_item *) PROCEDURE remove_item(VAR list : item_ptr) ; VAR p : item_ptr ; BEGIN IF list <> NIL THEN BEGIN p := list ; list := list^.next ; dispose_item(p) ; END ; END ; (* remove_item *) PROCEDURE remove_list(VAR list : item_ptr) ; BEGIN IF list <> NIL THEN BEGIN remove_item(list) ; remove_list(list) ; END ; END ; (* remove_list *) PROCEDURE new_item(s1,s2 : string80 ; typ : item_type ; rule_num : counter ; VAR list : item_ptr) ; VAR p : item_ptr ; BEGIN makestr(s1,word_size) ; toupper(s1) ; makestr(s2,word_size) ; toupper(s2) ; p := alloc ; WITH p^ DO BEGIN attr := s1 ; val := s2 ; kind := typ ; rule_no := rule_num ; END ; p^.next := list ; list := p ; END ; (* new_item *) PROCEDURE put_on_end(s1,s2 : string80 ; typ : item_type ; rule_no : counter ; VAR list : item_ptr) ; BEGIN IF list = NIL THEN new_item(s1,s2,typ,rule_no,list) ELSE put_on_end(s1,s2,typ,rule_no,list^.next) ; END ; (* put_on_end *) PROCEDURE read_the_file ; VAR error : boolean ; PROCEDURE scanf ; PROCEDURE get_line ; BEGIN readln(rule_file,line) ; IF eof(rule_file) THEN line := etx ; END ; (* get_line *) PROCEDURE get_token ; VAR i : -1 .. 255 ; BEGIN strip_leading_blanks(line) ; IF length(line) > 0 THEN BEGIN i := pos(' ',line) - 1 ; IF i <= 0 THEN i := length(line) ; token := copy(line,1,i) ; toupper(token) ; delete(line,1,i) ; END ELSE BEGIN get_line ; get_token ; END ; END ; (* get_token *) BEGIN IF eof(rule_file) THEN token := etx ELSE get_token ; END ; (* scanf *) PROCEDURE at_line ; TYPE info_type = (prmpt,trns,nl) ; VAR typ : info_type ; attr_word : string80 ; PROCEDURE read_a_line ; BEGIN readln(rule_file,line) ; write('.') ; END ; (* read_a_line *) PROCEDURE insert_attr(s : word ; typ : info_type ; line : string80 ; VAR list : item_ptr) ; PROCEDURE put_in_list(VAR p : item_ptr) ; PROCEDURE new_attr_item ; VAR s_ptr : string_ptr ; ptr : item_ptr ; BEGIN ptr := alloc ; ptr^.attr := s ; new(s_ptr) ; s_ptr^.info := line ; s_ptr^.next_line := NIL ; CASE typ OF prmpt : BEGIN ptr^.prompt_ptr := s_ptr ; ptr^.trans_ptr := NIL ; END ; trns : BEGIN ptr^.prompt_ptr := NIL ; ptr^.trans_ptr := s_ptr ; END ; END ; ptr^.val_ptr := NIL ; ptr^.next := p ; p := ptr ; END ; (* new_attr_item *) PROCEDURE old_attr_item ; VAR s_ptr : string_ptr ; PROCEDURE end_insert(VAR p_list : string_ptr) ; BEGIN IF p_list = NIL THEN p_list := s_ptr ELSE end_insert(p_list^.next_line) ; END ; (* end_insert *) BEGIN new(s_ptr) ; s_ptr^.info := line ; s_ptr^.next_line := NIL ; CASE typ OF prmpt : end_insert(p^.prompt_ptr) ; trns : end_insert(p^.trans_ptr) ; END ; END ; (* old_attr_item *) BEGIN IF p = NIL THEN new_attr_item ELSE IF s < p^.attr THEN new_attr_item ELSE IF s = p^.attr THEN old_attr_item ELSE put_in_list(p^.next) ; END ; (* put_in_list *) BEGIN makestr(s,word_size) ; put_in_list(list) ; END ; (* insert_attr *) BEGIN attr_word := '' ; IF token = '@PROMPT' THEN typ := prmpt ELSE typ := trns ; scanf ; WHILE token <> '@' DO BEGIN attr_word := concat(attr_word,token,' ') ; scanf ; END ; read_a_line ; WHILE (NOT eof(rule_file)) AND (pos('@',line) = 0) DO BEGIN insert_attr(attr_word,typ,line,attr_list) ; read_a_line ; END ; scanf ; END ; (* at_line *) PROCEDURE rule ; VAR attr,pred,val : string80 ; rule_no : counter ; kind : item_type ; num : string[4] ; at : item_ptr ; PROCEDURE runout ; BEGIN WHILE (token <> '.') AND (pos('@',token) <> 0) AND (token <> etx) DO scanf ; END ; (* runout *) PROCEDURE error_rtn(err_num : byte) ; BEGIN writeln ; write('***** error - rule : ',rule_no : 3,' ***** ') ; CASE err_num OF 1 : writeln('Couldn''t find rule number.') ; 2 : writeln('Missing ''IF''.') ; 3 : writeln('Missing ''THEN''.') ; 4 : writeln('Couldn''t find an attribute.') ; 5 : writeln('Couldn''t find a value.') ; END ; runout ; error := TRUE ; END ; (* error_rtn *) FUNCTION legal_pred(w : word) : boolean ; BEGIN legal_pred := (w = 'IS') ; END ; (* legal_pred *) PROCEDURE attribute ; BEGIN IF NOT legal_pred(token) THEN BEGIN attr := concat(attr,token,' ') ; scanf ; attribute ; END ; END ; (* attribute *) PROCEDURE predicate ; BEGIN IF attr = '' THEN error_rtn(4) ELSE IF legal_pred(token) THEN BEGIN pred := token ; scanf ; END ; END ; (* predicate *) PROCEDURE value ; BEGIN IF pred = '' THEN error_rtn(5) ELSE IF (token <> '.') AND (token <> etx) AND (token <> 'AND') AND (token <> 'THEN') THEN BEGIN val := concat(val,token,' ') ; scanf ; value ; END ; END ; (* value *) PROCEDURE clause ; PROCEDURE put_in_val_list(s1,s2 : string80 ; kind : item_type ; rule_no : counter) ; VAR at : item_ptr ; PROCEDURE put_in_list(VAR list : item_ptr) ; BEGIN IF list = NIL THEN new_item(s1,s2,kind,rule_no,list) ELSE IF list^.val > s2 THEN new_item(s1,s2,kind,rule_no,list) ELSE put_in_list(list^.next) ; END ; (* put_in_list *) PROCEDURE make_new_attr_item ; VAR ptr : item_ptr ; PROCEDURE put_in_attr_list(VAR list :item_ptr) ; BEGIN IF list = NIL THEN list := ptr ELSE IF ptr^.attr < list^.attr THEN BEGIN ptr^.next := list ; list := ptr ; END ELSE put_in_attr_list(list^.next) ; END ; (* put_in_attr_list *) BEGIN ptr := alloc ; makestr(s1,word_size) ; toupper(s1) ; WITH ptr^ DO BEGIN attr := s1 ; next := NIL ; prompt_ptr := NIL ; trans_ptr := NIL ; val_ptr := NIL ; END ; put_in_attr_list(attr_list) ; makestr(s2,word_size) ; toupper(s2) ; put_in_list(ptr^.val_ptr) ; END ; (* make_new_attr_item*) BEGIN IF on_list(s1,attr_list,at) THEN BEGIN makestr(s2,word_size) ; toupper(s2) ; put_in_list(at^.val_ptr) ; END ELSE make_new_attr_item ; END ; (* put_in_val_list *) BEGIN attr := '' ; pred := '' ; val := '' ; attribute ; predicate ; IF NOT error THEN value ; IF NOT error THEN put_in_val_list(attr,val,kind,rule_no) ; END ; (* clause *) PROCEDURE condition ; BEGIN IF NOT error THEN BEGIN kind := cond ; clause ; IF token = 'AND' THEN BEGIN scanf ; condition ; END ; END ; END ; (* condition *) PROCEDURE conclusion ; BEGIN IF NOT error THEN BEGIN kind := concld ; clause ; IF token = 'AND' THEN BEGIN scanf ; conclusion ; END ; END ; END ; (* conclusion *) BEGIN rule_no := tointeger(token) ; IF rule_no > 0 THEN BEGIN scanf ; IF token = 'IF' THEN BEGIN scanf ; condition ; IF (token = 'THEN') AND (NOT error) THEN BEGIN scanf ; conclusion ; IF (rule_no > biggest_rule) AND (NOT error) THEN biggest_rule := rule_no ; END ELSE error_rtn(3) END ELSE error_rtn(2) ; END ELSE error_rtn(1) ; END ; (* rule *) BEGIN error := FALSE ; scanf ; IF token <> etx THEN BEGIN IF (token = '@PROMPT') OR (token ='@TRANS') THEN at_line ELSE rule ; write('.') ; read_the_file ; END ; END ; (* read_the_file *) FUNCTION got_rule_files : boolean ; VAR ch : char ; rule_name : string80 ; FUNCTION file_ok : boolean ; FUNCTION open(VAR file_id : text ; file_name : string80) : boolean ; BEGIN (*$I- *) (* For Apple Pascal reset(file_id,file_name) ; *) assign(file_id,file_name) ; reset(file_id) ; open := (ioresult = 0) ; (*$I+ *) END ; (* open *) BEGIN write('File name : ') ; readln(rule_name) ; toupper(rule_name) ; IF pos('.TXT',rule_name) = 0 THEN rule_name := concat(rule_name,'.TXT') ; file_ok := open(rule_file,rule_name) ; END ; (* file_ok *) BEGIN IF NOT file_ok THEN BEGIN writeln ; writeln('An error has occurred while opening the files.') ; writeln ; write('Press to quit, any other key to continue.') ; read(trm,ch) ; writeln ; IF ch <> chr(27) THEN got_rule_files := got_rule_files ELSE got_rule_files := FALSE ; END ELSE got_rule_files := TRUE ; END ; (* got_rule_files *) PROCEDURE initialize ; BEGIN etx := ' ' ; etx[1] := chr(3) ; line := '' ; biggest_rule := 0 ; free := NIL ; attr_list := NIL ; END ; (* initialize *) PROCEDURE xref ; VAR out_name : string80 ; ch : char ; out_file : text ; PROCEDURE print_xref(list : item_ptr) ; PROCEDURE print_str(msg : word ; ptr : string_ptr) ; PROCEDURE print_s_list(p : string_ptr) ; BEGIN IF p <> NIL THEN BEGIN writeln(out_file,p^.info) ; print_s_list(p^.next_line) ; END ; END ; (* print_s_list *) BEGIN IF ptr <> NIL THEN BEGIN writeln(out_file,msg) ; print_s_list(ptr) ; writeln(out_file) ; END ; END ; (* print_str *) PROCEDURE print_v_list ; VAR last_val : word ; PROCEDURE print_v(ptr : item_ptr) ; BEGIN IF ptr <> NIL THEN BEGIN IF ptr^.val <> last_val THEN BEGIN writeln(out_file) ; write(out_file,ptr^.val) ; last_val := ptr^.val ; END ; write(out_file,ptr^.rule_no : 4) ; print_v(ptr^.next) ; END ; END ; (* print_v *) BEGIN IF list^.val_ptr <> NIL THEN BEGIN writeln(out_file,'Value',' ' : word_size - 5,'Rule(s)') ; last_val := '' ; print_v(list^.val_ptr) ; writeln(out_file) ; END ; END ; (* print_v_list *) BEGIN IF list <> NIL THEN BEGIN writeln(out_file) ; writeln(out_file,'Attribute : ',list^.attr) ; writeln(out_file) ; print_str('Prompt : ',list^.prompt_ptr) ; print_str('Translation : ',list^.trans_ptr) ; print_v_list ; IF out_name = 'CON:' THEN BEGIN writeln ; write('Press any key to continue. ') ; read(trm,ch) ; writeln ; END ; print_xref(list^.next) ; END ; END ; (* print_xref *) BEGIN writeln ; write('Output File ( for con:) ') ; readln(out_name) ; toupper(out_name) ; IF out_name = '' THEN out_name := 'CON:' ; assign(out_file,out_name) ; rewrite(out_file) ; (* For Apple Pascal rewrite(out_file,out_name) ; *) print_xref(attr_list) ; close(out_file) ; (* For Apple Pascal close(out_file,lock) ; *) END ; (* xref *) BEGIN initialize ; IF got_rule_files THEN BEGIN read_the_file ; close(rule_file) ; xref ; END ; END. ================================================== SAMPLE.TXT ================================================== @trans stem @ The stem of the plant @ @prompt stem @ Is the stem of the plant woody or green ? @ @trans position @ The position of the plant @ @prompt position @ Is the position of the plant upright or creeping ? @ @trans one main trunk @ There is /not/ one main trunk @ @prompt one main trunk @ Does the plant have one main trunk ? @ @trans type of plant @ the type of the plant @ @trans broad and flat @ The shape of the leaves is /not/ broad and flat @ @prompt broad and flat @ Is the shape of the leaves broad and flat ? @ @trans class @ The class of the tree @ @trans leaf shape @ The leaf shape @ @prompt leaf shape @ Is the leaf shape needlelike or scalelike ? @ @trans needle pattern @ The pattern the needles form along the branch @ @prompt needle pattern @ Is the pattern that the needles form along the branch a random one or are the needles is 2 even lines ? @ @trans silver bands @ There is /not/ a silver band under each needle @ @prompt silver bands @ Is there a silver band under each needle ? @ @trans family @ The family of the plant @ 1 If class is gymnosperm and leaf shape is scalelike then family is cypress . 2 If class is gymnosperm and leaf shape is needlelike and needle pattern is random then family is pine . 3 If class is gymnosperm and leaf shape is needlelike and needle pattern is 2 even lines and silver bands is yes then family is pine . 4 If class is gymnosperm and leaf shape is needlelike and needle pattern is 2 even lines and silver bands is no then family is pine . 5 If type of plant is tree and broad and flat is yes then class is angiosperm . 6 If type of plant is tree and broad and flat is no then class is gymnosperm . 7 If stem is green then type of plant is herb . 8 If stem is woody and position is creeping then type of plant is vine . 9 If stem is woody and position is upright and one main trunk is yes then type of plant is tree . 10 If stem is woody and position is upright and one main trunk is no then type of plant is shrub .