1 Randy Hyde's Standard Library for 8086 Assembly Language Programmers

This software is ... SHAREWARE 'cuz I'm sharing it with you!

I do not want any registrations or fees for the use of this software. I thank God and Jesus Christ (my personal saviour) for giving me the ability to write such software. God wants all of us to use our talents to glorify him, therefore I offer this software as such.

Now for the catch... It is more blessed to give than to receive. If this software saves you time and effort and you enjoy using it, my life will be enriched knowing that others have appreciated my work. I would like to share this wonderful feeling with you. If you like this software and use it, I would like you to contribute at least one routine to the library. Perhaps you think this library has some neat-o routines in it. Imagine how nice it would become if everyone used their imagination to contribute something useful to it.

I hereby release this software to the public domain. You can use it in any way you see fit. However, I would appreciate it if you would share this software with others, much as I've shared it with you. I'm not suggesting that you give away software you've written with this package (I'm not quite as crazy as Richard Stallman, bless his heart), but if someone else would like a copy of this library, please help them out. Naturally, I'd be tickled pink to receive credit in software that uses these routines (which is the honorable thing to do), but I understand the way many corporations operate and won't be terribly put off if you use it without giving due credit.

Enjoy!
If you have comments, bug reports, new code to contribute, etc., you can reach me at:

        rhyde                    (On BIX).
        rhyde@cs.ucr.edu         (On Internet).
        rhyde@ucrmath.ucr.edu    (On Internet, this one may go away).

or

        Randy Hyde
        Dept of Computer Science
        2208 Sproul Hall
        University of California
        Riverside, Ca. 92521-0135

or

        Randy Hyde
        c/o Braintec Corporation
        10 Corporate Park Way, ste 110
        Irvine, Ca. 92714

1.1 Comments about the code

This code has received very little testing. C'mon, whaddaya expect for free? I've been cranking this stuff out as fast as possible without going back and reworking anything I've done. The only exception has been modification of the routines to use the es:di/dx:si register pairs rather than the es:si/ds:di register pairs. I expect those modifications introduced more bugs.

Please don't expect super optimal code here. I haven't had any time to study and improve this code. Most of it is fairly mediocre (from a size/speed point of view). Hopefully, you'll agree, it's the idea that counts. If you don't like something I've done, you've got the sources -- have at it. (Of course, I'd appreciate it if you would send me any modifications.)

1.2 Wish List

Next, I'll be working on FILE I/O versions of the I/O routines in this package. Sooner or later I'll get around to adding floating point routines to this package.

If you're interested in adding some routines to this package, GREAT! Routines I'd like to have but am too busy to work on now:

        1) Routines which manipulate directories (read/write/etc.).
        2) A regular expression interpreter.
        3) A length-prefixed strings package.
        4) A windowing package.
        5) A graphics package.
        6) An object-oriented programming class library.
        7) Just about anything else appearing in a HLL "standard" library.

If you've got any ideas, I'd love to discuss them with you. Your best bet is to reach me electronically at the E-MAIL addresses above.
1.3 Missing Routines to Supply RSN

String package:

        strins-     Inserts one string into the middle of another.
        strdel-     Deletes a sequence of characters from the middle of a string.

Character Set package:

        span-       Skips through a sequence of characters in a string which belong to a character set.
        break-      Skips through a sequence of characters in a string which do not belong to a character set.

Memory Manager package:

        Memavail-   Largest block of free memory available on the heap.
        Memfree-    Total amount of free space on the heap.

2 Character Output Routines

2.1 Putc

* Outputs the character in the AL register to the standard output device.
* Output is redirectable to a user-written routine.

Inputs:   AL- Character to print.
Outputs:  None.
Include:  stdlib.a

Putc is the primitive character output routine. Most other output routines in the standard library output data through this procedure. It prints the ASCII character in AL. Processing of control codes is undefined, although most output routines this guy links to should be able to handle return, line feed, back space, and tab. By default, this routine calls DOS to print the character to the standard output device.

Example:

        mov     al, 'C'
        putc                    ;Prints "C" to std output.

2.2 PutCR

* Easy way of printing a newline to the stdlib standard output.

Inputs:   None.
Outputs:  None.
Include:  stdlib.a

Prints a newline (carriage return/line feed) to the current standard output device.

Example:

        PutCR

2.3 PutcStdOut

* Sends the character in AL directly to the DOS standard output device.
* Output is redirectable via DOS I/O redirection.
* Bypasses redirection through the standard library Putc routine.

Inputs:   AL- Character to output.
Outputs:  None.
Include:  stdlib.a

PutcStdOut calls DOS to print the character in AL to the standard output device. Although processing of non-ASCII characters and control characters is undefined, most output devices handle these characters properly.
In particular, most output devices properly handle return, line feed, back space, and tab.

Example:

        mov     al, 'C'
        PutcStdOut              ;Writes "C" to std output.

2.4 PutcBIOS

* Prints the character in AL to the display device by calling BIOS.
* Cannot be redirected by stdlib or by DOS.
* Uses INT 10h/AH=0Eh (14) for teletype-like output.
* Handles return, line feed, back space, and tab. Prints other control characters using the IBM character set.

Inputs:   AL- Character to print.
Outputs:  None.
Include:  stdlib.a

PutcBIOS prints the character in AL using the BIOS routines. Output through this routine cannot be redirected; such output is always sent to the video display on the PC (unless, of course, someone has patched INT 10h).

Example:

        mov     al, 'C'
        PutcBIOS

2.5 GetOutAdrs

* Retrieves the address of the current output routine.

Inputs:   None.
Outputs:  es:di- Address of the current output routine (called by Putc).
Include:  stdlib.a

You can use this function to get the address of the current output routine, perhaps so you can save it or see if it is currently pointing at some particular piece of code. If you want to temporarily redirect the output and then restore the original output routine, consider using PushOutAdrs/PopOutAdrs described later.

Example:

        GetOutAdrs
        mov     word ptr SaveOutAdrs, di
        mov     word ptr SaveOutAdrs+2, es

2.6 SetOutAdrs

* Lets you set the address of the current output routine.

Inputs:   es:di- Address of new output routine.
Outputs:  None.
Include:  stdlib.a

This routine redirects the stdlib standard output so that it calls the routine whose address you pass in es:di. This routine should expect the character in AL and must preserve all registers. At a bare minimum, it should handle the printable ASCII characters and the four control characters return, line feed, back space, and tab (unless, of course, the main purpose of this routine is to handle these codes in a different fashion).

Example:

        mov     di, seg NewOutputRoutine    ;The 8086 cannot load a segment
        mov     es, di                      ; register with an immediate value.
        mov     di, offset NewOutputRoutine
        SetOutAdrs
        .
        .
        .
        les     di, RoutinePtr
        SetOutAdrs

2.7 PushOutAdrs

* Lets you redirect the standard output device and preserve the previous address.
* Saves up to 16 old output routine addresses on an internal stack.
* Restoration is possible using PopOutAdrs.

Inputs:   es:di- Address of new output routine.
Outputs:  Carry=0 if the operation was successful.
          Carry=1 if there were already 16 items on the stack.
Include:  stdlib.a

This routine "pushes" the current output address onto an internal stack and then stores the value in es:di into the current output routine pointer. The PushOutAdrs and PopOutAdrs routines let you easily save and redirect the standard output and then restore the original output routine address later on. If you attempt to push more than 16 items on the stack, PushOutAdrs will ignore your request and return with the carry flag set. If PushOutAdrs is successful, it will return with the carry flag clear.

Example:

        mov     di, seg NewOutputRoutine
        mov     es, di
        mov     di, offset NewOutputRoutine
        PushOutAdrs
        .
        .
        .
        les     di, RoutinePtr
        PushOutAdrs

2.8 PopOutAdrs

* Restores output routine addresses saved by PushOutAdrs.
* Defaults to PutcStdOut if you attempt to pop too many items off the stack.

Inputs:   None.
Outputs:  es:di- Points at the output routine that was active before the pop.
Include:  stdlib.a

PopOutAdrs undoes the effects of PushOutAdrs. It pops an item off the internal stack and stores it into the output routine pointer. The previous value in the output pointer is returned in es:di.

Example:

        mov     di, seg NewOutputRoutine
        mov     es, di
        mov     di, offset NewOutputRoutine
        PushOutAdrs
        .
        .
        .
        PopOutAdrs

2.9 Puts

* Outputs a string of characters to the stdlib standard output device.
* Calls putc for each character in the string, thereby sending each character to the standard output device.

Inputs:   es:di- Contains the address of the string to print.
Outputs:  None.
Include:  stdlib.a

Puts prints a zero-terminated string whose address appears in es:di. Each character appearing in the string is printed verbatim.
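Because Puts hands each byte to Putc, its output follows whatever routine SetOutAdrs or PushOutAdrs has installed. The C sketch below models that arrangement under stated assumptions. The function names (`std_putc`, `push_out_adrs`, `capture_putc`, and so on) are hypothetical stand-ins, not stdlib entry points. A return value of 1 stands in for the carry flag. The 16-entry limit and the fall-back to PutcStdOut follow the PushOutAdrs and PopOutAdrs descriptions above. It illustrates the documented behavior; it is not the library's implementation.

```c
#include <stdio.h>

/* Hypothetical C model of stdlib's redirectable output (not the library's
   source): Putc calls through a routine pointer, and PushOutAdrs/PopOutAdrs
   keep up to 16 previous pointers on an internal stack. */
typedef void (*out_fn)(char c);

static void putc_stdout(char c) { putchar(c); }   /* stands in for PutcStdOut */

static out_fn cur_out = putc_stdout;   /* routine that Putc calls */
static out_fn out_stack[16];           /* saved routines */
static int out_sp = 0;                 /* number of saved routines */

static void std_putc(char c) { cur_out(c); }            /* Putc */
static out_fn get_out_adrs(void) { return cur_out; }    /* GetOutAdrs */
static void set_out_adrs(out_fn f) { cur_out = f; }     /* SetOutAdrs */

/* PushOutAdrs: returns 1 ("carry set") and changes nothing if 16 routines
   are already saved; otherwise saves the current routine, installs f,
   and returns 0 ("carry clear"). */
static int push_out_adrs(out_fn f) {
    if (out_sp == 16) return 1;
    out_stack[out_sp++] = cur_out;
    cur_out = f;
    return 0;
}

/* PopOutAdrs: reinstalls the most recently saved routine (PutcStdOut if
   nothing is saved) and returns the routine that was active before the pop. */
static out_fn pop_out_adrs(void) {
    out_fn prev = cur_out;
    cur_out = (out_sp > 0) ? out_stack[--out_sp] : putc_stdout;
    return prev;
}

/* Puts: every byte up to the zero terminator goes through Putc, so it
   follows any redirection; no newline is added. */
static void std_puts(const char *s) {
    while (*s != '\0') std_putc(*s++);
}

/* Example redirection target (hypothetical): collect output in a buffer. */
static char capture_buf[128];
static size_t capture_len = 0;
static void capture_putc(char c) {
    if (capture_len + 1 < sizeof capture_buf) {
        capture_buf[capture_len++] = c;
        capture_buf[capture_len] = '\0';
    }
}
```

For instance, calling `push_out_adrs(capture_putc)`, then `std_puts("Hi")`, then `pop_out_adrs()` leaves "Hi" in `capture_buf` and reinstalls the previous routine. This is the same save, redirect, and restore pattern that the PushOutAdrs/PopOutAdrs examples use.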
There are no special escape characters. Unlike the "C" routine by the same name, puts does not print a newline after printing the string. Use putcr if you want to print a newline after printing a string with puts.

Example:

        les     di, StrToPrt
        puts
        putcr

2.10 Puth

* Outputs the byte in AL as two hex digits (including a leading zero if necessary).
* Calls the stdlib putc routine to print both characters to the stdlib standard output device.

Inputs:   AL- Value to print.
Outputs:  None.
Include:  stdlib.a

Prints the value in the AL register as two hexadecimal digits. If the value in AL is between 0 and 0Fh, puth will print a leading zero. This routine calls the stdlib standard output routine (putc) to print all characters.

Example:

        mov     al, 1fh
        puth

2.11 Putw

* Outputs the word in AX as four hex digits (including leading zeros if necessary).
* Calls the stdlib putc routine to print the characters to the stdlib standard output device.

Inputs:   AX- Value to print.
Outputs:  None.
Include:  stdlib.a

Prints the value in the AX register as four hexadecimal digits. Putw always prints exactly four digits, so values below 1000h are padded with leading zeros. This routine calls the stdlib standard output routine (putc) to print all characters.

Example:

        mov     ax, 0f1fh
        putw

2.12 Puti

* Outputs the word in AX as a signed decimal number (including a minus sign, if necessary).
* Calls the stdlib putc routine to print the characters to the stdlib standard output device.

Inputs:   AX- Value to print.
Outputs:  None.
Include:  stdlib.a

Prints the value in the AX register as a signed decimal integer. This routine uses the exact number of screen positions required to print the number (including a position for the minus sign, if the number is negative). This routine calls the stdlib standard output routine (putc) to print all characters.

Example:

        mov     ax, -1234
        puti

2.13 Putu

* Outputs the word in AX as an unsigned decimal number.
* Calls the stdlib putc routine to print the characters to the stdlib standard output device.

Inputs:   AX- Value to print.
Outputs:  None.
Include:  stdlib.a

Prints the value in the AX register as an unsigned decimal integer. This routine uses the exact number of screen positions required to print the number. This routine calls the stdlib standard output routine (putc) to print all characters.

Example:

        mov     ax, 1234
        putu

2.14 Putl

* Outputs the double word in DX:AX as a signed decimal number (including a minus sign, if necessary).
* Calls the stdlib putc routine to print the characters to the stdlib standard output device.

Inputs:   DX:AX- Value to print.
Outputs:  None.
Include:  stdlib.a

Prints the value in the DX:AX registers as a signed decimal integer. This routine uses the exact number of screen positions required to print the number (including a position for the minus sign, if the number is negative). This routine calls the stdlib standard output routine (putc) to print all characters.

Example:

        mov     dx, 0ffffh      ;DX:AX = 0FFFFFB2Eh = -1234
        mov     ax, -1234
        putl

2.15 Putul

* Outputs the double word in DX:AX as an unsigned decimal number.
* Calls the stdlib putc routine to print the characters to the stdlib standard output device.

Inputs:   DX:AX- Value to print.
Outputs:  None.
Include:  stdlib.a

Prints the value in the DX:AX registers as an unsigned decimal integer. This routine uses the exact number of screen positions required to print the number. This routine calls the stdlib standard output routine (putc) to print all characters.

Example:

        mov     dx, 12h
        mov     ax, 1234
        putul

2.16 PutISize

* Prints the value in AX as a signed decimal integer.
* Prints the number in a minimum field width specified by the value in CX.

Inputs:   AX- Value to print.
          CX- Minimum number of print positions to use.
Outputs:  None.
Include:  stdlib.a

PutISize prints the signed integer value in AX to the stdlib standard output device using a minimum of n print positions. CX contains n, the minimum field width for the output value. The number (including any necessary minus sign) is printed right justified in the output field.
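C programmers may find it useful that this right-justified, minimum-width rule (including the overflow and zero-width cases spelled out next) works like C's `*` field width. The sketch below assumes that correspondence and uses hypothetical helper names. It formats into a buffer instead of calling Putc, takes CX as the width, and receives DX:AX as separate high and low words, the way PutLSize and PutULSize do.

```c
#include <stdio.h>
#include <stdint.h>

/* Sketch (hypothetical helpers, not stdlib entry points): the Put*Size
   rules expressed with C's "*" field width, which also right-justifies,
   pads with blanks, and never truncates. Each writes into buf and returns
   the number of print positions used. CX becomes the width argument. */

static int put_i_size(char *buf, size_t len, int16_t ax, uint16_t cx) {
    return snprintf(buf, len, "%*d", (int)cx, (int)ax);              /* PutISize */
}

static int put_u_size(char *buf, size_t len, uint16_t ax, uint16_t cx) {
    return snprintf(buf, len, "%*u", (int)cx, (unsigned)ax);         /* PutUSize */
}

/* DX holds the high word and AX the low word of the 32-bit value. */
static uint32_t dx_ax(uint16_t dx, uint16_t ax) {
    return ((uint32_t)dx << 16) | ax;
}

static int put_l_size(char *buf, size_t len, uint16_t dx, uint16_t ax, uint16_t cx) {
    /* Reinterpret as two's complement (implementation-defined in ISO C,
       modulo 2^32 on mainstream compilers). */
    int32_t v = (int32_t)dx_ax(dx, ax);
    return snprintf(buf, len, "%*ld", (int)cx, (long)v);             /* PutLSize */
}

static int put_ul_size(char *buf, size_t len, uint16_t dx, uint16_t ax, uint16_t cx) {
    return snprintf(buf, len, "%*lu", (int)cx,
                    (unsigned long)dx_ax(dx, ax));                   /* PutULSize */
}
```

For example, `put_i_size(buf, sizeof buf, -1234, 8)` stores "   -1234" (three blanks, then the number) and returns 8. A width of 0 uses only as many positions as the digits and sign need. This reproduces the maximums quoted in this chapter: 6 for -32,768, 5 for 65,535, 11 for a negative long, and 10 for an unsigned long.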
If the number in AX requires more print positions than specified by CX, PutISize uses however many print positions are necessary to actually print the number. If you specify zero in CX, PutISize uses the minimum number of print positions required. Of course, PutI will also use the minimum number of print positions without disturbing the value in the CX register. Note that, under no circumstances, will the number in AX ever require more than six print positions (-32,768 requires the most print positions).

Examples:

        mov     cx, 5
        mov     ax, I
        PutISize
        .
        .
        .
        mov     cx, 12
        mov     ax, J
        PutISize

2.17 PutUSize

* Prints the value in AX as an unsigned decimal integer.
* Prints the number in a minimum field width specified by the value in CX.

Inputs:   AX- Value to print.
          CX- Minimum number of print positions to use.
Outputs:  None.
Include:  stdlib.a

Like PutISize above, except this guy prints unsigned values. Note that the maximum number of print positions required by any number (e.g., 65,535) is five.

Example:

        mov     cx, 8
        mov     ax, U
        PutUSize

2.18 PutLSize

* Prints the value in DX:AX as a long signed decimal integer.
* Prints the number in a minimum field width specified by the value in CX.

Inputs:   DX:AX- Value to print.
          CX- Minimum number of print positions to use.
Outputs:  None.
Include:  stdlib.a

Like PutISize above, except this guy prints the long integer value in DX:AX. Note that there may be as many as 11 print positions (e.g., -1,000,000,000).

Example:

        mov     cx, 16
        mov     dx, word ptr L+2
        mov     ax, word ptr L
        PutLSize

2.19 PutULSize

* Prints the value in DX:AX as a long unsigned decimal integer.
* Prints the number in a minimum field width specified by the value in CX.

Inputs:   DX:AX- Value to print.
          CX- Minimum number of print positions to use.
Outputs:  None.
Include:  stdlib.a

Just like PutLSize above, except this guy prints unsigned numbers rather than signed long integers. The largest field width for such a value is 10 print positions.
Example:

        mov     cx, 8
        mov     dx, word ptr UL+2
        mov     ax, word ptr UL
        PutULSize

2.20 Print

* Prints a string literal.
* Very convenient to use.
* Calls the stdlib putc routine to print the characters to the stdlib standard output device.

Inputs:   CS:RET- Return address points at the string to print.
Outputs:  None.
Include:  stdlib.a

Print lets you print string literals in a convenient fashion. The string to print immediately follows the call to the print routine. The string must contain a zero terminating byte and may not contain any intervening zero bytes. Since the print routine returns to the address immediately following the zero terminating byte, forgetting this byte or attempting to print a zero byte in the middle of a literal string will cause print to return to an unexpected instruction. This usually hangs up the machine. Be very careful when using this routine!

Example:

        print
        db      "Print this string to the display device"
        db      13,10
        db      "This appears on a new line"
        db      13,10
        db      0

2.21 Printf

* Formatted output routine.
* Very similar to the "C" function of the same name.
* Prints integers (normal, long, unsigned, etc.), characters, strings, and other data types (this routine, however, does not support floating point output).
* Calls the stdlib putc routine to print the characters to the stdlib standard output device.

Inputs:   CS:RET- Return address points at the format string.
Outputs:  None.
Include:  stdlib.a

Printf, like its "C" namesake, provides formatted output capabilities for the stdlib package. A typical call to printf always takes the following form:

        printf
        db      "format string",0
        dd      operand1, operand2, ..., operandn

The format string is comparable to the one provided in the "C" programming language. For most characters, printf simply prints the characters in the format string up to the terminating zero byte. The two exceptions are characters prefixed by a backslash ("\") and characters prefixed by a percent sign ("%").
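Stdlib's printf decodes backslash escapes when it runs, not when the program is assembled. As a rough model of the sequences listed below, the following C sketch translates a format string into the bytes printf would emit for it. Several assumptions apply. Format items ("%...") are not handled. Any escape the manual does not list, and any malformed "\0x" sequence, is copied through literally. `expand_escapes` is a hypothetical name. Note that "\n" expands to a carriage return plus a line feed, unlike in C.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Expands the backslash escapes listed for stdlib's printf (hypothetical
   model). dst needs room for strlen(src) bytes, since no escape produces
   more bytes than it occupies. Returns the number of bytes stored. The
   result is not zero-terminated because "\0x00" can produce a zero byte. */
static size_t expand_escapes(const char *src, char *dst) {
    size_t n = 0;
    while (*src != '\0') {
        if (*src != '\\') { dst[n++] = *src++; continue; }
        char e = src[1];
        switch (e) {
        case 'r':  dst[n++] = '\r'; src += 2; break;
        case 'n':  dst[n++] = '\r'; dst[n++] = '\n'; src += 2; break; /* CR/LF */
        case 'b':  dst[n++] = '\b'; src += 2; break;
        case 't':  dst[n++] = '\t'; src += 2; break;
        case 'l':  dst[n++] = '\n'; src += 2; break;                  /* LF only */
        case 'f':  dst[n++] = '\f'; src += 2; break;
        case '\\': dst[n++] = '\\'; src += 2; break;
        case '%':  dst[n++] = '%';  src += 2; break;
        case '0':
            /* "\0xhh" needs exactly two hex digits. */
            if (src[2] == 'x' && hex_digit(src[3]) >= 0 && hex_digit(src[4]) >= 0) {
                dst[n++] = (char)(hex_digit(src[3]) * 16 + hex_digit(src[4]));
                src += 5;
                break;
            }
            /* fall through: malformed "\0x" sequence */
        default:
            dst[n++] = *src++;   /* copy the backslash literally (assumption) */
            break;
        }
    }
    return n;
}
```

Because "\0x00" can produce a zero byte, the function returns a byte count instead of terminating its result. That same fact is why printf needs the escape at all: a literal zero byte would end the format string.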
Like C's printf, stdlib's printf uses the backslash as an escape character and the percent sign as a lead-in to a format item. Printf uses the escape character ("\") to print special characters in a fashion similar to, but not identical to, C's printf. Stdlib's printf routine supports the following special characters:

* \r      Print a carriage return (but no line feed).
* \n      Print a new line character (carriage return/line feed).
* \b      Print a backspace character.
* \t      Print a tab character.
* \l      Print a line feed character (but no carriage return).
* \f      Print a form feed character.
* \\      Print the backslash character.
* \%      Print the percent sign character.
* \0xhh   Print ASCII code hh, represented by two hex digits.

C users should note a couple of differences between stdlib's escape sequences and C's. First, use "\%" to print a percent sign within a format string, not "%%". C doesn't allow the use of "\%" because the C compiler processes "\%" at compile time (leaving a single "%" in the object code) whereas printf processes the format string at run time. It would see a single "%" and treat it as a format lead-in character. Stdlib's printf, on the other hand, processes both the "\" and the "%" at run time; therefore it can distinguish "\%".

Strings of the form "\0xhh" must contain exactly two hex digits. The current printf routine isn't robust enough to handle sequences of the form "\0xh", which contain only a single hex digit. Keep this in mind if you find printf chopping off characters after you print a value.

There is absolutely no reason to use any escape character sequences except "\0x00". Printf grabs all characters following the call to printf up to the terminating zero byte, so it can never print a literal zero byte; you need "\0x00" if you want to print the null character. Stdlib's printf routine doesn't care how those characters got there. In particular, you are not limited to using a single string after the printf call.
The following is perfectly legal:

        printf
        db      "This is a string",13,10
        db      "This is on a new line",13,10
        db      "Print a backspace at the end of this line:"
        db      8,13,10,0

Your code will run a tiny amount faster if you avoid the use of the escape character sequences. More importantly, the escape character sequences take at least two bytes. You can encode most of them as a single byte by simply embedding the ASCII code for that byte directly into the code stream. Don't forget, you cannot embed a zero byte into the code stream. A zero byte terminates the format string. Instead, use the "\0x00" escape sequence.

Format sequences always begin with "%". For each format sequence you must provide a far pointer to the associated data immediately following the format string, e.g.,

        printf
        db      "%i %i",0
        dd      i,j

Format sequences take the general form "%s\cn^f" where:

* "%" is always the "%" character. Use "\%" if you actually want to print a percent sign.
* s is either nothing or a minus sign ("-").
* "\c" is also optional; it may or may not appear in the format item. "c" represents any printable character.
* "n" represents a string of one or more decimal digits.
* "^" is just the caret (up-arrow) character.
* "f" represents one of the format characters: i, d, x, h, u, c, s, ld, li, lx, or lu.

The "s", "\c", "n", and "^" items are optional; the "%" and "f" items must be present. Furthermore, the order of these items in the format item is very important. The "\c" entry, for example, cannot precede the "s" entry. Likewise, the "^" character, if present, must follow everything except the "f" character(s).

The format characters i, d, x, h, u, c, s, ld, li, lx, and lu control the output format for the data. The i and d format characters perform identical functions; they tell printf to print the following value as a 16-bit signed decimal integer. The x and h format characters instruct printf to print the specified value as a 16-bit or 8-bit hexadecimal value (respectively).
If you specify u, printf prints the value as a 16-bit unsigned decimal integer. Using c tells printf to print the value as a single character. s tells printf that you're supplying the address of a zero-terminated character string; printf prints that string. The ld, li, lx, and lu entries are long (32-bit) versions of d/i, x, and u. The corresponding address points at a 32-bit value which printf will format and print to the standard output.

The following example demonstrates these format items:

        printf
        db      "I= %i, U= %u, HexC= %h, HexI= %x, C= %c, "
        db      "S= %s",13,10
        db      "L= %ld",13,10,0
        dd      i,u,c,i,c,s,l

The number of far addresses (specified by operands to the "dd" pseudo-opcode) must match the number of "%" format items in the format string. Printf counts the number of "%" format items in the format string and skips over this many far addresses following the format string. If the two counts do not match, the return address for printf will be incorrect and the program will probably hang or otherwise malfunction. Likewise (as for the print routine), the format string must end with a zero byte. The addresses of the items following the format string must point directly at the memory locations where the specified data lies.

When used in the format above, printf always prints the values using the minimum number of print positions for each operand. If you want to specify a minimum field width, you can do so using the "n" format option. A format item of the form "%10d" prints a decimal integer using at least ten print positions. Likewise, "%16s" prints a string using at least 16 print positions. If the value to print requires more than the specified number of print positions, printf will use however many are necessary. If the value to print requires fewer, printf will always print the specified number, padding the value with blanks. Printf will print the value right justified in the print field (regardless of the data's type).
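The format-item grammar ("%s\cn^f") and the field rule just described can be modeled as a parse step followed by a padding step. The model also covers the "-" and "\c" options described next. The C sketch below is one such model under stated assumptions. `parse_item`, `fill_field`, and `struct fmt_item` are hypothetical names. Converting the operand to text (decimal, hex, character, string) is left to the caller. "^" is recorded but not followed.

```c
#include <string.h>

/* Hypothetical model of one stdlib printf format item, "%s\cn^f". */
struct fmt_item {
    int  left;      /* "-" present: left-justify in the field */
    char pad;       /* "\c" padding character; blank by default */
    int  width;     /* "n": minimum field width, 0 if absent */
    int  indirect;  /* "^" present: operand is the address of a pointer */
    char type[3];   /* "i", "d", "x", "h", "u", "c", "s", "ld", "li", "lx", "lu" */
};

/* Parses the item at p (p points at '%'), enforcing the documented order.
   Returns a pointer just past the item, or NULL if no valid format
   character follows. */
static const char *parse_item(const char *p, struct fmt_item *it) {
    it->left = 0;
    it->pad = ' ';
    it->width = 0;
    it->indirect = 0;
    p++;                                              /* skip '%' */
    if (*p == '-') { it->left = 1; p++; }
    if (p[0] == '\\' && p[1] != '\0') { it->pad = p[1]; p += 2; }
    while (*p >= '0' && *p <= '9') it->width = it->width * 10 + (*p++ - '0');
    if (*p == '^') { it->indirect = 1; p++; }
    if (p[0] == 'l' && p[1] != '\0' && strchr("dixu", p[1]) != NULL) {
        it->type[0] = 'l'; it->type[1] = p[1]; it->type[2] = '\0';
        return p + 2;
    }
    if (p[0] != '\0' && strchr("idxhucs", p[0]) != NULL) {
        it->type[0] = p[0]; it->type[1] = '\0';
        return p + 1;
    }
    return NULL;
}

/* Places already-converted text in the item's field: padded on the right
   when left-justified, otherwise on the left, never truncated. out needs
   room for the larger of width and strlen(text), plus one byte. */
static void fill_field(const struct fmt_item *it, const char *text, char *out) {
    size_t len = strlen(text);
    size_t w = ((size_t)it->width > len) ? (size_t)it->width : len;
    size_t padn = w - len;
    if (it->left) {
        memcpy(out, text, len);
        memset(out + len, it->pad, padn);
    } else {
        memset(out, it->pad, padn);
        memcpy(out + padn, text, len);
    }
    out[w] = '\0';
}
```

Padding is kept separate from conversion because the manual says the field rule applies regardless of the data's type. With this model, "%-\*10d" applied to the text "-12" yields "-12*******".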
If you want to print the value left justified in the output field, use the "-" format character as a prefix to the field width, e.g.,

        printf
        db      "%-17s",0
        dd      string

In this example, printf prints the string using a 17 character long field with the string left justified in the output field.

By default, printf blank fills the output field if the value to print requires fewer print positions than specified by the format item. The "\c" format item allows you to change the padding character. For example, to print a value, right justified, using "*" as the padding character you would use the format item "%\*10d". To print it left justified you would use the format item "%-\*10d". Note that the "-" must precede the "\*". This is a limitation of the current version of the software. The operands must appear in this order.

Normally, the address(es) following the printf format string must be far pointers to the actual data to print. On occasion, especially when allocating storage on the heap (using malloc), you may not know (at assembly time) the address of the object you want to print. You may have only a pointer to the data you want to print. The "^" format option tells printf that the far pointer following the format string is the address of a pointer to the data rather than the address of the data itself. This option lets you access the data indirectly.

Examples:

        printf
        db      "Indirect access to i: %^d",13,10,0
        dd      IPtr
        ;
        printf
        db      "A string allocated on the heap: %-\.32^s"
        db      13,10,0
        dd      SPtr

Note: unlike C, stdlib's printf routine does not support floating point output. There are two reasons for this: first, stdlib does not (yet) have a floating point library associated with it; second, adding floating point support would increase the size of printf by a tremendous amount, even if you don't use its floating point capabilities. Since most assembly language programmers don't use floating point arithmetic, I've intentionally left out floating point output.
As soon as I add a floating point package to stdlib I will include floating point output. However, I will create a new routine, printff, which includes floating point output. This will allow those who never use floating point I/O to keep their programs much smaller.

3 Character Input Routines

3.1 Getc

* Reads a character from the standard input device and returns the character in the AL register.
* Redirectable under program control.

Inputs:   None.
Outputs:  AL- Character from input device.
          AH- Undefined. However, if AL contains zero, AH should contain a keyboard scan code.
Include:  stdlib.a

This routine reads a character from the standard input device. This call is synchronous, that is, it does not return until a character is available. The default input device is DOS standard input.

Example:

        getc
        mov     KbdChar, al
        putc

3.2 GetcStdIn

* Reads a character from the DOS standard input device and returns the character in the AL register.
* Redirectable from the DOS command line.

Inputs:   None.
Outputs:  AL- Character from input device.
          AH- Scan code if AL=0.
Include:  stdlib.a

This routine reads a character from the DOS standard input device. This call is synchronous, that is, it does not return until a character is available.

Example:

        GetcStdIn
        mov     InputChr, al
        putc

3.3 GetcBIOS

* Reads a character from the keyboard and returns the character in the AL register and the scan code in the AH register.

Inputs:   None.
Outputs:  AL- Character from the keyboard.
          AH- Scan code from the keyboard.
Include:  stdlib.a

This routine reads a character from the keyboard. This call is synchronous, that is, it does not return until a character is available.

Example:

        GetcBIOS
        mov     CharRead, al
        mov     ScanCode, ah
        putc

3.4 SetInAdrs

* Lets you set the address of the current input routine.

Inputs:   es:di- Address of new input routine.
Outputs:  None.
Include:  stdlib.a

This routine redirects the stdlib standard input so that it calls the routine whose address you pass in es:di.
This routine should obtain a character (from anywhere) and return the character in AL. If it makes sense to do so, it should also return a "scan code" in the AH register. It must preserve all other registers.

Example:

        mov     di, seg NewInputRoutine     ;The 8086 cannot load a segment
        mov     es, di                      ; register with an immediate value.
        mov     di, offset NewInputRoutine
        SetInAdrs
        .
        .
        .
        les     di, RoutinePtr
        SetInAdrs

3.5 GetInAdrs

* Retrieves the address of the current input routine.

Inputs:   None.
Outputs:  es:di- Address of the current input routine (called by Getc).
Include:  stdlib.a

You can use this function to get the address of the current input routine, perhaps so you can save it or see if it is currently pointing at some particular piece of code. If you want to temporarily redirect the input and then restore the original input routine, consider using PushInAdrs/PopInAdrs described later.

Example:

        GetInAdrs
        mov     word ptr SaveInAdrs, di
        mov     word ptr SaveInAdrs+2, es

3.6 PushInAdrs

* Lets you redirect the standard input device and preserve the previous address.
* Saves up to 16 old input routine addresses on an internal stack.
* Restoration is possible using PopInAdrs.

Inputs:   es:di- Address of new input routine.
Outputs:  Carry=0 if the operation was successful.
          Carry=1 if there were already 16 items on the stack.
Include:  stdlib.a

This routine "pushes" the current input address onto an internal stack and then stores the value in es:di into the current input routine pointer. The PushInAdrs and PopInAdrs routines let you easily save and redirect the standard input and then restore the original input routine address later on. If you attempt to push more than 16 items on the stack, PushInAdrs will ignore your request and return with the carry flag set. If PushInAdrs is successful, it will return with the carry flag clear.

Example:

        mov     di, seg NewInputRoutine
        mov     es, di
        mov     di, offset NewInputRoutine
        PushInAdrs
        .
        .
        .
        les     di, RoutinePtr
        PushInAdrs

3.7 PopInAdrs

* Restores input routine addresses saved by PushInAdrs.
* Defaults to GetcStdIn if you attempt to pop too many items off the stack.

Inputs:   None.
Outputs:  es:di- Points at the input routine that was active before the pop.
Include:  stdlib.a

PopInAdrs undoes the effects of PushInAdrs. It pops an item off the internal stack and stores it into the input routine pointer. The previous value in the input pointer is returned in es:di.

Example:

        mov     di, seg NewInputRoutine
        mov     es, di
        mov     di, offset NewInputRoutine
        PushInAdrs
        .
        .
        .
        PopInAdrs

3.8 Gets

* Reads a line of text from the stdlib standard input device.
* Automatically allocates storage for the input string on the heap.
* Handles input lines up to 256 characters long.

Inputs:   None.
Outputs:  es:di- Address of the string read.
Include:  stdlib.a

Gets reads a line of text from the stdlib standard input. It returns, in ES:DI, a pointer to a string containing the characters read. Gets calls malloc to allocate 256 bytes on the heap (plus any overhead bytes required by the memory manager system). If the user enters fewer than 256 bytes, gets calls realloc to free any unnecessary bytes. Gets returns all characters typed by the user except for the carriage return (ENTER) key code. Gets always returns a zero-terminated string.

The action of various keys to gets depends upon where input has been directed. Generally, you can count on gets properly handling the backspace (erase previous character), escape (erase entire line), and ENTER (accept line) keys. Other keys may be active as well. For example, by default gets calls getc, which calls DOS' standard input routine. If you type a control-C or break key while reading from DOS' standard input, it will abort the program. If this bothers you, you can always redirect stdlib's getc routine so it calls BIOS directly rather than reading data through DOS' keyboard input routine.

Example:

        gets                    ;Read a string from the keyboard.
        puts                    ;Print it.
        putcr                   ;Print a new line.
        free                    ;Deallocate storage for string.

3.9 Scanf

* Formatted input from stdlib standard input.
* Similar to C's scanf routine.
* Converts ASCII to integer, unsigned, character, string, hex, and long values of the above.

Inputs:   None.
Outputs:  None.
Include:  stdlib.a

Scanf provides formatted input in a fashion analogous to printf's output facilities. Actually, it turns out that scanf is considerably less useful than printf because it doesn't provide reasonable error checking facilities (neither does C's version of this routine). But for quick and dirty programs whose input can be controlled in a rigid fashion (or if you're willing to live by "garbage in, garbage out"), scanf provides a convenient way to get input from the user.

Like printf, the scanf routine expects you to follow the call with a format string and then a list of (far pointer) memory addresses. The items in the scanf format string take the following form:

        %^f

where f represents d, i, x, h, u, c, s, ld, li, lx, or lu, and the "^" is optional. As with printf, the "^" symbol tells scanf that the address following the format string is the address of a (far) pointer to the data rather than the address of the data location itself.

By default, scanf automatically skips any leading whitespace before attempting to read a numeric value. You can instruct scanf to skip other characters by placing that character in the format string. For example, the following call instructs scanf to read three integers separated by commas (and/or whitespace):

        scanf
        db      "%i,%i,%i",0
        dd      i1,i2,i3

Whenever scanf encounters a non-blank character in the format string, it will skip that character (including multiple occurrences of that character) if it appears next in the input stream.

Scanf always calls gets to read a new line of text from stdlib's standard input. If scanf exhausts the format list, it ignores any remaining characters on the line. If scanf exhausts the input line before processing all of the format items, it leaves the remaining variables unchanged. Scanf always deallocates the storage allocated by gets.
Example:

        scanf
        db      "%i %h %^s",0
        dd      i, x, sptr

4 Conversion Routines

4.1 ATOL/ATOL2

* Converts an ASCII string of digits to long integer format.

Inputs:  ES:DI- Points at string to convert.
Outputs: DX:AX- Long integer converted from string.
         Carry flag- Error status.
         DI (ATOL2)- First character beyond string of digits.
Include: stdlib.a

ATOL converts the string of digits that ES:DI points at to a long integer (signed) value and returns this value in DX:AX. ATOL2 works in a similar fashion except it doesn't preserve the DI register. That is, it leaves DI pointing at the first character beyond the string of digits. This routine returns the carry flag clear if it translated the string of digits without error. It returns the carry flag set if overflow occurred.

Note that this routine stops on the first non-digit. If the string does not begin with a digit, this routine returns zero. The only exception to the "string of digits" rule is that the number can have a preceding minus sign to denote a negative number. In particular, note that this routine does not allow leading spaces.

Example:

        gets                ;Get a string from user.
        atol                ;Convert to a value in DX:AX.

4.2 ATOUL/ATOUL2

Just like ATOL above, except this guy handles unsigned long integers.

4.3 ATOI/ATOI2

* Converts an ASCII string of digits to integer format.

Inputs:  ES:DI- Points at string to convert.
Outputs: AX- Integer converted from string.
         Carry flag- Error status.
         DI (ATOI2)- First character beyond string of digits.
Include: stdlib.a

Works just like ATOL except it translates the string to a signed 16-bit integer rather than a 32-bit long integer.

4.4 ATOU/ATOU2

* Converts an ASCII string of digits to unsigned integer format.

Inputs:  ES:DI- Points at string to convert.
Outputs: AX- Unsigned 16-bit integer converted from string.
         Carry flag- Error status.
         DI (ATOU2)- First character beyond string of digits.
Include: stdlib.a

Like ATOI except it handles unsigned 16-bit integers in the range 0..65535.
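Section 4.1 gives ATOL's parsing rules, and 4.2 through 4.4 describe ATOUL, ATOI and ATOU as working like ATOL. Those rules are: an optional leading minus sign, digits up to the first non-digit, no leading spaces, zero if no digit comes first, and carry set on overflow. The C sketch below models them for the 32-bit signed case. It is not the library's code, and `atol_model` is a hypothetical name. The text does not say what DX:AX holds after an overflow, so storing 0 is this sketch's own choice. Leaving the end pointer just past a lone minus sign is an assumption too.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical C model of ATOL/ATOL2's documented rules (not the library's code).
 * Returns true where ATOL would return with the carry flag set (overflow).
 * *end receives the first character beyond the digits, which is where ATOL2
 * leaves DI.
 */
bool atol_model(const char *s, int32_t *value, const char **end)
{
    bool negative = false;
    bool overflow = false;
    int64_t acc = 0;                       /* wider than the result, so the check is simple */

    if (*s == '-') {                       /* the only non-digit allowed in front */
        negative = true;
        s++;
    }
    /* Largest magnitude that fits: |INT32_MIN| for negatives, INT32_MAX otherwise. */
    const int64_t limit = negative ? (int64_t)INT32_MAX + 1 : INT32_MAX;

    while (*s >= '0' && *s <= '9') {       /* no leading spaces; stop at first non-digit */
        if (!overflow) {
            acc = acc * 10 + (*s - '0');
            if (acc > limit)
                overflow = true;           /* keep scanning so *end is still accurate */
        }
        s++;
    }
    /* Storing 0 on overflow is this sketch's choice; the manual does not specify it. */
    *value = overflow ? 0 : (int32_t)(negative ? -acc : acc);
    *end = s;
    return overflow;
}
```

With this model, "-1234xyz" yields -1234 with the end pointer at the "x". The string " 42" yields 0 because of the leading space. "2147483648" reports overflow, while "-2147483648" does not.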
4.5 ATOH/ATOH2

* Converts an ASCII string of hex digits to a value in AX.

Inputs:  ES:DI- Points at string to convert.
Outputs: AX- Unsigned 16-bit integer converted from hex string.
         Carry flag- Error status.
         DI (ATOH2)- First character beyond string of hex digits.
Include: stdlib.a

This routine converts a string of hexadecimal digits into numeric form and returns that value in the AX register.

Example:

        les     di, Str2Convrt
        atoh                ;Convert to value in AX.
        putw                ;Print word in AX.

4.6 ATOLH/ATOLH2

Like ATOH above, except it handles 32-bit values and returns the result in DX:AX.

4.7 ITOA

* Converts a 16-bit signed integer value in AX to a string of characters.
* Automatically allocates storage for string on the heap.

Inputs:  AX- Signed 16-bit value to convert to a string.
Outputs: ES:DI- Pointer to string containing converted characters.
Include: stdlib.a

ITOA converts the signed integer value in AX to a string of characters which represent that value. It allocates storage for this string on the heap via a call to the malloc routine and returns a pointer to that string in ES:DI. The string contains the minimum number of characters required to hold the character representation of the value and is always between one and six characters long.

Example:

        mov     ax, -1234
        itoa                ;Convert to string.
        puts                ;Print it.
        free                ;Deallocate string.

4.8 UTOA

* Converts a 16-bit unsigned integer value in AX to a string of characters.
* Automatically allocates storage for string on the heap.

Inputs:  AX- Unsigned 16-bit value to convert to a string.
Outputs: ES:DI- Pointer to string containing converted characters.
Include: stdlib.a

Like ITOA above, except it converts the unsigned value in AX to a string of characters. The string returned by UTOA is always one to five characters long.

Example:

        mov     ax, 65000
        utoa
        puts
        free

4.9 HTOA

* Converts an 8-bit value in AL to the two-character hexadecimal representation of that byte.
* Automatically allocates storage for string on the heap.
Inputs:  AL- 8-bit value to convert to a string.
Outputs: ES:DI- Pointer to string containing converted characters.
Include: stdlib.a

Converts a byte to a string containing the hexadecimal representation of that byte. Otherwise, it's just like ITOA above. This routine always outputs exactly two hexadecimal digits, including a leading zero (if necessary).

4.10 WTOA

* Converts a 16-bit value in AX to hexadecimal representation.
* Automatically allocates storage for string on the heap.

Inputs:  AX- 16-bit value to convert to a string.
Outputs: ES:DI- Pointer to string containing converted characters.
Include: stdlib.a

Like HTOA above, except it converts the 16-bit value in AX to a string of four hexadecimal digits. Outputs exactly four digits including leading zeros if necessary.

4.11 LTOA

* Converts a 32-bit signed integer value in DX:AX to a string of characters.
* Automatically allocates storage for string on the heap.

Inputs:  DX:AX- Signed 32-bit value to convert to a string.
Outputs: ES:DI- Pointer to string containing converted characters.
Include: stdlib.a

Like ITOA except it converts a long integer value in DX:AX to a string of one to eleven characters.

4.12 ULTOA

* Converts a 32-bit unsigned integer value in DX:AX to a string of characters.
* Automatically allocates storage for string on the heap.

Inputs:  DX:AX- Unsigned 32-bit value to convert to a string.
Outputs: ES:DI- Pointer to string containing converted characters.
Include: stdlib.a

Like LTOA except this guy handles unsigned integer values.

4.13 SPrintf

* In-memory formatting routine.
* Just like C's sprintf routine.
* Automatically allocates storage for the string on the heap.
* Programmer selectable maximum length for the output string.

Inputs:  CS:RET- Pointer to format string and operands of the sprintf routine.
Outputs: ES:DI- Pointer to string containing output data.
Include: stdlib.a

Works in a manner quite similar to printf except sprintf writes its output to a string variable rather than to the stdlib standard output. Sprintf returns a pointer to the string (which it allocates on the heap) in the ES:DI registers.

SPrintf, by default, allocates 2048 characters for this string and then deallocates any unnecessary storage. An external variable, sp_MaxBuf, holds the number of bytes to allocate upon entry into sprintf. If you wish to allocate more or less than 2048 bytes when calling sprintf, simply change the value of this public variable (type is word). Sprintf calls malloc to allocate the storage dynamically. You should call free to return this buffer to the heap when you are through with it.

Example:

        sprintf
        db      "I=%i, U=%u, S=%s",13,10,0
        dd      i,u,s               ;Far addresses of the operands.
        puts
        free

4.14 SBPrintf

* In-memory formatting routine.
* Programmer-supplied output buffer for string.

Inputs:  CS:RET- Pointer to format string and operands of the sbprintf routine.
         ES:DI- Pointer to buffer area to store string data.
Outputs: None.
Include: stdlib.a

Works just like sprintf except it does not automatically allocate storage for the output string. Instead, you must supply the address of an output buffer in the ES:DI registers.

Example:

        les     di, BufferAdrs
        sbprintf
        db      "I=%i, U=%u, S=%s",13,10,0
        dd      i,u,s
        puts

4.15 SScanf

* Formatted in-memory conversions.
* Similar to C's sscanf routine.
* Converts ASCII to integer, unsigned, character, string, hex, and long values of the above.

Inputs:  ES:DI- Points at string containing values to convert.
         CS:RET- Points at format string and variable parameter list.
Outputs: None.
Include: stdlib.a

Sscanf provides formatted input in a fashion analogous to scanf. The difference is that scanf reads a line of text from the stdlib standard input whereas you pass the address of a sequence of characters to sscanf in es:di.

Example:

        ;
        ; This code reads the values for i, j, and s from the characters
        ; starting at memory location Buffer.
        ;
        les     di, Buffer
        sscanf
        db      "%i %i %s",0
        dd      i,j,s

4.16 ToLower

* Converts uppercase characters in AL to lower case.
* Macro implementation for high performance.
* Leaves characters other than uppercase unchanged.

Inputs:  AL- Character to (possibly) convert to lower case.
Outputs: AL- Converted character.
Include: stdlib.a

ToLower checks the character in the AL register. If it is upper case it converts it to lower case. If it is anything else, ToLower leaves the value in AL unchanged.

Note: this routine is implemented as a macro rather than as a procedure call. This routine is so short you would spend more time actually calling the routine than executing the code inside. However, the code is definitely longer than a (far) procedure call, so if space is critical and you're invoking this code several times, you may want to convert it to a procedure call to save a little space.

Example:

        mov     al, char
        ToLower

4.17 ToUpper

* Converts lowercase characters in AL to upper case.
* Macro implementation for high performance.
* Leaves characters other than lowercase unchanged.

Inputs:  AL- Character to (possibly) convert to upper case.
Outputs: AL- Converted character.
Include: stdlib.a

This is just like the ToLower routine except it converts lower case to uppercase rather than vice versa.

5 Utility Routines

5.1 ISize

* Computes the number of print positions required by a 16-bit signed integer value.

Inputs:  AX- 16-bit value to compute the output size for.
Outputs: AX- Number of print positions required by this number (including the minus sign, if necessary).
Include: stdlib.a

ISize computes the minimum number of character positions it will take to print the signed decimal value in the AX register. If the number is negative, it will include space for the minus sign in the count.

Example:

        mov     ax, I
        ISize
        puti                ;Prints positions req'd by I.

5.2 USize

Just like ISize above, except this guy returns the number of print positions required by a 16-bit unsigned value.
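The arithmetic behind ISize and USize can be sketched in C: count decimal digits by repeated division by ten, then add one position for a minus sign. Because ISize and ITOA both work with the minimum number of characters, the counts should match ITOA and UTOA's one-to-six and one-to-five character ranges. The function names below are hypothetical models, not library entry points. The `pad_for` helper is only an illustration of a typical use: working out how many spaces right-justify a value in a fixed-width column.

```c
#include <stdint.h>

/* Hypothetical model of USize: decimal digits needed for a 16-bit unsigned value. */
int usize_model(uint16_t u)
{
    int positions = 1;                  /* even zero needs one digit */
    while (u >= 10) {
        u /= 10;
        positions++;
    }
    return positions;
}

/* Hypothetical model of ISize: digits of the magnitude plus one for a minus sign. */
int isize_model(int16_t i)
{
    if (i < 0)                          /* widen first so -32768 negates safely */
        return 1 + usize_model((uint16_t)(-(int32_t)i));
    return usize_model((uint16_t)i);
}

/* Illustration: spaces needed to right-justify i in a field `width` columns wide. */
int pad_for(int16_t i, int width)
{
    int pad = width - isize_model(i);
    return pad > 0 ? pad : 0;
}
```

For example, -1234 needs five positions and -32768 needs six, the upper bound the ITOA section gives. A value of 42 in a six-column field needs four pad spaces.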
5.3 LSize

* Computes the number of print positions required by a 32-bit signed integer value.

Inputs:  DX:AX- 32-bit value to compute the output size for.
Outputs: AX- Number of print positions required by this number (including the minus sign, if necessary).
Include: stdlib.a

LSize computes the minimum number of character positions it will take to print the signed decimal value in the DX:AX registers. If the number is negative, it will include space for the minus sign in the count.

Example:

        mov     ax, word ptr L
        mov     dx, word ptr L+2
        LSize
        puti                ;Prints positions req'd by L.

5.4 ULSize

As with LSize, except ULSize treats the value in DX:AX as an unsigned long integer.

5.5 IsAlNum

* Checks character in AL to see if it is alphanumeric.

Inputs:  AL- Character to check.
Outputs: Zero flag- Set if character is alphanumeric, clear if not.
Include: stdlib.a

This procedure checks the character in the AL register to see if it is in the range A-Z, a-z, or 0-9. Upon return, you can use the JE instruction to check to see if the character was in this range (or, conversely, you can use jne to see if it is not in the range).

Example:

        mov     al, char
        IsAlNum
        je      IsAlNumChar

5.6 IsXDigit

* Checks character in AL to see if it is a hexadecimal digit.

Inputs:  AL- Character to check.
Outputs: Zero flag- Set if character is a hex digit, clear if not.
Include: stdlib.a

This procedure checks the character in the AL register to see if it is in the range A-F, a-f, or 0-9. Upon return, you can use the JE instruction to check to see if the character was in this range (or, conversely, you can use jne to see if it is not in the range).

Example:

        mov     al, char
        IsXDigit
        je      IsXDigitChar

5.7 IsDigit

* Checks character in AL to see if it is numeric.
* Macro implementation for high performance.

Inputs:  AL- Character to check.
Outputs: Zero flag- Set if character is numeric, clear if not.
Include: stdlib.a

This procedure checks the character in the AL register to see if it is in the range 0-9.
Upon return, you can use the JE instruction to check to see if the character was in this range (or, conversely, you can use jne to see if it is not in the range).

Example:

        mov     al, char
        IsDigit
        je      IsDecChar

5.8 IsAlpha

* Checks character in AL to see if it is alphabetic.
* Macro implementation for high performance.

Inputs:  AL- Character to check.
Outputs: Zero flag- Set if character is alphabetic, clear if not.
Include: stdlib.a

This procedure checks the character in the AL register to see if it is in the range A-Z, or a-z. Upon return, you can use the JE instruction to check to see if the character was in this range (or, conversely, you can use jne to see if it is not in the range).

Example:

        mov     al, char
        IsAlpha
        je      IsAlChar

5.9 IsLower

* Checks character in AL to see if it is a lower case alphabetic character.
* Macro implementation for high performance.

Inputs:  AL- Character to check.
Outputs: Zero flag- Set if character is lower case alphabetic, clear if not.
Include: stdlib.a

This procedure checks the character in the AL register to see if it is in the range a-z. Upon return, you can use the JE instruction to check to see if the character was in this range (or, conversely, you can use jne to see if it is not in the range).

Example:

        mov     al, char
        IsLower
        je      IsLowerChar

5.10 IsUpper

* Checks character in AL to see if it is uppercase alphabetic.
* Macro implementation for high performance.

Inputs:  AL- Character to check.
Outputs: Zero flag- Set if character is uppercase alpha, clear if not.
Include: stdlib.a

This procedure checks the character in the AL register to see if it is in the range A-Z. Upon return, you can use the JE instruction to check to see if the character was in this range (or, conversely, you can use jne to see if it is not in the range).

Example:

        mov     al, char
        IsUpper
        je      IsUpperChar

6 Memory Management

The stdlib memory management routines let you dynamically allocate storage on the heap.
These routines are somewhat similar to those provided by the "C" programming language. These routines do not perform garbage collection. Doing so would introduce too many restrictions. Of course, feel free to add your own garbage collection if you like...

The allocation/deallocation routines should be fairly fast. Malloc and free use a modified first/next fit algorithm which lets the system quickly find a memory block of the desired size without undue fragmentation problems (average case). The overhead (eight bytes) per allocated block may seem rather high, but that is part of the price to pay for faster malloc and free routines.

The memory manager data structure has an overhead of eight bytes (meaning each malloc operation requires at least eight more bytes than you ask for) and a granularity of 16 bytes. All pointers are far pointers and I allocate each new item on a paragraph boundary. The current memory manager routines always allocate (n+8) bytes, rounding up to the next multiple of 16 if the result is not evenly divisible by sixteen. For example, a request for 20 bytes needs 20+8 = 28 bytes, which rounds up to 32. The first eight bytes of the structure are used by the memory management routines; the remaining bytes are available for use by the caller (malloc, et al., return a pointer to the first byte beyond the memory management overhead structure).

Of course, you should never count on any of this stuff. I could rewrite the memory manager tomorrow, and if you use the interface which follows, your code will still work properly. If you make assumptions about the structure of the memory management record, your code may go up in flames on the next revision.

6.1 MemInit

* Initializes memory manager system.

Inputs:  DX- Number of paragraphs to reserve.
         zzzzzzseg- Segment name of last segment in your program.
         PSP- Public word variable which holds the PSP value for your program.
Outputs: CX- Number of paragraphs actually reserved by MemInit.
         Carry=0 if no error. If carry=1, AX contains DOS error code.
Include: stdlib.a

This routine initializes the memory manager system. You must call it before using any routines which call any of the memory manager procedures (since a good number of the stdlib routines call the memory manager, you should get in the habit of always calling this routine). The system will die a horrible death if you call a memory manager routine (like malloc) without first calling MemInit.

This routine expects you to define (and set up) two global names: zzzzzzseg and PSP. "zzzzzzseg" is a dummy segment which must be the name of the very last segment defined in your program. MemInit uses the name of this segment to determine the address of the last byte in your program. If you do not declare this segment last, the memory manager will happily wipe out anything which follows zzzzzzseg. The "shell.asm" file provides you with a template for your programs which properly defines this segment.

PSP should be a word variable which contains the program segment prefix value for your program. MS-DOS passes the PSP value to your program in the DS and ES registers. You should save this value in the PSP variable. Don't forget to make PSP a public symbol in your main program's source file. The "shell.asm" file demonstrates how to properly set up this value.

The DX register contains the number of 16-byte paragraphs you want to reserve for the heap. If DX contains zero, MemInit will allocate all of the available memory to the heap. If your program is going to allow the user to run a copy of the command interpreter, or if your program is going to EXEC some other program, you should not allocate all storage to the heap. Instead, you should reserve some memory for those programs. By setting DX to some value other than zero, you can tell MemInit how much memory you want to reserve for the heap. All left over memory will be available for other system (or program) use.
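To help choose a value for DX, the C sketch below models the DX rules. These are the rules from the paragraph above plus the oversize case described in the next paragraph. It also models the per-block cost from the section 6 introduction: n+8 bytes, rounded up to a paragraph. Both function names are hypothetical, and this is not the library's code. The `available` parameter stands in for the free paragraphs DOS reports, a figure the real MemInit obtains for itself. The model ignores fragmentation.

```c
#include <stdint.h>

/*
 * Hypothetical model of how MemInit interprets DX (not the library's code).
 * `available` stands in for the free paragraphs DOS reports; the real routine
 * obtains that figure itself. The result is what MemInit reports in CX.
 */
uint16_t meminit_paragraphs(uint16_t requested, uint16_t available)
{
    if (requested == 0)
        return available;              /* DX = 0: the heap gets everything */
    if (requested > available)
        return available / 2;          /* too large (e.g. 0FFFFh): split in half */
    return requested;                  /* otherwise reserve exactly DX paragraphs */
}

/*
 * Paragraphs one malloc of n bytes consumes under the current manager:
 * n plus 8 bytes of overhead, rounded up to a multiple of 16 bytes.
 */
uint32_t block_paragraphs(uint32_t n)
{
    return (n + 8 + 15) / 16;
}
```

For instance, a 256-byte request costs 17 paragraphs, so 100 such blocks need at least 1700 paragraphs of heap before any fragmentation is taken into account.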
If the value in DX is larger than the amount of available RAM, MemInit will split the available memory in half and reserve half for the heap, leaving the other half unallocated. If you want to force this situation (to leave half of available memory for other purposes), simply load DX with 0FFFFh before calling MemInit. There will never be this much memory available (0FFFFh paragraphs is just under one megabyte), so this will force MemInit to split the available RAM between the heap and unallocated storage.

On return from MemInit, the CX register contains the number of paragraphs actually allocated. You can use this value to see if MemInit has actually allocated the number of paragraphs you requested. You can also use this value to determine how much space is available when you elect to split the free space between the heap and the unallocated portions.

If all goes well, this routine returns the carry flag clear. If a DOS memory manager error occurs, this routine returns the carry flag set and the DOS error code in the AX register.

Example:

        ;
        ; Don't forget to set up PSP and zzzzzzseg before calling MemInit.
        ;
        mov     dx, 0               ;Allocate all available RAM.
        MemInit
        jc      MemoryError
        ;
        ; cx contains the number of paragraphs actually allocated.
        ;

6.2 Malloc

* Allocates storage from the heap.
* Allocates blocks up to 64K long.
* Very fast combination first/next fit allocation strategy.

Inputs:  CX- Number of bytes to reserve.
Outputs: CX- Number of bytes actually reserved by malloc.
         ES:DI- Pointer to first byte of memory allocated by malloc.
         Carry=0 if no error. Carry=1 if insufficient memory.
Include: stdlib.a

Malloc is the workhorse routine you use to allocate a block of memory. You give it the number of bytes you need and if it finds a block large enough, it will allocate the requested amount and return a pointer to that block.

Most memory managers require a small amount of overhead for each block they allocate. Stdlib's (current) memory manager requires an overhead of eight bytes. Furthermore, the granularity is 16 bytes.
This means that malloc always allocates blocks of memory in paragraph multiples. Therefore, malloc may actually reserve more storage than you specify, and the value returned in CX may be somewhat greater than the requested value. By setting the minimum allocation size to a paragraph, I was able to reduce the overhead and improve the speed of malloc by a considerable amount.

Stdlib's memory management system does not do any garbage collection. Doing so would place too many demands on malloc's users. Therefore, it is quite possible for you to fragment memory with multiple calls to malloc, realloc, and free. You could wind up in a situation where there is enough free memory to satisfy your request, but there isn't a single contiguous block large enough for the request. Malloc treats this as an insufficient memory error and returns with the carry flag set.

If malloc cannot allocate a block of the requested size, it returns with the carry flag set. In this situation, the contents of ES:DI are undefined. Attempting to dereference this pointer will produce erratic and, perhaps, disastrous results.

Example:

        mov     cx, 256
        malloc
        jnc     GoodMalloc
        print
        db      "Insufficient memory to continue.",cr,lf,0
        jmp     Quit

    GoodMalloc:
        mov     byte ptr es:[di], 0 ;Init string to NULL.

6.3 Realloc

* Reallocates a block of memory on the heap.
* Allocates blocks up to 64K long.
* Allows you to make the new block smaller or larger than the original block.
* Automatically copies the data from the original block to the new block if the new block is larger than the old block.

Inputs:  CX- Number of bytes to reserve.
         ES:DI- Pointer to block to reallocate.
Outputs: CX- Number of bytes actually reserved by realloc.
         ES:DI- Pointer to first byte of memory allocated by realloc.
         Carry=0 if no error. Carry=1 if insufficient memory.
Include: stdlib.a

Realloc lets you change the size of an allocated block in the heap. It allows you to make the block larger or smaller.
If you make the block smaller, realloc simply frees (returns to the heap) any leftover bytes at the end of the block. If you make the block larger, realloc goes out and allocates a block of the requested size, copies the bytes from the old block to the beginning of the new block (leaving the bytes at the end of the new block uninitialized), and then frees the old block.

6.4 Free

* Deallocates a block of memory on the heap.
* Automatically coalesces all contiguous, unused blocks on the heap.
* Very fast algorithm.
* Handles the situation where several active pointers may still point at the specified block.

Inputs:  ES:DI- Pointer to block to deallocate.
Outputs: Carry=0 if no error. Carry=1 if es:di doesn't point at an allocated block.
Include: stdlib.a

Free (possibly) deallocates storage allocated on the heap by malloc or realloc. Free returns this storage to the heap so other code can reuse it later. Note, however, that free doesn't always return storage to the heap. The memory manager data structure keeps track of the number of pointers currently pointing at a block on the heap (see DupPtr, below). If you've set up several pointers such that they point at the same block, free will not deallocate the storage until you've freed all of the pointers which point at that block.

Free usually returns an error code (carry flag = 1) if you attempt to free a block which is not currently allocated or if you pass it a memory address which was not returned by malloc (or realloc). By no means is this routine totally robust. If you start calling free with arbitrary pointers in es:di (which happen to be pointing into the heap), it is possible, under certain circumstances, to confuse free so that it attempts to free a block it really shouldn't. I could fix this problem by adding a lot of (slow) code to the free routine. However, this library is for assembly language programmers, people who are supposed to know what they are doing.
Therefore, I opted to sacrifice a little safety for a lot of speed.

Example:

        les     di, HeapPtr
        free

6.5 DupPtr

* Informs the memory manager that you have more than one active pointer pointing at a block of memory.
* Prevents free from deallocating a block while there are still some active pointers to that block.

Inputs:  ES:DI- Pointer to block.
Outputs: Carry=0 if no error. Carry=1 if es:di doesn't point at an allocated block.
Include: stdlib.a

DupPtr increments the pointer count for the block at the specified address. Malloc sets this counter to one. Free decrements it by one. If free decrements the value and it becomes zero, free will release the storage to the heap for other use. By using DupPtr you can tell the memory manager that you have several pointers pointing at the same block and that it shouldn't deallocate the storage until you free all of those pointers.

Example:

        les     di, MyPtr           ;(PTR is a MASM reserved word.)
        DupPtr

6.6 IsInHeap

* Tells you if a pointer contains the address of a byte in the heap.

Inputs:  ES:DI- Pointer to block.
Outputs: Carry=0 if es:di points into the heap. Carry=1 if not.
Include: stdlib.a

This routine lets you know if es:di contains the address of a byte somewhere in the heap. It does not tell you if es:di contains a valid pointer returned by malloc (see IsPtr, below). For example, if es:di contains the address of some particular element of an array (not necessarily the first element) allocated on the heap, IsInHeap will return with the carry clear, denoting that es:di points somewhere in the heap. Keep in mind that calling this routine does not validate the pointer. It could be pointing at a byte which is part of the memory manager data structure rather than at actual data (since the memory manager maintains that information within the bounds of the heap). This routine is mainly useful for seeing if something is allocated on the heap as opposed to somewhere else (like your code, data, or stack segment).
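Section 6.5 gives the pointer-count rules: malloc sets the count to one, DupPtr adds one, and free subtracts one, releasing the block only when the count reaches zero. The C sketch below models just that counting. The `struct block` type and the three function names are hypothetical. The real manager keeps its bookkeeping in the eight-byte header the section 6 introduction warns you not to rely on. The model detects every bad call, whereas the real free only "usually" does, as section 6.4 explains.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for one heap block. Only the pointer count matters
 * for this sketch; the real header layout is deliberately undocumented.
 */
struct block {
    unsigned refs;          /* active pointers; 0 means the block is not allocated */
};

/* malloc sets the counter to one. */
void malloc_model(struct block *b)
{
    b->refs = 1;
}

/* DupPtr: returns true (carry set) if the block is not currently allocated. */
bool dupptr_model(struct block *b)
{
    if (b->refs == 0)
        return true;
    b->refs++;
    return false;
}

/*
 * free: returns true (carry set) if the block is not currently allocated.
 * *released reports whether the storage actually went back to the heap.
 */
bool free_model(struct block *b, bool *released)
{
    *released = false;
    if (b->refs == 0)
        return true;
    b->refs--;
    if (b->refs == 0)
        *released = true;   /* last pointer freed: storage returns to the heap */
    return false;
}
```

Here is a typical sequence in this model. After one malloc and one DupPtr, the first free succeeds without releasing anything and the second free releases the block. A third free, or a later DupPtr, reports an error.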
6.7 IsPtr

* Tells you if a pointer contains the address of the start of a block in the heap.

Inputs:  ES:DI- Pointer to block.
Outputs: Carry=0 if es:di is a valid pointer. Carry=1 if not.
Include: stdlib.a

IsPtr is much more specific than IsInHeap. This guy returns the carry flag clear if and only if es:di contains the address of a properly allocated (and currently allocated) block on the heap. This pointer must be a value returned by malloc, realloc, or DupPtr and that block must be currently allocated for IsPtr to return the carry flag clear.

7 String Routines

The stdlib string package supports "C" style zero-terminated strings. Most of these routines mirror their "C" counterparts. Of course, I've added a few additional routines which seem useful to me.

7.1 Strcpy, Strcpyl

* Copies a zero terminated string from one buffer to another.
* Does not require the use of the DS segment register.

Inputs:  ES:DI- Pointer to source string (Strcpy only).
         CS:RET- Pointer to source string (Strcpyl only).
         DX:SI- Pointer to destination string.
Outputs: ES:DI- Points at the destination string.
Include: stdlib.a

Strcpy is used to copy a zero-terminated string from one location to another. ES:DI points at the source string, DX:SI points at the destination address. Strcpy copies all bytes, up to and including the zero byte, from the source address to the destination address. The target buffer must be large enough to hold the string. Strcpy performs no error checking on the size of the destination buffer.

Strcpyl copies the zero-terminated string immediately following the call instruction to the destination address specified by DX:SI. Again, this routine expects you to ensure that the target buffer is large enough to hold the result.

Examples:

        mov     dx, seg Target
        mov     si, offset Target
        Strcpyl
        db      "String for Strcpyl",0
        ;
        ; Copy that string to Target2 as well; note that ES:DI already points
        ; at "Target".
        ;
        mov     dx, seg Target2
        mov     si, offset Target2
        Strcpy

7.2 StrDup, StrDupl

* Duplicates a string by copying a zero-terminated string from one location to a newly allocated spot on the heap.
* Automatically allocates sufficient storage for destination string on the heap.
* Does not require the use of the DS segment register.

Inputs:  ES:DI- Pointer to source string (Strdup only).
         CS:RET- Pointer to source string (Strdupl only).
Outputs: ES:DI- Points at the destination string allocated on heap.
         Carry=0 if operation successful.
         Carry=1 if insufficient memory for new string.
Include: stdlib.a

Strdup and strdupl duplicate strings. You pass them a pointer to the string (in es:di for strdup, via the return address for strdupl) and they allocate sufficient storage on the heap for a copy of this string. Then these two routines copy their source strings to the newly allocated storage and return a pointer to the new string in ES:DI.

Examples:

        Strdupl
        db      "String for Strdupl",0
        jc      MallocError
        mov     word ptr Dest1, di
        mov     word ptr Dest1+2, es
        ;
        ; Create another copy of this string. Note that es:di points at
        ; Dest1 upon entry to Strdup, but it points at the new string on
        ; exit.
        ;
        Strdup
        jc      MallocError
        mov     word ptr Dest2, di
        mov     word ptr Dest2+2, es

7.3 Strlen

* Computes the length of a zero terminated string.

Inputs:  ES:DI- Pointer to source string.
Outputs: CX- Length of specified string.
Include: stdlib.a

Strlen computes the length of the string whose address appears in ES:DI. It returns the number of characters up to, but not including, the zero terminating byte.

Example:

        les     di, String          ;String holds a far pointer,
        strlen
        mov     sl, cx
        printf
        db      "Length of '%^s' is %d\n",0     ; so print it with %^s.
        dd      String, sl

7.4 Strcat, Strcat2, Strcatl, Strcat2l

* Concatenates one string to the end of another.
* Strcatl and Strcat2l allow literal string operands.
* Strcat2 and Strcat2l automatically allocate storage for destination string.

Inputs:  ES:DI- Pointer to first string.
         DX:SI- Pointer to second string (Strcat & Strcat2 only).
Outputs: ES:DI- Pointer to new string (Strcat2 & StrCat2l only).
         Carry=0 No error.
         Carry=1 Insufficient memory (Strcat2 & StrCat2l only).
Include: stdlib.a

These routines concatenate two strings together. They differ mainly in the location of their source and destination operands.

Strcat concatenates the string pointed at by DX:SI to the end of the string pointed at by ES:DI in memory (both strings must be zero-terminated). The buffer pointed at by ES:DI must be large enough to hold the resulting string. Strcat performs no bounds checking on the data.

Strcat2 works just like strcat except it does not append the second string onto the end of the first. Instead, Strcat2 computes the length of the two strings and attempts to allocate this much storage on the heap. If it is unsuccessful, Strcat2 returns with the carry flag set. If it succeeds, it copies the string pointed at by es:di to the heap, concatenates the string dx:si points at onto the end of this copy, and returns with the carry flag clear and es:di pointing at the new string on the heap.

Strcatl and Strcat2l work just like Strcat and Strcat2 except you supply the second string as a literal constant immediately after the call rather than pointing dx:si at it.

Examples:

        les     di, String1
        mov     dx, seg String2
        lea     si, String2
        Strcat              ;String1 <- String1 + String2
        ;
        les     di, String1
        Strcatl
        db      "Appended String",0
        ;
        les     di, String1
        mov     dx, seg String2
        lea     si, String2
        Strcat2             ;NewString <- String1 + String2
        puts
        free
        ;
        les     di, String1
        Strcat2l
        db      "Appended String",0
        puts
        free

7.5 Strchr

* Searches for a single character inside a string.

Inputs:  ES:DI- Pointer to string.
         AL- Character to search for.
Outputs: CX- Position (starting at zero) where Strchr found the character.
         Carry=0 if Strchr found the character.
         Carry=1 if the character wasn't present in the string.
Include: stdlib.a

Strchr locates the first occurrence of a character within a string.
It searches through the zero-terminated string pointed at by es:di for the character passed in AL. If it locates the character, it returns the position of that character in the CX register. The first character in the string corresponds to location zero. If the character is not in the string, Strchr returns the carry flag set. CX's value is undefined in that case. If Strchr locates the character in the string, it returns with the carry flag clear.

Example:

        les     di, String
        mov     al, Char2Find
        strchr
        jc      NotPresent
        mov     CharPosn, cx

7.6 Strstr, Strstrl

* Searches for a substring inside another string.

Inputs:  ES:DI- Pointer to string.
         DX:SI- Pointer to substring (strstr).
         CS:RET- Pointer to substring (strstrl).
Outputs: CX- Position (starting at zero) where Strstr/Strstrl found the substring.
         Carry=0 if Strstr/Strstrl found the substring.
         Carry=1 if the substring wasn't present in the string.
Include: stdlib.a

Strstr searches for the position of a substring within another string. ES:DI points at the string to search through, DX:SI points at the substring. Strstr returns the index into ES:DI's string where DX:SI's string is found. If the string is found, Strstr returns with the carry flag clear and CX contains the (zero-based) index into the string. If Strstr cannot locate the substring within the string ES:DI points at, it returns the carry flag set.

Strstrl works just like Strstr except it expects the substring to search for immediately after the call instruction (rather than passing this address in DX:SI).

Examples:

        les     di, MainString
        lea     si, Substring
        mov     dx, seg Substring
        strstr
        jc      NoMatch
        mov     i, cx
        printf
        db      "Found the substring '%s' at location %i\n",0
        dd      Substring, i
        jmp     Done
        ;
    NoMatch:
        print
        db      "Could not find the substring.",cr,lf,0
    Done:
        les     di, MainString
        strstrl
        db      "test",0
        jc      NoMatch2
        print
        db      "Found 'test' in the string",cr,lf,0
        jmp     Done2
        ;
    NoMatch2:
        print
        db      "Did not find 'test' in the string",cr,lf,0
    Done2:

7.7 Strcmp, Strcmpl

* Compares two strings.
* Reflects comparison in 8086 condition code flags.

Inputs:  ES:DI-  Pointer to first string.
         DX:SI-  Pointer to second string (Strcmp).
         CS:RET- Pointer to second string (Strcmpl).

Outputs: CX-     Position (starting at zero) where the two strings differ.
         Flags-  Hold the result of the comparison (use unsigned branches).

Include: stdlib.a

Strcmp and Strcmpl compare two strings. Strcmp compares the string which ES:DI points at to the string which DX:SI points at. Strcmpl compares the string which ES:DI points at to the string immediately following the call instruction in the code stream.

Strcmp(l) reflects the status of this comparison in the flags register. Immediately upon return from Strcmp(l) you can use the unsigned jump instructions (JA, JAE, JB, JBE, JE, JNE) to test the comparison between the two strings. Upon return, the CX register also contains the index into the strings where they differ (if the two strings are equal, Strcmp(l) returns with CX containing the offset of the zero byte in the two strings).

Examples:

        les     di, String1
        mov     dx, seg String2
        lea     si, String2
        strcmp
        jae     s1GEs2
        mov     i, cx
        printf
        db      "String1 is less than String2 and they "
        db      "differ at position %i\n",0
        dd      i
;
        les     di, String3
        strcmpl
        db      "Hello",0
        jbe     S3BEHello
;

7.8 Stricmp, Stricmpl

* Compares two strings ignoring differences in alphabetic case.
* Reflects comparison in 8086 condition code flags.

Inputs:  ES:DI-  Pointer to first string.
         DX:SI-  Pointer to second string (Stricmp).
         CS:RET- Pointer to second string (Stricmpl).

Outputs: CX-     Position (starting at zero) where the two strings differ.
         Flags-  Hold the result of the comparison (use unsigned branches).

Include: stdlib.a

Stricmp and Stricmpl work just like Strcmp and Strcmpl except that these two routines are case insensitive. Strcmp and Strcmpl treat "GETS" and "gets" as different strings; Stricmp and Stricmpl treat these two strings as equal.

7.9 Strupr, Strupr2

* Converts all of the lower case characters in a string to upper case.
* Converts the characters in place (Strupr) or creates a new string on the heap for the converted string (Strupr2).

Inputs:  ES:DI- Pointer to string.

Outputs: ES:DI- Pointer to new string on heap (Strupr2 only).
         Carry=1 if memory allocation error (Strupr2 only).

Include: stdlib.a

Strupr and Strupr2 convert the alphabetic characters in a string to upper case. You pass the address of the string containing the characters you want to convert in ES:DI. Strupr converts the characters in place; that is, it actually modifies the string you pass to it. Strupr2 first calls strdup to duplicate the string (on the heap) and then converts the characters in this duplicate string to upper case, returning the pointer to the new string in ES:DI.

Examples:

        les     di, Str2Cnvrt
        strupr
        les     di, Str2Cnvrt
        puts
;
        les     di, Str2Cnvrt2
        strupr2
        puts
        free

7.10 Strlwr, Strlwr2

* Converts all of the upper case characters in a string to lower case.
* Converts the characters in place (Strlwr) or creates a new string on the heap for the converted string (Strlwr2).

Inputs:  ES:DI- Pointer to string.

Outputs: ES:DI- Pointer to new string on heap (Strlwr2 only).
         Carry=1 if memory allocation error (Strlwr2 only).

Include: stdlib.a

Strlwr and Strlwr2 convert the alphabetic characters in a string to lower case. You pass the address of the string containing the characters you want to convert in ES:DI. Strlwr converts the characters in place; that is, it actually modifies the string you pass to it. Strlwr2 first calls strdup to duplicate the string (on the heap) and then converts the characters in this duplicate string to lower case, returning the pointer to the new string in ES:DI.

Examples:

        les     di, Str2Cnvrt
        strlwr
        les     di, Str2Cnvrt
        puts
;
        les     di, Str2Cnvrt2
        strlwr2
        puts
        free

7.11 Strset, Strset2

* Initializes all the characters in a string to a single value.
* Automatically allocates storage on the heap for the string (Strset2 only).
Inputs:  ES:DI- Pointer to string (Strset only).
         AL-    Character to initialize the string with.
         CX-    Length of string (Strset2 only).

Outputs: ES:DI- Pointer to new string on heap (Strset2 only).
         Carry=1 if memory allocation error (Strset2 only).

Include: stdlib.a

Strset and Strset2 initialize strings such that each element of the string contains the same value (passed in AL).

Strset overwrites the data in an existing string, replacing the characters previously in the string. To use Strset, simply load ES:DI with the address of a string, load AL with the character you want to overwrite the string with, and then call Strset. Strset replaces each existing character (up to the zero terminating byte) of the string with the character in AL.

Strset2 lets you create a brand-new string. You pass the initialization character in AL and the length of the string in CX. Strset2 allocates CX+1 bytes on the heap, initializes the first CX bytes to the value in AL, and stores a zero in the last byte.

Examples:

        les     di, Str2Cnvrt
        mov     al, '*'
        Strset
;
        mov     al, '#'
        mov     cx, 32
        Strset2
        puts
        free
;

7.12 Strspan, Strspanl

* Allows you to skip over successive characters in a string.
* Very compact implementation.

Inputs:  ES:DI-  Pointer to string to scan.
         DX:SI-  Pointer to character set (Strspan only).
         CS:RET- Pointer to character set (Strspanl only).

Outputs: CX-     First position where Strspan(l) could not find a character in the attendant character set. This is the index of the zero terminating byte if all of the characters in the string were present in the set.

Include: stdlib.a

Strspan(l) scans a string counting the number of characters which are present in a second string (which represents a character set). While each successive character in the source string is present in the character set, Strspan(l) advances past it. ES:DI points at a zero-terminated string of characters to check.
DX:SI (Strspan) or CS:RET (Strspanl) points at another zero-terminated string containing the set of characters to compare against. While the character that ES:DI points at is present (anywhere) in the character set string, the routine advances to the next character and bumps a counter by one. Upon encountering a character which is not in the character set string, the routine terminates and returns in CX the number of characters it skipped (i.e., the index into the string where the mismatch occurred).

Although Strspan (and, especially, Strspanl) is very compact and convenient to use, it is not particularly efficient. The character set routines described in the next section provide a much faster alternative at the expense of a little more space.

Examples:

        les     di, String
        mov     dx, seg CharSet
        lea     si, CharSet
        strspan
        mov     i, cx
        printf
        db      "The first char which is not in CharSet "
        db      "occurs at position %d in String.\n",0
        dd      i
;
        les     di, String
        strspanl
        db      "aeiou",0
        mov     j, cx
        printf
        db      "The first char which is not a vowel "
        db      "occurs at position %d in String.\n",0
        dd      j

7.13 Strcspan, Strcspanl

* Allows you to skip past characters in a string which are not members of a particular character set.

Inputs:  ES:DI-  Pointer to string to scan.
         DX:SI-  Pointer to character set (Strcspan only).
         CS:RET- Pointer to character set (Strcspanl only).

Outputs: CX-     First position where Strcspan(l) found a character in the attendant character set. This is the index of the zero terminating byte if none of the characters in the string were in the set.

Include: stdlib.a

These two routines work just like Strspan and Strspanl except they skip over characters which are not in the set rather than characters which are in the set.

8 Character Set Routines

The character set routines let you deal with groups of characters as a set rather than a string. A set is an unordered collection of objects where membership (presence or absence) is the only important quality.
I designed the stdlib set routines to let you quickly check whether an ASCII character is in a set, and to quickly add characters to or remove characters from a set. These are the operations most commonly used on character sets. The other operations (union, intersection, difference, etc.) are useful, but are not used nearly as often. Therefore, I've optimized the data structure for sets to handle the membership and add/delete operations at the slight expense of the others.

Character sets are implemented via bit vectors. A "1" bit means that an item is present in the set and a "0" bit means that the item is absent from the set. The most common implementation of a character set uses 32 consecutive bytes, eight bits per byte, giving 256 bits (one bit for each character in the character set). While this makes certain operations (like assignment, union, intersection, etc.) fast and convenient, other operations (membership, add/remove items) run much slower. Since these are the more important operations, I've chosen a different data structure to represent sets.

A faster approach is to simply use a byte value for each item in the set. This offers a major advantage over the 32-byte scheme: operations like membership are very fast (all you've got to do is index into an array and test the resulting value). It has two drawbacks. First, operations like set assignment, union, difference, etc., require 256 operations rather than 32. Second, it takes eight times as much memory.

The first drawback, speed, is of little consequence: you'll rarely use the affected operations, so it hardly matters that they run a little slower. Wasting 224 bytes per set is a problem, however, especially if you have a lot of character sets.

The approach I've used is to allocate 272 bytes. The first eight bytes contain the bit masks 1, 2, 4, 8, 16, 32, 64, and 128.
These masks tell you which bit in the following 264 bytes belongs to each set. This lets me pack eight sets into 272 bytes (34 bytes per character set), which provides almost the speed of the 256-byte set with only a two-byte overhead per set.

The stdlib.a file contains a macro, set, that lets you define a group of character sets. You use the macro as follows:

        set     set1, set2, set3, ..., set8

You must supply between one and eight labels in the operand field. These are the names of the sets you want to create. The set macro automatically attaches these labels to the appropriate mask bytes in the group. The actual bit patterns for each set begin eight bytes after its label. Therefore, the byte corresponding to chr(0) is staggered by one byte for each set (which explains the other eight bytes needed above and beyond the 256 required for the set).

When using the set manipulation routines, you should always pass the address of the mask byte (i.e., the seg/offset of one of the labels above) to the particular set manipulation routine you're using. Passing the address of the structure created with the macro above will reference only the first set in the group.

Note that you can use the set operations for fast pattern matching applications. The set membership operation, for example, is much faster than the Strspan routine found in the string package. Proper use of character sets can produce a program which runs much faster than one built on the equivalent string operations.

8.1 Createsets

* Allocates storage for eight character sets on the heap.

Inputs:  None.

Outputs: ES:DI- Pointer to eight sets.
         Carry=0 if no error.
         Carry=1 if insufficient memory to allocate storage for sets.

Include: stdlib.a

Createsets allocates 272 bytes on the heap. This is sufficient room for eight character sets. It then initializes the first eight bytes of this storage with the proper mask values for each set.
Location es:0[di] gets set to 1, location es:1[di] gets 2, location es:2[di] gets 4, etc. Createsets also initializes all of the sets to the empty set by clearing all the bits to zero.

Example:

        createsets
        jc      NoMemory
        mov     word ptr SetPtr, di
        mov     word ptr SetPtr+2, es
;

8.2 EmptySet

* Clears all of the bits for a particular set to zero.

Inputs:  ES:DI- Pointer to first byte of desired set.

Outputs: None.

Include: stdlib.a

Emptyset clears the bits in a character set to zero (thereby setting it to the empty set). Upon entry, ES:DI must point at the first byte (the mask byte) of the character set you want to clear. Note that the address createsets returns identifies only the first set in the group. The first eight bytes of a character set structure are the mask bytes, and the address of each one identifies a different set. ES:DI must point at one of these bytes upon entry into Emptyset.

Example:

        les     di, SetPtr
        add     di, 3           ;Point at 4th set in group.
        emptyset
;

8.3 RangeSet

* Adds all of the elements between two values to a set.

Inputs:  ES:DI- Pointer to first byte of desired set.
         AL-    Lower bound for range of items.
         AH-    Upper bound for range (must be greater than AL).

Outputs: None.

Include: stdlib.a

Rangeset adds (via a union operation) the range of values from AL through AH to a set.

Example:

        les     di, SetPtr
        add     di, 4           ;Point at 5th set in group.
        mov     al, 'A'         ;Add in the alphabetic chars
        mov     ah, 'Z'
        rangeset
;

8.4 Addstr, Addstrl

* Adds all of the characters from a string to a set.

Inputs:  ES:DI-  Pointer to first byte of desired set.
         DX:SI-  Pointer to string to add to set (Addstr only).
         CS:RET- Pointer to string to add to set (Addstrl only).

Outputs: None.

Include: stdlib.a

Addstr lets you add a group of characters to a set by specifying a string containing the characters you want in the set. You pass Addstr a pointer to a zero-terminated string in DX:SI; Addstr adds (unions) each character from this string into the set. Addstrl lets you specify the string as a literal constant immediately after the call to Addstrl.
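To make the staggered layout concrete, here is a minimal C model of it. It is a sketch under stated assumptions, not the library's code: `create_sets`, `add_char`, `add_str`, and `is_member` are hypothetical stand-ins for Createsets, AddChar, Addstr, and Member, a plain pointer replaces ES:DI, and return values replace the carry and zero flags. The only layout fact it relies on comes from the description earlier in this chapter: a set's data byte for character c lies 8 + c bytes past that set's mask byte, so set k of a group uses bit k of bytes k+8 through k+263.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One group of eight sets: 8 mask bytes + 256 data bytes + 8 bytes of stagger. */
enum { SET_GROUP_BYTES = 272 };

/* Hypothetical stand-in for createsets: mask bytes 1, 2, 4, ..., 128 and
   all data bytes zero (every set empty). NULL plays the role of Carry=1. */
static uint8_t *create_sets(void) {
    uint8_t *group = calloc(SET_GROUP_BYTES, 1);
    if (group == NULL)
        return NULL;
    for (int k = 0; k < 8; k++)
        group[k] = (uint8_t)(1u << k);
    return group;
}

/* A set is identified by the address of its mask byte. The data byte for
   character c is 8 + c bytes past that address; the set owns one bit of it. */
static void add_char(uint8_t *set, unsigned char c) {
    /* Cheap sanity check that `set` looks like a mask byte (exactly one bit). */
    assert(set[0] != 0 && (set[0] & (set[0] - 1)) == 0);
    set[8 + c] |= set[0];
}

/* Models Addstr: union every character of a zero-terminated string. */
static void add_str(uint8_t *set, const char *s) {
    for (; *s != '\0'; s++)
        add_char(set, (unsigned char)*s);
}

/* Models Member, returning 1/0 instead of setting the zero flag. */
static int is_member(const uint8_t *set, unsigned char c) {
    return (set[8 + c] & set[0]) != 0;
}
```

For example, `add_str(group + 1, "AaBb")` plays the role of pointing ES:DI at SetPtr+1 and calling Addstr. Afterward `is_member(group + 1, 'A')` is true, while `is_member(group, 'A')` and `is_member(group + 2, 'A')` stay false: the neighboring sets read different bits, even where their data bytes overlap. Release the group with `free` when you are done with it.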
Example:

        les     di, SetPtr
        add     di, 1           ;Point at 2nd set in group.
        mov     dx, seg CharStr ;Pointer to string containing
        lea     si, CharStr     ; chars to add to set.
        addstr                  ;Union in these characters.
;
        les     di, SetPtr      ;Point at first set in group.
        addstrl
        db      "AaBbCcDdEeFf0123456789",0
;

8.5 Rmvstr, Rmvstrl

* Removes all of the characters in a string from a set.

Inputs:  ES:DI-  Pointer to first byte of desired set.
         DX:SI-  Pointer to string to remove from set (Rmvstr only).
         CS:RET- Pointer to string to remove from set (Rmvstrl only).

Outputs: None.

Include: stdlib.a

Rmvstr is the converse of Addstr: it removes from a set the characters appearing in the associated character string. Rmvstrl works the same way except you pass the string of characters immediately after the call rather than via a pointer in DX:SI.

Example:

        les     di, SetPtr
        add     di, 1           ;Point at 2nd set in group.
        mov     dx, seg CharStr ;Pointer to string containing
        lea     si, CharStr     ; chars to remove from set.
        rmvstr                  ;Remove these characters.
;
        les     di, SetPtr      ;Point at first set in group.
        rmvstrl
        db      "AaBbCcDdEeFf0123456789",0
;

8.6 AddChar

* Adds a single character to a set.

Inputs:  ES:DI- Pointer to first byte of desired set.
         AL-    Character to add to the set.

Outputs: None.

Include: stdlib.a

AddChar lets you add a single character (passed in AL) to a set.

Example:

        les     di, SetPtr
        add     di, 1           ;Point at 2nd set in group.
        mov     al, Ch2Add      ;Character to add to set.
        addchar

8.7 RmvChar

* Removes a single character from a set.

Inputs:  ES:DI- Pointer to first byte of desired set.
         AL-    Character to remove from the set.

Outputs: None.

Include: stdlib.a

RmvChar lets you remove a single character (passed in AL) from a set.

Example:

        les     di, SetPtr
        add     di, 1           ;Point at 2nd set in group.
        mov     al, Ch2Rmv      ;Character to remove from set.
        rmvchar

8.8 Member

* Checks a character value to see if it is in the set.

Inputs:  ES:DI- Pointer to first byte of desired set.
         AL-    Character to check.

Outputs: Zero flag=1 if character is in the set.
         Zero flag=0 if character is not in the set.

Include: stdlib.a

Member lets you check for set membership; that is, it lets you see whether a character value is present in some set. This routine is probably the most often called routine in the collection of set routines.

Example:

        les     di, SetPtr
        add     di, 7           ;Point at 8th set in group.
        mov     al, Ch2Chk      ;Character to check.
        member
        je      IsInSet
;

8.9 CopySet

* Copies one set to another.

Inputs:  ES:DI- Pointer to first byte of destination set.
         DX:SI- Pointer to first byte of source set.

Outputs: None.

Include: stdlib.a

CopySet copies the items from one set to another. This is a straight assignment, not a union operation. After the operation the destination set is identical to the source set, in terms of both the elements present in the set and those absent from it.

Example:

        les     di, SetPtr
        add     di, 7           ;Point at 8th set in group.
        mov     dx, seg SetPtr2 ;Point at first set in group.
        lea     si, SetPtr2
        copyset
;

8.10 SetUnion

* Unions (adds) the members of one set into another.

Inputs:  ES:DI- Pointer to first byte of destination set.
         DX:SI- Pointer to first byte of source set.

Outputs: None.

Include: stdlib.a

The SetUnion routine computes the union of two sets. That is, it adds all of the items present in a source set to a destination set. This operation preserves items present in the destination set before the SetUnion operation.

Example:

        les     di, SetPtr
        add     di, 7           ;Point at 8th set in group.
        mov     dx, seg SetPtr2 ;Point at first set in group.
        lea     si, SetPtr2
        unionset
;

8.11 SetIntersect

* Computes the intersection of two sets.

Inputs:  ES:DI- Pointer to first byte of destination set.
         DX:SI- Pointer to first byte of source set.

Outputs: None.

Include: stdlib.a

Setintersect computes the intersection of two sets, leaving the result in the destination set. The new set consists only of those items which previously appeared in both the source and destination sets.

Example:

        les     di, SetPtr
        add     di, 7           ;Point at 8th set in group.
        mov     dx, seg SetPtr2 ;Point at first set in group.
        lea     si, SetPtr2
        setintersect
;

8.12 SetDifference

* Computes the difference of two sets.

Inputs:  ES:DI- Pointer to first byte of destination set.
         DX:SI- Pointer to first byte of source set.

Outputs: None.

Include: stdlib.a

SetDifference computes the result of (ES:DI) := (ES:DI) - (DX:SI). The destination set is left with its original items minus those items which are also in the source set.

Example:

        les     di, SetPtr
        add     di, 7           ;Point at 8th set in group.
        mov     dx, seg SetPtr2 ;Point at first set in group.
        lea     si, SetPtr2
        setdifference
;

8.13 NextItem

* Locates the next (first) available item in a set.
* Searches for items in ascending order using the ASCII collating sequence.

Inputs:  ES:DI- Pointer to first byte of set.

Outputs: AL-    Contains first item found in set (zero if the set is empty).

Include: stdlib.a

NextItem searches for the next available item in a set. It returns the ASCII code of the character it finds in the AL register. If the set is empty, NextItem returns zero (chr(0) is not a legal set element, so zero can signal an empty set). This call does not affect the set in any way; in particular, after the call the character located is still present in the set.

Example:

        les     di, SetPtr
        add     di, 7           ;Point at 8th set in group.
        nextitem
        mov     ch2, al
;

8.14 RmvItem

* Locates the next (first) available item in a set and then removes that item from the set.
* Searches for items in ascending order using the ASCII collating sequence.

Inputs:  ES:DI- Pointer to first byte of set.

Outputs: AL-    Contains first item found in set (zero if the set is empty).

Include: stdlib.a

RmvItem searches for the next available item in a set. It returns the ASCII code of the character it finds in the AL register and removes that item from the set. If the set is empty, RmvItem returns zero (since chr(0) is not a legal set element).

Example:

        les     di, SetPtr
        add     di, 7           ;Point at 8th set in group.
        rmvitem
        mov     ch3, al
;
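NextItem and RmvItem both scan a set in ascending ASCII order. Below is a minimal C sketch of that scan. It assumes the staggered layout described earlier in this chapter (the data byte for character c sits 8 + c bytes past the set's mask byte) and also assumes the scan starts at chr(1), because zero is reserved to report an empty set. `next_item`, `rmv_item`, and `drain_set` are hypothetical names, not library routines, and a pointer stands in for ES:DI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of NextItem: return the smallest character present in
   the set whose mask byte is at `set`, or 0 if the set is empty.
   The set itself is not modified. */
static unsigned char next_item(const uint8_t *set) {
    for (unsigned c = 1; c < 256; c++)
        if (set[8 + c] & set[0])
            return (unsigned char)c;
    return 0;
}

/* Hypothetical model of RmvItem: same search, but clear the item's bit
   (and only this set's bit) before returning it. */
static unsigned char rmv_item(uint8_t *set) {
    unsigned char c = next_item(set);
    if (c != 0)
        set[8 + c] &= (uint8_t)~set[0];
    return c;
}

/* Usage sketch: remove items one at a time into buf in ascending order,
   stopping when the set is empty or buf is full. Returns the count stored;
   buf is always zero-terminated when cap > 0. */
static size_t drain_set(uint8_t *set, char *buf, size_t cap) {
    size_t n = 0;
    unsigned char c;
    if (cap == 0)
        return 0;
    while (n + 1 < cap && (c = rmv_item(set)) != 0)
        buf[n++] = (char)c;
    buf[n] = '\0';
    return n;
}
```

Because `drain_set` empties the set as it goes, copy the set first (CopySet does this in the library) if you still need its contents afterward. Clearing with `&= ~mask` touches only the one bit that belongs to this set, so the other seven sets sharing the same data byte are unaffected.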