THE ABC's OF UDF's
By Greg Lief

I. INTRODUCTION AND BACKGROUND

A. Clipper vs. dBASE

Do you remember the first time you saw or heard about Clipper? Chances are that your first impression of it was that of a dBASE compiler. That feature alone makes Clipper worthwhile. However, as you take the time to delve into Clipper, you quickly realize that there is more to Clipper than just a "dBASE compiler". The Clipper language addresses nearly all of the shortcomings of the dBASE language by providing numerous additional functions and commands. But more important than all those new goodies is Clipper's open architecture, which allows you to write and/or link in routines written in "C", Assembler, and (surprise!) Clipper. Such Clipper routines are known as User-Defined Functions.

User-Defined Functions (UDFs) open up an entirely new avenue of options that are simply inconceivable in a dBASE environment. For those of you who have not taken the time to write your own UDFs already, think of them as your own personal language extensions. Plus, you do not have to learn another (often cryptic) language to write them.

B. Benefits of UDFs

A few basic benefits of UDFs include:

- getting the absolute most out of the wonderfully open-ended Clipper language;
- minimizing repetitious coding in your applications, thus reducing overhead requirements and maintenance time/cost/frustration, and increasing efficiency;
- allowing you to develop your own libraries of UDFs rather than having to reinvent the wheel with each new occurrence of the same old problem, thereby increasing productivity;
- achieving consistency among your applications by using the same functions "across the board", which decreases confusion (both for your clients and you).

A side benefit to UDFs is that they gently but firmly force you to adopt better structured programming techniques. For a language (dBASE, and hence Clipper) that can be very unstructured, we need all the guidance we can get!
UDFs encourage modular programming, rather than throwing everything into one procedure and hoping you will never have to debug it.

C. A Few Examples

Some of the myriad purposes for User-Defined Functions include:

1. Data validation

Without UDFs, you are limited to useful (but simple) VALID clauses, or RANGE checking. With UDFs, you can perform complex look-ups into related databases or arrays, and display verbose descriptions found therein. You can also provide the user meaningful error messages in the event that their data does not pass muster, rather than having them sit there wondering why "the cursor won't move".

2. Common Occurrences, such as yes/no prompts, error messages, or displaying file directories for file-related operations

Without UDFs, you are faced with redundant code, which leads to difficult maintenance. With UDFs, you have the code in just one place. This is a blessing when you decide that those green on magenta yes/no prompts are not as pleasant as you originally thought. Now you can change it once instead of ten (or more) times.

3. Centering character strings on the screen or printer

Without UDFs, you have to type in the cumbersome "INT((80-LEN(string))/2)" formula throughout your program. Don't forget the last parenthesis, or the compiler will scream at you! With UDFs, you waste no time retyping said formula, which results in cleaner code and a clearer head.

4. Cosmetic enhancements

Without UDFs, you have the same old humdrum database management look and feel (or should we call it "yawn and stretch"?), which some people like and most people are resigned to. With UDFs, you can have exploding / pull-down / pop-up boxes, falling character strings, splitting screens, and all kinds of other crazy things that will get your users' attention IMMEDIATELY.

5. "Hot-key" procedures, including pop-up help screens and many other utilities

Without UDFs, such utilities are impossible and unthinkable. With UDFs, the sky (and your imagination) is the limit!

II.
UDF BASICS

A. Structure

Consider the following skeleton of a (recently deceased) UDF:

    FUNCTION funcname
    PARAMETERS param1, param2, ...
    PRIVATE return_value, etc...
    |
    | code to manipulate data
    |
    RETURN (return_value)

i. Name that Function, and Grab Those Parameters!

First, we name the UDF using the FUNCTION (or PROCEDURE) statement. Next, the PARAMETERS statement is used to receive any parameters that will be sent to this UDF from the calling program. Note that some UDFs will not require the use of parameters, in which case the PARAMETERS statement may be cheerfully omitted.

ii. PRIVATE Variables

The next step is to declare PRIVATE all variables that will be "local" to this UDF. Why bother with this? There are two reasons.

First, all variables available, or "visible", to the calling program will also be available to the UDF. Therefore, any changes or assignments made to variables within the UDF will affect such variables in the context of the main program as well UNLESS YOU DECLARE THEM PRIVATE TO THE UDF. This point is quite important, and is the source of many subtle bugs if not heeded.

Suppose that you have defined a variable named OLDSCRN in your main program that holds a crucial saved screen. Now further suppose that you wish your UDF to display a message or a scrolling window, thereby affecting the screen. You would probably want to save that portion of the screen to a buffer so that the UDF could then restore it properly upon exit. Finally, suppose that you save that window to the variable OLDSCRN within the UDF without declaring it as PRIVATE to the UDF. No local copy of the variable will be made, which means that you will overwrite the previous value of OLDSCRN, which will almost certainly wreak havoc once you return to the calling program.

A secondary reason for declaring variables PRIVATE within a UDF is so that you can keep track of what variables you are using in your application and where they are being used.

iii.
The Meat of The UDF

This consists of the code that will manipulate the data, or cure cancer, or whatever it is that this UDF is intended to do.

iv. A Speedy RETURN

We wrap up the UDF with a RETURN statement. With functions, this statement MUST return a value back to the calling program (even if this value will be ignored). It is strongly advised that you follow the practice of having only one RETURN statement in your functions, rather than something like the following:

    FUNCTION myudf
    PARAMETERS mNAME, mID
    IF mNAME = 'GRUMPFISH'
       RETURN (.T.)
    ELSEIF mNAME = 'HAPPYFISH'
       RETURN (.F.)
    ELSEIF mID = '99999'
       RETURN (.T.)
    ENDIF

As you can see from this substandard piece of code, there are no fewer than three exit points from this UDF. Although this is a fairly simple example, multiple exit points can make debugging very difficult when you begin working with UDFs of greater complexity. The following is a drastic improvement:

    FUNCTION myudf
    PARAMETERS mNAME, mID
    PRIVATE ret_val
    ret_val = .F.          && guilty until proven innocent
    IF mNAME = 'GRUMPFISH' .OR. mID = '99999'
       ret_val = .T.
    ENDIF
    RETURN (ret_val)

B. FUNCTION or PROCEDURE?

The term "User-Defined Function" refers not only to functions, but to procedures as well. The fundamental difference between these two beasts is that functions return a value, whereas procedures do not. However, Clipper allows you to begin a program statement with a function [such as the ever-popular INKEY(0)], in which case the return value is ignored. You may also have functions that always return the same value, with the express intent that you will ignore that value.

Other differences exist as well, particularly relating to parameters, which we will explore momentarily...

III. PARAMETERS

A. Introduction

There are some functions that always return one value, such as TIME() and DATE(). These functions have a predefined mission that cannot be altered by sending them additional information.
However, when you write your own UDFs, you will most likely want to exert more control over them. For example, if you design a UDF that pops up an error box, you may initially write it to use a generic message such as "Error - press any key to continue". Chances are that you (and your users) will quickly tire of this uninformative error message. In this instance, whenever you call the UDF you would want to pass it a message to be displayed. You could then easily construct the UDF to act upon the passed message to draw a box wide enough to accommodate it. This message is known as a PARAMETER.

B. Formal vs. Actual

Parameters are memory variables that receive values or references passed to a function or procedure. Parameters are known as either "formal" or "actual". Formal parameters are the receiving memory variables specified as the arguments of the PARAMETERS statement. Actual parameters are the arguments of the call to the procedure (DO..WITH) or UDF [MyUdf(...)].

Note that the number of formal and actual parameters does not have to match. Let us use the following example to explore the various ramifications of having more or fewer actual parameters than formal parameters. Here is the UDF we will use:

    FUNCTION MyFunc
    PARAMETERS mvar1, mvar2, mvar3, mvar4
    PRIVATE ret_val
    ret_val = ((mvar1 * 2.75) + (mvar2 * mvar3)) / (mvar4 + 5)
    RETURN (ret_val)

As you can see, MyFunc() is an obtuse number-cruncher. It accepts four parameters, performs a strange calculation involving them, and returns the value of said calculation. What happens when we call MyFunc() with:

1) "result = MyFunc(10, 5, 7, 20)"

The number of actual and formal parameters match exactly. MVAR1 assumes the value 10, MVAR2 assumes the value 5, MVAR3 assumes the value 7, and MVAR4 assumes the value of 20. For those of you keeping score, the return value is 2.5, which will be stored in RESULT.

2) "result = MyFunc(10, 5, 7, 20, 76, 120)"

Here we are passing six actual parameters.
Since there are only four parameters in the formal list, the function will only act upon the first four actual parameters. As in example 1), MVAR1 assumes the value 10, MVAR2 assumes the value 5, MVAR3 assumes the value 7, and MVAR4 assumes the value of 20. The parameters 76 and 120 are discarded.

3) "result = MyFunc(10, 5, 7)"

In this example we pass only three actual parameters. With MyFunc(), this is asking for trouble because the UDF explicitly acts upon all four parameters. You can see that, as above, MVAR1 assumes the value 10, MVAR2 assumes the value 5, and MVAR3 assumes the value 7. However, MVAR4 will be undefined since we have not passed a fourth parameter to match it. This will cause MyFunc() to crash when it reaches the calculation statement with the message "undefined identifier MVAR4".

C. Number and Type Checking

That little crash in Example 3 could have been avoided using the Clipper function PCOUNT(), which checks the number of actual parameters passed. To make our little number-cruncher a bit more bulletproof, we could rewrite it as follows:

    FUNCTION MyFunc
    PARAMETERS mvar1, mvar2, mvar3, mvar4
    PRIVATE ret_val
    ret_val = 0
    ** did they pass all four parameters?
    IF PCOUNT() > 3
       ret_val = ((mvar1 * 2.75) + (mvar2 * mvar3)) / (mvar4 + 5)
    ELSE
       ** guess not - time for sound and fury
       ? 'Error in call to MyFunc()'
       TONE(220,1)
       TONE(220,1)
       INKEY(0)
    ENDIF
    RETURN (ret_val)

However, suppose that we are coding late one night, and we are very tired. So tired, in fact, that we make a careless mistake like the following:

    STORE 10 TO val1, val2, val3, val4
    |
    val3 = 'Could be trouble'
    |
    result = MyFunc(val1, val2, val3, val4)

When we finally get to MyFunc(), the variable MVAR3 will assume the value of VAL3, which has inadvertently been defined as a character string. This will cause a "Type Mismatch" explosion when MyFunc() attempts to perform a numeric operation on a string. However, we can avoid this as well by using the Clipper function TYPE().
Obviously enough, TYPE() checks the type of a variable that is passed to it in the form of a character string. For example, in this instance "TYPE('val3')" would return a value of "C" for character, because VAL3 is a character-type variable. With the TYPE() function under our belts, let us take one more stab at MyFunc():

    FUNCTION MyFunc
    PARAMETERS mvar1, mvar2, mvar3, mvar4
    PRIVATE ret_val, err_msg
    ret_val = 0
    ** did they pass all four parameters?
    IF PCOUNT() > 3
       ** are all parameters numeric type?
       IF TYPE('mvar1') = 'N' .AND. TYPE('mvar2') = 'N' .AND. ;
          TYPE('mvar3') = 'N' .AND. TYPE('mvar4') = 'N'
          ret_val = ((mvar1*2.75) + (mvar2*mvar3)) / (mvar4+5)
       ELSE
          err_msg = 'Type mismatch in parameters to MyFunc()'
       ENDIF
    ELSE
       err_msg = 'Not enough parameters passed to MyFunc()'
    ENDIF
    ** see if error message was defined - if so, display it
    IF TYPE('err_msg') = 'C'
       ? err_msg
       TONE(220,1)
       TONE(220,1)
       INKEY(0)
    ENDIF
    RETURN (ret_val)

As you can see, the MyFunc() code continues to grow, but it is now virtually bullet-proof. In this incarnation, you can call it from anywhere in your program without having to worry about the program crashing because of incorrect parameters. This is good programming practice for you to follow when creating your own UDFs for several reasons:

1) Some other programmer may eventually use your UDF, especially if you are working in a team environment, and you should make every attempt to shield them from fatal (i.e., immediate exit to DOS) errors.

2) Nobody is perfect, and it is entirely possible that at some future time you too could accidentally call a UDF with the wrong parameters, so why not shield yourself as well?

Also note the use of meaningful error messages based on the type of error. This is not exactly earth-shattering but may save someone valuable time when debugging.

D. Reference vs. Value

When a parameter is passed by VALUE, the UDF evaluates it and makes a "local" copy of the resultant value at a different memory address.
Whenever the UDF needs to work with that parameter, it refers to the new memory address rather than using the memory address of the original parameter (which is unknown to it). This is ideal when you want the UDF to manipulate or tear asunder some variable without affecting its value once control returns to the calling program.

"And in this corner..." passing parameters by REFERENCE means that instead of passing the value of a variable, you instead are passing a pointer to a memory address that actually contains the variable. Unlike parameters passed by value, no "local" copy is made; thus if you change that parameter within the UDF, you are effectively changing the actual value of that variable.

Memory variables are passed by reference to PROCEDURES, and by value to FUNCTIONS. Consider the following example:

    * Main.prg
    mvar = 50
    DO MultByFive WITH mvar
    ? mvar
    RETURN
    *
    PROCEDURE MultByFive
    PARAMETERS testing
    testing = testing * 5
    RETURN

Since variables are passed by reference to PROCEDURES, MultByFive is actually changing the value of the variable MVAR. When we return to Main.prg, the value of MVAR will be 250, rather than 50. However, the equivalent FUNCTION would allow you to manipulate this variable while protecting it in the calling program:

    * Main.prg
    mvar = 50
    ? MultByFive(mvar)
    ? mvar
    RETURN
    *
    FUNCTION MultByFive
    PARAMETERS testing
    testing = testing * 5
    RETURN (testing)

Because parameters are passed by value to FUNCTIONS, MultByFive evaluates MVAR and places that value in a different memory address, which it then refers to as TESTING. When it multiplies TESTING by 5, it is only changing the "local" copy of the variable rather than manipulating MVAR. Thus, when control returns to Main.prg, the value of MVAR will still be 50.

There are instances where you may wish to override these basic defaults. For example, one good reason to pass variables by reference to FUNCTIONS is speed. It takes time for the FUNCTION to create the local copy of the variable.
Passing by reference, however, eliminates this need, and can increase throughput by an enormous factor. To pass a variable by reference to a function, simply precede it with the "AT" sign (@). For instance, in the last example we could have used the statement "? MultByFive(@mvar)". Bear in mind that this would have caused MultByFive() to change the value of MVAR to 250.

In similar fashion, you can pass variables to PROCEDURES by value. This is useful if you do not want the procedure to inadvertently change the value of a variable in the calling program. Had we chosen this route in our PROCEDURE-al example above, the syntax would have been "DO MultByFive WITH (mvar)". Of course, we would probably have modified MultByFive to display the calculated value, or else it would be an exercise in futility.

Database fields are always passed by value to FUNCTIONS and PROCEDURES, and must be bounded by parentheses.

Array names are always passed by reference. This is sensible, because it would be incredibly time-consuming to pass a large array to a UDF, only to have the UDF then make a copy of it for local manipulation.

Array elements and expressions are always passed by value.

IV. HOUSEKEEPING

If you are designing a UDF that in any way changes the environment (color, screen, coordinates, work area, cursor status), it is good practice to save all items that will be changed upon entry of the UDF, and restore them upon exit.
The following code fragment illustrates this:

    FUNCTION look_up
    PRIVATE oldscrn, oldcolor, work_area, oldrow, oldcol, oldcurs
    ** save environment
    SAVE SCREEN TO oldscrn
    oldcolor = SETCOLOR()
    work_area = SELECT()
    oldrow = ROW()
    oldcol = COL()
    oldcurs = IsCursor()
    |
    | code to manipulate data
    |
    ** restore environment
    RESTORE SCREEN FROM oldscrn
    SETCOLOR(oldcolor)
    SELECT(work_area)
    SET CURSOR (oldcurs)
    @ oldrow, oldcol SAY ''
    RETURN(return_value)

This particular UDF is going to be used to look up values in a related database, and will change the entire environment (screen, color, work area, etc.). Therefore, we must save the current settings so that we can restore them properly upon exit.

First, we declare our local variables as PRIVATE to avoid potential conflicts with other variables of the same names. Next, we use the Clipper command SAVE SCREEN to save the screen, because we intend to change it. SETCOLOR() returns the current color setting, which we must save because we might change the color. SELECT() returns the current work area. ROW() and COL() return the current screen row and column coordinates, respectively. IsCursor() is a public domain routine written by John Scott Prinke, which returns the current state of the cursor as a logical value (.T. means on).

V. HOT-KEY PROCEDURES

A. Activating the Hot Key

A "hot-key" procedure is one that is activated by a designated keypress. To create and use a hot-key procedure, you must first define the hot-key in your main program in this manner:

    * Main.prg
    EXTERNAL <procname>                  && not always necessary
    SET KEY <scan code> TO <procname>    && assign hot key to the UDF

A complete listing of scan codes for all of the function key, Ctrl-key, and Alt-key combinations can be found in your Clipper manual.

B. External

The EXTERNAL command must be used if you intend to link in a routine from a library rather than from an object module. The reason for EXTERNAL stems from the modus operandi of the Clipper compiler.
When you call a procedure with "DO procname", the compiler creates a symbol named PROCNAME. If this symbol remains unresolved before link-time (i.e., the compiler does not find it in any of the compiled .prg files), the linker will first attempt to resolve it by searching the other object files that you are linking in. If it is unsuccessful, it will then search through the specified libraries.

However, the compiler does not create a symbol when you use the SET KEY command. For example, suppose you wish to link in the Grumpfish Library appointment tracker and set the F10 key to activate it. You forget about EXTERNAL, and only use:

    SET KEY -9 TO popdate

This will look fine through compilation and linking, but as soon as you get into the program and begin pressing F10, you will get the nasty run-time error "MISSING EXTERNAL POPDATE". As Spock would readily point out, this is perfectly logical, because you forgot to instruct the linker to link in this module.

C. Basic Structure

The structure of a hot-key procedure is similar to that of a UDF, with several key differences (pun intentional):

    PROCEDURE <procname>
    PARAMETERS proc, line, var            && built-in Clipper parameters!
    SET KEY <scan code> TO                && to prevent recursion
    |
    | code to manipulate data
    |
    SET KEY <scan code> TO <procname>     && reset for later use
    RETURN

1. Built-in Clipper PARAMETERS

The PARAMETERS line is advisable, because whenever you execute a SET KEY procedure, Clipper automatically passes three parameters to the procedure:

- Procedure Name (always upper-case)
- Source Code Line Number (0 if the source code was compiled with the "line numbers off" switch)
- Variable Name being READ (always upper-case)

You do not necessarily have to trap these parameters, but in many cases they may be useful to know. One instance where you would definitely want this information is if you are building a context-specific help procedure, such as the Grumpfish Help System.
You may wish to note, however, that since line numbers are quite volatile, you should not rely too heavily upon them in a help system. As a matter of fact, the Grumpfish Help System completely ignores line numbers, acting instead only upon the procedure and variable names.

2. Recursion and How to Avoid It

In the line after the PARAMETERS line, we use SET KEY <scan code> TO. This prevents recursion - otherwise, the user could continue to press the hot-key from within the hot-key procedure, which would then call itself repeatedly until both you and it would be thoroughly confused. The last line before the RETURN statement resets the hot key.

If you are using a number of hot keys, you may wish to consider creating UDFs that will turn all of the hot keys on and off. You could then call those UDFs to prevent the user from "hot-keying" willy-nilly through the program:

    PROCEDURE <procname>
    PARAMETERS proc, line, var    && built-in Clipper parameters!
    HotKeysOff()
    |
    | code to manipulate data
    |
    HotKeysOn()
    RETURN

D. Housekeeping

Nearly all hot-key procedures will change the environment in some manner, which makes the aforementioned good housekeeping techniques even more crucial. Get in the habit of saving volatile items like the current color and affected portions of the screen right at the top of your hot-key procedures.

VI. MAKING YOUR OWN UDF LIBRARY

So you have built up an impressive collection of fully debugged UDFs. Great! But aren't you getting a bit tired of typing all those object (.OBJ) modules on the link line, and aren't your directories getting a bit cluttered? The best way to kill both of these birds with one stone is to consolidate all of your UDFs into a library (.LIB) file with the Microsoft Library Manager (TM).

A. Using the Library Manager

The Microsoft Library Manager is pre-packaged with a number of other Microsoft products, including their C compiler and some versions of DOS. (Look for the file LIB.EXE in your DOS directory or on your DOS supplemental disk.)
Using LIB to create and modify your own function libraries is simple. The syntax for LIB is:

    LIB <library> <commands>, <listfile>, <newlibrary>

<library> is the name of the library (the .lib extension is not necessary).

<commands> are of the general format <symbol>filename, and must be separated by spaces. The available symbols include:

    +     add modulename to the library
    -     remove modulename from the library
    *     extract modulename without removing from library
    -+    replace modulename in library
    -*    extract modulename and remove from library

<listfile> is the name of the list file to be generated, which lists the module names in the library and the memory usage for each module.

<newlibrary> is the name of the output library file. This is useful if you wish to make a new library that essentially duplicates another with some modifications.

If you do not wish to generate a list file or a new output library file, you may follow the command list with a semicolon. Note that you may also run LIB without command line parameters, in which instance you will be prompted for each item.

B. Preparing your source code

The most efficient way to build a library is to compile each of your UDFs separately, excepting instances where you know that certain UDFs always must be linked in together. The reason for isolating each UDF is that when you call the library with your linker, the linker will pull in only what you have asked for in your source code.

Let us suppose that you have called the pop-up calculator from Grumpfish Library with the source code line "DO POPCALC". When you compile this source code, the compiler creates the symbol POPCALC, which the linker will then attempt to resolve by searching the libraries for an object module of that name. However, because I compiled each of my UDFs separately before creating Grumpfish Library, the linker will only link in POPCALC, rather than pulling in the entire library.

By contrast, imagine that you have ten short UDFs in one .prg file. You compile this file and put it into a library.
If you call any one of those ten UDFs, the linker will be forced to link in ALL of them because they are all part of the same object module. This is additional overhead that you simply do not need. Therefore, you should take the time to compile each UDF separately.

C. Examples

Let us suppose that you have four object files: LOOKUP.obj, REC_LOCK.obj, REC_SRCH.obj, and CENTER.obj. You wish to combine these into one .lib file named MYFUNCS.lib. You do not need a list file, nor do you need a different name for the output library. Here is the command you would use:

    LIB myfuncs +lookup +rec_lock +rec_srch +center ;

This will create MYFUNCS.LIB, which will contain the four object files. Then instead of linking in each .obj file like so:

    PLINK86 FI myprog, lookup, rec_lock, rec_srch, center ;
            LI \clipper\extend,\clipper\clipper

you merely link in the library:

    PLINK86 FI myprog LI myfuncs,\clipper\extend,\clipper\clipper

Suppose that later you make changes to the source code of REC_LOCK.prg and need to recompile it. You would then update MYFUNCS.LIB with the following command:

    LIB myfuncs -+rec_lock;

This removes the module REC_LOCK from the library, then adds in the newer version. Naturally, you must have the file REC_LOCK.OBJ in the same directory, or else you must add the path specifier so that LIB can find it.

If you decided later that you wanted to extract the object code for REC_LOCK, you would use this command:

    LIB myfuncs *rec_lock;

This would extract the file REC_LOCK.OBJ without removing it from the MYFUNCS library.
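
To tie the pieces together, here is a rough sketch of the kind of single-function source file that could sit behind a module such as CENTER. It is only an illustration: the function name Center, its two parameters, and the 80-column default are assumptions made for this example, not the actual contents of any library. It wraps the centering formula from the introduction in the parameter checks described in section III.

```clipper
* Hypothetical source for a centering UDF, kept in a file of its
* own so that the linker pulls in only this module when it is called
FUNCTION Center
PARAMETERS msg, width
PRIVATE ret_val
ret_val = 0                   && fall back to column 0 on a bad call
** assumption: default to an 80-column screen if no width was passed
IF PCOUNT() < 2
   width = 80
ENDIF
** only calculate if we were handed a string and a number
IF TYPE('msg') = 'C' .AND. TYPE('width') = 'N'
   ret_val = MAX(0, INT((width - LEN(msg)) / 2))
ENDIF
RETURN (ret_val)
```

A call such as "@ 10, Center('Hello', 80) SAY 'Hello'" would then start the string at column 37. Once compiled, a module like this could be added to MYFUNCS.LIB with the same "+" command shown above.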