A Technique for Automatically Porting Dialects of Pascal to Each Other

Michael J. Sorens
QCAD Systems, Inc.
1164 Hyde Avenue
San Jose CA 95125

In the course of programming events, there have been those who have found it necessary to write Pascal code that could be ported between dialects of Pascal which differ from machine to machine. This task cannot be accomplished just by writing "vanilla" code to a maximal degree, as there are inevitably syntactical and semantic differences from one dialect to another.

We have developed a technique which, in conjunction with vanilla code, allows translation from one dialect to another to be performed automatically. Since it is generally the case that when a file of program text needs to be translated to a new dialect it also needs to be transmitted to a physically separate computer, our translation utility exists in two forms: a stand-alone translation utility, and a combination translation/transmission utility.

FINDING THE DENOMINATOR

The primary task is to reduce functions and procedures to common denominators. That is, if a function does not exist in one dialect, it must be written for that dialect using the dialect's own primitives. This could be a simple renaming. For example, consider the two functions shown below:

    Turbo Pascal:  copy(string, location, length)
    VAX Pascal:    substr(string, location, length)

As it happens, these functions are semantically identical, only the names have been changed to confuse the unwary. If we have written code in Turbo Pascal, then we simply create a copy function for VAX Pascal:

    function copy(STR: string255; WHERE, LEN: integer): string255;
    begin
      copy := substr(str, where, len);
    end;

Thereafter, we can use the copy function in either Turbo Pascal or VAX Pascal. Of course, we only want the above definition of copy to exist in the VAX environment, as it will generate compilation errors in Turbo; we will see how to selectively compile this kind of entity shortly.

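The same renaming works in the converse direction. As a minimal sketch, assuming the code had instead been written against VAX Pascal's substr and is being compiled by a Turbo-style compiler, a substr shim could simply forward to copy. The string255 declaration, the parameter names, and the test string are assumptions added for illustration.

```pascal
program SubstrShim;
{ Sketch only: the converse of the copy shim described in the text. }
type
  string255 = string[255];   { assumed declaration }

{ Forward the VAX-style name to the Turbo primitive. }
function substr(s: string255; where, len: integer): string255;
begin
  substr := copy(s, where, len);
end;

begin
  writeln(substr('dialect', 1, 4));   { prints dial }
end.
```

Such a definition would have to be visible only to the Turbo compiler, for the same reason that the copy shim must be visible only to the VAX compiler.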
A second case might be this: in evaluating a logical expression, HP Pascal, for example, has a compiler option to do only partial evaluation (a la C) where an expression is evaluated only until the point where its result is determinable. C programmers use this in establishing and avoiding side effects. (It is quite a useful thing to have, but, alas, having it in only one Pascal dialect makes life difficult.) For example, if the function foo changes some global variable y, then (false and foo(x)) will not change y if the dialect uses partial evaluation. This is due to the way the boolean and operates. Both terms must be true for the conjunction to be true. Scanning from left to right, since the first term false is false, we do not need to even look at foo(x) to determine the result. Without the partial evaluation feature, foo(x) will always be evaluated, potentially yielding differing results.

A second example might be a test such as

    while (count <= length(str)) and (str[count] = 'X') do...

which is a valid statement with partial evaluation but can cause a runtime error without it. Why? Well, as long as count is less than or equal to the length of the string str there is no problem. But when count becomes one larger than the length of str, referencing str[count] may cause a runtime error, depending on whether range checking is enabled for a particular compiler.

Therefore, we must modify the code in which results might vary depending on whether partial evaluation is available. This could be done by introducing a two-step evaluation. That is, rather than

    if (false and foo(x)) then...

we substitute

    if false then
      if foo(x) then...

which guarantees an equivalent partial evaluation. The while loop requires introducing some ugliness. We need a boolean variable, call it done, which we initialize to false. Then, the code might be

    while (count <= length(str)) and (not done) do
    begin
      done := (str[count] <> 'X');
      if not done then ...

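Fleshed out into a complete program, the flag-based loop might look like the sketch below. It assumes the loop's purpose is to count leading 'X' characters; the function name CountLeadingX, the parameter s (standing in for str), and the test strings are hypothetical. Here done becomes true at the first character that is not 'X', so the loop keeps running exactly as long as the single-test version would.

```pascal
program LeadingX;
{ Sketch: counts leading 'X' characters without relying on partial
  evaluation of the "and" operator. }
type
  string80 = string[80];

function CountLeadingX(s: string80): integer;
var
  count: integer;
  done: boolean;
begin
  count := 1;
  done := false;
  while (count <= length(s)) and (not done) do
  begin
    { s[count] is referenced only after the bounds test has passed }
    done := (s[count] <> 'X');
    if not done then
      count := count + 1
  end;
  CountLeadingX := count - 1
end;

begin
  writeln(CountLeadingX('XXXab'));   { prints 3 }
  writeln(CountLeadingX('XXX'));     { prints 3 }
  writeln(CountLeadingX(''));        { prints 0 }
end.
```

Both operands of the and are now safe to evaluate in any order, since neither one indexes the string.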
The simple elegance and power of partial evaluation begin to shine through, no?

A more cumbersome reduction involves string comparisons. In HP Pascal or Turbo Pascal one could compare strings s1 and s2 by merely interposing a relational operator, e.g., (s1 < s2) or (s1 = s2). In VAX Pascal, however, only strings of the same length can be compared (not strings of the same maximal length, but strings of the same length at the moment of comparison). Thus, we need to introduce an added functional level in all three Pascals so that the code can be identical. We create the function StrCmp:

    function StrCmp(S, T: string80): integer;
    { return -1 if s < t, 0 if s = t, 1 if s > t }
    var
      Result: integer;
    begin
      if length(s) < length(t) then
      begin
        result := StrCmp(s, copy(t, 1, length(s)));
        if result = 0 then StrCmp := -1
        else StrCmp := result
      end
      else if length(s) > length(t) then
      begin
        result := StrCmp(copy(s, 1, length(t)), t);
        if result = 0 then StrCmp := 1
        else StrCmp := result
      end
      else if s = t then StrCmp := 0
      else if s < t then StrCmp := -1
      else if s > t then StrCmp := 1
    end;

Then, if we have existing code which needs to be retrofitted, we must change occurrences of (s1 = s2) to (strcmp(s1, s2) = 0), and likewise for the other relational operators.

MAKE THAT CODE DISAPPEAR

At some point, there will be pieces of code that must be seen by one compiler but must be hidden from another. Enter DIALATE. DIALATE -- a combination of the words dialect and translate -- converts one dialect of Pascal to another provided a program text file has been set up according to the guidelines about to be discussed.

We introduce a meta-notation to be used in any Pascal we are interested in. This is just a notation that is in some sense "above" the Pascal code in that our translator will be looking only at the meta-notation and not the Pascal text itself.

In our meta-notation we have meta-brackets which are the only constructs that DIALATE looks for:

    {@x}   opening meta-bracket for dialect x
    {@}    closing meta-bracket

We use the curly braces "{" and "}" as the basis of our dialect notation, since they already have the capability of hiding text from a Pascal compiler -- the simple comment. DIALATE scans for comments that immediately begin with an "@" symbol, indicating that the comment is a special, dialectic comment. Immediately following the "@" can be one or more dialect designations. We currently use the following conventions:

    T - Turbo Pascal (IBM PC)
    V - VAX Pascal (VMS)
    H - HP Pascal
    A - Apple Pascal (Macintosh)

Any piece of code that cannot run on all relevant machines must be surrounded by meta-brackets. DIALATE inserts and removes the right curly brace of the opening meta-bracket in a judicious manner so that the compiler "sees" only valid language constructs.

Let's look at an example. Consider opening a file for input in Turbo Pascal and HP Pascal:

    Turbo Pascal:  assign(MyFile, FileName);
                   reset(MyFile);
    HP Pascal:     reset(MyFile, FileName);

Since there are significant differences, it is perhaps wise to create a procedure OpenInputFile which can be used in both Pascal dialects. Here is what the procedure will look like in Turbo Pascal:

     (1)  procedure OpenInputFile(var F: text; NAME: string80);
     (2)  begin
     (3)  {@T}
     (4)    assign(f, name);
     (5)    reset(f);
     (6)  {@}
     (7)  {@H
     (8)    reset(f, name);
     (9)  {@}
    (10)  end;

Carefully examine the position of each of the curly braces. Notice in line 3 that there is a "}" but that in line 7 there is not. Thus, lines 4 and 5 are active code while line 8 is passive code. Lines 6 and 9 are closing meta-brackets which never change. The "{" in line 6 begins a comment which hides the "@" character, while the "{" in line 9 is ignored since it is already inside a comment.

Now let's run the procedure through DIALATE, converting it to HP Pascal code:

     (1)  procedure OpenInputFile(var F: text; NAME: string80);
     (2)  begin
     (3)  {@T
     (4)    assign(f, name);
     (5)    reset(f);
     (6)  {@}
     (7)  {@H}
     (8)    reset(f, name);
     (9)  {@}
    (10)  end;

The only difference is that the "}" in line 3 is gone, and there is a new "}" in line 7. This has reversed the active and passive sections of code. Hence, to write a piece of automatically translatable code, decide on which dialect you wish to write in, and use the appropriate meta-brackets.

The technique is extensible to multiple machines with a concise notation. Take, for example, a function to find the location of a pattern string within some other string. In Turbo Pascal and HP Pascal this function is called pos, while in VAX Pascal it is called index. We could then write a compatibility function called index to be used in Turbo or HP Pascal, or one called pos to be used in VAX Pascal. A third alternative, though, is to write a function with a new name, perhaps LocateSubString, to be used in all three languages. Stylistically, it might be better to use a very different name so that there is no chance of confusing the name with some other valid language construct. As it happens, the functions pos and index do precisely the same thing, though their parameter order is different, so we do not have to write much code.

    (1)  function LocateSubString(Object, Target: string80): integer;
    (2)  begin
    (3)  {@HT
    (4)    LocateSubString := pos(object, target);
    (5)  {@}
    (6)  {@V}
    (7)    LocateSubString := index(target, object);
    (8)  {@}
    (9)  end;

We can see that the above function is written in VAX Pascal since line 7, the VAX code, is active. Line 4 is passive code which is for both the HP and the Turbo dialects, since line 3 has both a "T" and an "H". When translated to either of these dialects, line 4 will become active while line 7 becomes passive.

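The bracket flipping described above can be sketched in a few lines of Pascal. This is not the DIALATE source, only an illustration under stated assumptions: each line holds at most one opening meta-bracket, dialect designations are single uppercase letters, and the function name Retarget is hypothetical. The sketch does not address the case-statement keywords discussed in the next section.

```pascal
program RetargetSketch;
(* Sketch only, not the DIALATE source. Retarget rewrites one line of
   program text so that an opening meta-bracket is closed when it names
   the target dialect and left open when it does not. *)
type
  string80 = string[80];

function Retarget(line: string80; target: char): string80;
var
  p, q: integer;
  wanted, closed, scanning: boolean;
begin
  p := pos('{@', line);
  if p > 0 then
  begin
    q := p + 2;
    wanted := false;
    scanning := true;
    (* step over the dialect letters, noting whether target is listed *)
    while (q <= length(line)) and scanning do
      if line[q] in ['A'..'Z'] then
      begin
        if line[q] = target then
          wanted := true;
        q := q + 1
      end
      else
        scanning := false;
    (* no letters at all means a closing meta-bracket: leave it alone *)
    if q > p + 2 then
    begin
      closed := false;
      if q <= length(line) then
        closed := (line[q] = '}');
      if wanted and (not closed) then
        insert('}', line, q)
      else if (not wanted) and closed then
        delete(line, q, 1)
    end
  end;
  Retarget := line
end;

begin
  writeln(Retarget('{@HT', 'H'));
  writeln(Retarget('{@V}', 'H'));
  writeln(Retarget('{@}', 'H'));
end.
```

Run against lines 3, 6, and 8 of LocateSubString with HP Pascal as the target, the sketch prints `{@HT}`, `{@V`, and `{@}`: the HP/Turbo bracket is closed, the VAX bracket is opened, and the closing meta-bracket is untouched. Note that the scanning loop uses a flag rather than indexing the string inside the while condition, so it does not itself depend on partial evaluation.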
CASE IN POINT

There is one final twist which has precipitated out of the Babel-like differences in Pascals, and that is the else clause of a case statement. Some Pascals, such as Turbo, use the keyword else to indicate any cases not explicitly enumerated. Other Pascals, such as VAX, HP, and Macintosh, use the keyword otherwise. We can certainly handle this discrepancy using our standard meta-notation described in the previous section. A typical case statement might look like this:

    case SelectionChar of
      'R': RunIt;
      'P': ProcessIt;
      'E': EditIt;
    {@T}
      else
    {@}
    {@HVA
      otherwise
    {@}
        writeln('Invalid selection character');
    end { case } ;

But this can quickly become an annoyance. Since there is no way to make some kind of generic function which handles all case statements (as we did with OpenInputFile), we must use the meta-brackets every time we have a case statement. Or rather, we would have to, if DIALATE didn't have a better solution.

What we would like to do is have the translator automatically convert an else to an otherwise if we are going from Turbo to either HP, VAX, or Macintosh Pascal, and convert an otherwise to an else if we are going in the converse direction. But wait, what about the oft-found if... then... else... statement? How will we know if an else belongs to an if or to a case? What we have chosen to do is have a special ELSE-OTHERWISE convention. In Turbo Pascal, we write "ELSE" to denote an else of a case statement, and "else" to denote an else of an if statement. In our other Pascals, we write "OTHERWISE" to denote an otherwise of a case statement, and "else" to denote an else of an if statement. That is, the case clause -- whatever it is called -- must be in uppercase letters, while the if clause must be in lowercase letters.

DISCUSSION

There are minor disadvantages to this dialect translation technique. The foremost is that you must have all meta-brackets balanced and correct, otherwise you might hide too little or too much code from the compiler.
This might cause compilation errors, but it might not, creating possibly subtle bugs. For example, if we accidentally made the VAX code above into passive code, then the LocateSubString function would never be assigned a value. This could cause random or unpredictable results in the program. On the other hand, if we made both assignments active, a compiler should complain over the unknown function index or pos, depending on which machine the code is compiled on. It takes a little getting used to, but typing correct meta-brackets is, after all, no more than typing correct syntax in a programming language.

Second, it is not possible to put actual comments inside a region that is meta-bracketed. This may cause compilation errors when the entire region is supposed to be hidden, since the closing comment bracket could inadvertently reactivate a portion of the hidden code. This is most insidious, however, when it does not cause compilation errors, as, for example:

    procedure DUMMY;
    begin
    {@H
      a := 10;
      b := 5;
    {@}
    {@T}
      a := 5;   { some global variables }
      b := 10;
    {@}
    end;

The above code, as written, will work fine with the Turbo compiler. However, when we translate it to run with HP Pascal, we will get the wrong value for b, since the closing "}" of the "real" comment will prematurely close the comment created by the meta-brackets.

The dialect translation technique discussed in this paper is an effective and rapid way to work in several different Pascals with virtually the same program text. It may seem awkward at first, but one can readily get used to the style. And for those who need to work with more than one dialect or more than one machine, the tool may prove invaluable.

DIALATE was developed due to a perceived need here at QCAD. We manufacture a large software package (a parser generator) which runs on the machines discussed above; DIALATE allows us to keep the same code on all of the machines.

Should there be any readers who are so amazed by the technique revealed in this paper (for the first time anywhere), I will gladly supply both object and source code for DIALATE for a mere $10.24 (a kilo-penny) to cover diskette, postage, copying, etc. I will also be happy to tell you about some of the neat software tools that QCAD sells for real money.