──────────────────────────────
INSIDE TURBO PASCAL UNIT FILES
    Version 6.0 for MS-DOS
    Version 1.0 for WINDOWS
──────────────────────────────
     by William L. Peavy
     ───────────────────
         June 6, 1991


ABSTRACT

If you want to know what is in a .TPU (unit) file produced by either Version 1.0 of Turbo Pascal for Windows or Version 6.0 of Turbo Pascal from Borland International, then this paper is for you. It doesn't explain quite everything. I don't have access to secret documents or anything like that, and some of the data in .TPU files just doesn't have enough auxiliary information to make its role clear. However, it is possible to learn a great deal about how Turbo Pascal organizes the information it needs to refer to, and it is also possible to learn just what kind of code the compiler produces.

This is the fourth in a series of reports on the subject of Turbo Pascal Units. The previous reports treated Turbo Pascal Versions 5.0 through 6.0. The evolution of these files in the face of changing requirements has been fascinating to behold, and deciphering their contents has been challenging, to say the least.

The programs supplied with this report have been reorganized from their 6.0 style, and many identifiers have been changed. There are also a few bug fixes and algorithm changes. Other changes were dictated by the way the Windows compiler uses the TPU file itself.

Since I have a "real" job which requires my full attention, and since it doesn't involve use of these products in any direct way, I am usually hard-pressed to find the personal time to conduct this research. Consequently, I always refuse to commit to follow-up or even error correction. It would be irresponsible of me to pretend it could be otherwise. Even so, this is a revised report. It contains a few error fixes and discusses the newly enhanced program, which incorporates these fixes and sports some enhanced capabilities.


CONTENTS

1. Introduction
   1.1 Caveats
   1.2 Evolution
   1.3 Treatment
2. Gross File Structure
   2.1 User Units
   2.2 SYSTEM Unit
3. Locators
   3.1 Local Links
   3.2 Global Links
   3.3 Table Offsets
   3.4 Basic Relationships
4. Unit Header
   4.1 Description
   4.2 UNIT Size
5. Symbol Dictionaries
   5.1 Organization
   5.2 Interface Dictionary
   5.3 Debug Dictionary
   5.4 Dictionary Elements
      5.4.1 Hash Tables
         5.4.1.1 Size
         5.4.1.2 Scope
         5.4.1.3 Special Cases
      5.4.2 NAME ENTRIES
      5.4.3 NAME Stubs
         5.4.3.1 Label Declaratives ("O")
         5.4.3.2 Un-Typed Constants ("P")
         5.4.3.3 Named Types ("Q")
         5.4.3.4 Variables, Fields, Typed Cons ("R")
         5.4.3.5 Subprograms & Methods ("S")
         5.4.3.6 Turbo Std Procedures ("T")
         5.4.3.7 Turbo Std Functions ("U")
         5.4.3.8 Turbo Std "NEW" Routine ("V")
         5.4.3.9 Turbo Std Port Arrays ("W")
         5.4.3.10 Turbo Std External Variables ("X")
         5.4.3.11 Units ("Y")
      5.4.4 Type Descriptors
         5.4.4.1 Scope
         5.4.4.2 Prefix Part
         5.4.4.3 Suffix Parts
            5.4.4.3.1 Un-Typed
            5.4.4.3.2 Structured Types
               5.4.4.3.2.1 ARRAY Types
               5.4.4.3.2.2 RECORD Types
               5.4.4.3.2.3 OBJECT Types
               5.4.4.3.2.4 FILE (non-TEXT) Types
               5.4.4.3.2.5 TEXT File Types
               5.4.4.3.2.6 SET Types
               5.4.4.3.2.7 POINTER Types
               5.4.4.3.2.8 STRING Types
            5.4.4.3.3 Floating-Point Types
            5.4.4.3.4 Ordinal Types
               5.4.4.3.4.1 "Integers"
               5.4.4.3.4.2 BOOLEANs
               5.4.4.3.4.3 CHARs
               5.4.4.3.4.4 ENUMERATions
            5.4.4.3.5 SUBPROGRAM Types
6. Maps and Lists
   6.1 PROC Map
   6.2 CSeg Map
   6.3 Typed CONST DSeg Map
   6.4 Global VAR DSeg Map
   6.5 DLL LIST
   6.6 Donor Unit List
   6.7 Source File List
   6.8 DEBUG Trace Table
7. Code, Data, Fix-Up Info
   7.1 Object CSegs
   7.2 CONST DSegs
   7.3 Fix-Up Data Tables
8. Supplied Program
   8.1 TWU1
      8.1.1 Unit TWU1EQU
      8.1.2 Unit TWU1RPT
      8.1.3 Unit TWU1UAM
      8.1.4 Unit TWU1UNA
   8.2 Notes on Program Logic
      8.2.1 Formatting the Dictionary
      8.2.2 The Disassembler
9. Unit Libraries
   9.1 Library Structure
10. Inferences Drawn from Analyses
   10.1 Linker Granularity
   10.2 Floating-Point Emulation
      10.2.1 Version 6.0 Compiler For MS-DOS
      10.2.2 Version 1.0 Compiler For WINDOWS
11. Application Notes
12. Acknowledgements
13. References
14. INDEX


Inside TURBO Pascal Unit Files
──────────────────────────────────────────────────────────────────────

1. INTRODUCTION

This document is the outcome of an inquiry into the structure and content of Borland Turbo Pascal for Windows (Version 1.0) Unit files. It followed naturally from previous inquiries into the structure of Unit files for Versions 5.0 through 6.0 of Borland's Turbo Pascal compilers. A brief conversation with the Principal Architect of Turbo Pascal, Mr. Anders Hejlsberg, further stimulated me to undertake it. We spoke in Houston at the HAL-PC meeting that served as the platform for the formal announcement of Turbo Pascal for Windows.

1.1 CAVEATS

The material contained herein represents the findings and interpretations of the author. A great deal of guess-work was required. No assurances are given as to the accuracy of either the findings of fact or the inferences contained herein, which are the sole work-product of the author.

In particular, the author had access only to the materials and information available to any normal Borland customer. Further, no Borland source code was available, since the Library Routine source is not licensed to the author. In short, there was nothing irregular about how these findings were achieved.

The material contained herein is placed in the public domain, free of copyright, for use by the general public at its own risk.
The author assumes no liability for any damages arising from the use of this material by others. If you make use of this information and you get burned, TOUGH! The author accepts no obligation to correct any errors that may exist in the supplied programs or in the findings of fact or opinion contained herein.

On the other hand, this is not a "complete" work, in that a great many questions remain open, especially as regards fine details. The author is highly qualified in neither Intel 80xxx assembly language nor Windows 3.0 application programming. Several open questions might best be addressed by persons competent in these areas. The author welcomes input from interested readers who can "flesh out" some of these open questions with "hard" answers, so that all might benefit from their expertise.

──────────────────────────────────────────────────────────────────────
June 6, 1991                                                    Page 5

1.2 EVOLUTION

The Unit first appeared in Turbo Pascal Version 4.0 (for MS-DOS), along with the ability to create ".EXE" instead of ".COM" files. This author began delving into these Unit files with Version 5.0 of Turbo Pascal. Each new version of the MS-DOS based product has brought significant changes in both the form and the content of ".TPU" files.

In contrast, careful study should make it plain that the Unit file produced by Turbo Pascal for Windows is remarkably similar to that produced by Turbo Pascal Version 6.0 (for MS-DOS). In the main, the files produced by the MS-DOS product (TP6) were rich with apparently useless fields within some of the data structures. In essence, the Windows product (TPW) makes coherent use of these fields. As far as format is concerned, this makes Version 6.0 units appear to be subsets of Windows units.
Development of the Windows version must have been well advanced when the DOS version (6.0) hit the streets. In fact, Mr. Anders Hejlsberg did confirm my speculation that the compiler "engine" used in the Windows product is the same as that used in Version 6.0 of the DOS product.

1.3 TREATMENT

This report treats BOTH Turbo Pascal for Windows and Turbo Pascal Version 6.0 (for MS-DOS). From the standpoint of structure, it views Unit files for the MS-DOS version as subsets of those for the Windows version. Because of this, the supplied program is able to process ".TPU" files from either compiler with little or no special handling.

This doesn't mean that Version 6.0 units can be combined with Windows applications! When an application (program) is built by either compiler, ALL units must have been compiled by that same compiler. If for no other reason, this is because the SYSTEM unit (for one) is uniquely tailored to each environment.

2. GROSS FILE STRUCTURE

A Turbo Pascal Unit file consists of an array of bytes whose length is an exact multiple of sixteen (16). "Signature" information allows the compiler to verify that the .TPU file was compiled with the correct compiler version and that the file is of the correct size. Later sections address the fine structure of the file at ever-increasing levels of detail.
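The signature and size check just mentioned can be sketched in Python. This is a minimal illustration, not the compiler's actual logic. It assumes the whole .TPU file has been read into memory and that header words are little-endian (Intel byte order). It uses the field offsets from the Unit Header record of section 4.1 and the rounding rule of section 4.2. The function names are hypothetical.

```python
import struct

HEADER_SIZE = 64        # the Unit Header occupies the first 64 bytes
SIGNATURE = b"TPU9"     # UHEYE for Turbo Pascal 6.0 and TPW 1.0 units


def round16(n: int) -> int:
    """Round n up to a multiple of 16; exact multiples are left unchanged."""
    return (n + 15) & ~15


def expected_unit_size(header: bytes) -> int:
    """File size implied by the five size words at +1C..+24.

    Order: Dictionary Area (UHZDA, declared as UHENC), UHZCS, UHZDT,
    UHZFA, UHZFT. Each is rounded up to 16 bytes before summing.
    """
    sizes = struct.unpack_from("<5H", header, 0x1C)
    return sum(round16(s) for s in sizes)


def looks_like_tpu9(image: bytes) -> bool:
    """True if image carries the TPU9 signature and has the implied length."""
    if len(image) < HEADER_SIZE or image[:4] != SIGNATURE:
        return False
    return len(image) == expected_unit_size(image)


if __name__ == "__main__":
    # Synthetic image: 100-byte Dictionary Area, 40 bytes of code, 16 bytes
    # of typed CONST data, 8 bytes of CODE fix-ups, no CONST fix-ups.
    image = bytearray(192)
    image[:4] = SIGNATURE
    struct.pack_into("<5H", image, 0x1C, 100, 40, 16, 8, 0)
    print(expected_unit_size(image), looks_like_tpu9(bytes(image)))  # 192 True
```

The demonstration builds a synthetic image rather than reading a real unit, so it exercises only the arithmetic. Because every rounded size is a multiple of 16, a file that passes this test is automatically a multiple of 16 bytes long.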
Graphically, the file may be regarded as having the following general layout (major sections are bounded by double lines):

     ╔════════════════════╗
     ║ Unit Header        ║  Main Index to Unit File
     ╟────────────────────╢
     ║ Dictionaries:      ║
     ║   a) Interface     ║
     ║   b) Debug *       ║  For Local Symbol Access
     ╟────────────────────╢
     ║ PROC Map           ║
     ╟────────────────────╢
     ║ CSeg Map *         ║  May be Empty
     ╟────────────────────╢
     ║ CONST DSeg Map *   ║  May be Empty
     ╟────────────────────╢
     ║ VAR DSeg Map *     ║  May be Empty
     ╟────────────────────╢
     ║ DLL List *         ║  May be Empty
     ╟────────────────────╢
     ║ Donor Units *      ║  May be Empty
     ╟────────────────────╢
     ║ Source Files       ║
     ╟────────────────────╢
     ║ Trace Table *      ║  May be Empty
     ╠════════════════════╣
     ║ CODE Group *       ║  May be Empty
     ╠════════════════════╣
     ║ DATA Group *       ║  May be Empty
     ╠════════════════════╣
     ║ Code Fix-Ups *     ║  May be Empty
     ╠════════════════════╣
     ║ Data Fix-Ups *     ║  May be Empty
     ╚════════════════════╝

Each of the sections outlined by double lines can be up to 64K bytes long. The Dictionary Area begins with the Unit Header and continues through the Trace Table.

2.1 USER UNITS

Units compiled by ordinary users have a very straightforward appearance and content. The SYSTEM.TPU file, however, is quite another thing.

2.2 SYSTEM UNIT

The SYSTEM.TPU file (found in TURBO.TPL and in TPW.TPL) is unique in several respects. It contains several types of entries that just don't seem to be achievable by ordinary users, and the arrangement of the entries in its dictionary is unique. Normally, the Name Entry for the Unit immediately follows the hash table, but in the "SYSTEM" unit this is not true.
Rather, the hash table is followed by all the descriptors for the built-in types. These are followed in turn by the descriptors for the standard procedures and functions, then by the Name Entry for the Unit. Last come the conventional dictionary entries achievable by normal Pascal coding, such as the Typed Constants and Variables defined in the "SYSTEM" unit.

Try to compile a Unit named "SYSTEM" and you will find that the compiler wants a file called "SYSTEM.TPS". I suspect that "SYSTEM.TPS" contains a pre-initialized interface hash table. It probably also holds the descriptors for the standard types and for the built-in procedures and functions stored in the "SYSTEM" unit, which would otherwise require special syntax to define. The compiler can't operate normally without a "SYSTEM" unit. This file therefore probably provides a "bootstrap" mechanism for the built-in descriptors needed to build "SYSTEM.TPU".

3. LOCATORS

The data in these files needs structure and organization to support efficient access by the various programs that use it, such as the compiler, the linker and the debugger. This organization rests on a solid foundation of locators employed throughout the unit's data structures.

3.1 LOCAL LINKS

Local Links (LL's) are items of type WORD (2 bytes) which contain an offset relative to the origin of the unit's Dictionary Area. This implies that the Dictionary Area must be somewhat less than 64K bytes in size. If the Dictionary Area is loaded into the heap, then an LL can be used to locate any byte in the Dictionary Area.
(See below.)

     Type
        LL = Word;                  { Local Scope Locators }

3.2 GLOBAL LINKS

Global Links (LG's) are used to locate type descriptors. They also locate allocation data for variables with the ABSOLUTE attribute. Either kind of item may reside in another Unit (i.e., a unit external to the present unit). LG's are structured items consisting of two (2) words (see below).

        LG = RECORD
                UntLL: LL;   { To item in Unit Named by LL below    }
                UntId: LL;   { Stub Type "Y" Name Entry in our Unit }
             END;

The first word is an LL relative to the origin of the Dictionary Area of the (possibly external) unit. It locates either a Type Descriptor or the stub of the Name entry which establishes storage allocation. The second word is an LL which locates the stub of the Name entry, in the current unit's dictionary, for the (possibly external) target unit. The Name entry for this stub identifies the name of the unit that contains the item the LG points to. This provides a handy mechanism for locating type descriptors and allocation information which may be defined in other separately compiled units.

3.3 TABLE OFFSETS

Finally, various data structures within a .TPU file are organized as arrays of fixed-length records or as lists of variable-length records. Efficient access to such records is achieved by means of offsets rather than subscripts (an addressing technique denied to Pascal). These offsets are relative to the origin of the array or list being referenced.
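The two kinds of link can be sketched in Python as below. The sketch assumes each unit's Dictionary Area has been loaded into a bytes object, so that an LL is simply an index into it. It also assumes words are little-endian. The format of the "Y" stub is only described later in the report, so `units_by_stub` is a hypothetical structure the caller would have to build. It maps the LL of each "Y" stub to the Dictionary Area of the unit that stub names. All names here are illustrative, not taken from the supplied program.

```python
import struct
from typing import Dict, NamedTuple, Tuple


class LG(NamedTuple):
    """Global Link: two WORDs, laid out like the Pascal LG record (3.2)."""
    unt_ll: int   # LL into the Dictionary Area of the unit owning the item
    unt_id: int   # LL, in the referencing unit, to the "Y" stub naming it


def read_ll(area: bytes, pos: int) -> int:
    """Read the Local Link (a WORD offset from the Dictionary Area origin)."""
    return struct.unpack_from("<H", area, pos)[0]


def read_lg(area: bytes, pos: int) -> LG:
    """Read the two-word Global Link stored at byte position pos."""
    return LG(*struct.unpack_from("<2H", area, pos))


def resolve_lg(lg: LG, units_by_stub: Dict[int, bytes]) -> Tuple[bytes, int]:
    """Return (Dictionary Area of the target unit, offset of the item in it).

    units_by_stub is HYPOTHETICAL: it must map the LL of every "Y" stub in
    the referencing unit (including the stub for that unit itself) to the
    loaded Dictionary Area of the unit the stub names.
    """
    return units_by_stub[lg.unt_id], lg.unt_ll
```

`resolve_lg` is just a dictionary lookup. This mirrors the indirection described above: an LG never names the target unit directly, only the local stub that does.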
3.4 BASIC RELATIONSHIPS

  DICTIONARY AREA
  ╔═══ Unit Header
  ║      │
  ║      ├─ LL ──> INTERFACE Hash ── LL's ──> Symbol Dictionary (names,
  ║      │                                    types etc. defined in
  ║      │                                    INTERFACE)
  ║      ├─ LL ──> DEBUG Hash ────── LL's ──> DEBUG Dictionary (the Local
  ║      │                                    Symbol option builds this;
  ║      │                                    holds names, types etc. from
  ║      │                                    IMPLEMENTATION; linked to the
  ║      │                                    INTERFACE part by LL's)
  ║      ├─ LL ──> PROC Map Table
  ║      ├─ LL ──> CSeg Map Table ?
  ║      ├─ LL ──> DSeg Map CONST ?
  ║      ├─ LL ──> DSeg Map VAR's ?
  ║      ├─ LL ──> DLL List ?
  ║      ├─ LL ──> Donor Unit List ?
  ║      ├─ LL ──> Source File List
  ╚═══   ├─ LL ──> Debug Step Ctls ?
         ├─ ** ──> CODE Segments ?
         ├─ ** ──> CONST DATA Segs ?
         ├─ ** ──> CODE Fix-Ups ?
         └─ ** ──> CONST Fix-Ups ?

  IMPORTANT NOTES: Some of the structures shown in this figure are built
  only if they are needed; these are marked by a "?". If the DEBUG
  Dictionary is missing, its LL leads directly to the INTERFACE
  Dictionary.

This figure illustrates the role of the Unit Header in tying together the various data structures in the Unit. The type of link is shown next to a flow-line by "LL", "LG" or "**". "LL" and "LG" are explicit pointers. "**" marks a locator for which no explicit pointer exists; its value is computed from other data in the Unit Header.

  (from hash tables, other Name Entries)
        │
        v
  NAME ENTRY
  ├─ Header Part: Name, Class, and link to the prior entry having the
  │               same hash (if any)
  └─ Stub Part (many formats): data and links. Some stubs have embedded
     │            Type Descriptors, or INLINE Declarative code bytes for
     │            a "macro" type PROC.
     ├─ FAR pntr ──> Absolute Memory Locations
     ├─ LG's ──────> Type Descriptors and stubs of Dictionary Entries
     │               used for absolute equivalences
     ├─ LL's ──────> Nested Scope Hash Tables, Parent Scope Dictionary
     │               Entries, Record Fields, Object Fields/Methods
     └─ Offsets ───> CONST DSeg Map Table, PROC Map Table,
                     VAR DSeg Map Table

This figure illustrates the many types of entities that associate with Name Entries, and particularly with their Stub Parts.
Not all of the links shown occur in a single Stub format, but all of the links in the figure can and do exist in selected cases. The purpose here is to show how flexibly the system of links associates the required data with the Name Entry and its identifying symbol.

While it may not be apparent from the figure, the dictionary structure as a whole may be viewed as a cyclic directed graph rooted in the DEBUG Hash Table. The recursive properties of the node relationships permit direct support of the scope rules of Turbo Pascal with simplicity and elegance.

As one might expect, the representation of the required information lends itself to efficient use of storage. The representations are compact and there is very little redundancy. The small amount of redundancy that does exist is apparently aimed at speeding access to certain structures by the Turbo components (compiler, linker and debugger).

  (implied links, explicit LG's from other structures)
        │
        v
  TYPE DESCRIPTOR contents and linkages: flags and codes, allocation
  widths for data and VMT's, subrange constraints, formal parameter
  descriptors, implicit associated type descriptors, LL's, LG's and
  Offsets.
     ├─ LG's ─────> Type Descriptors
     ├─ LL's ─────> Method Name Entries, Nested Scope Hash Tables,
     │              Nested Scope Field Chains, Parent Scope Name Entry
     └─ Offsets ──> VMT pointers in Object Instances,
                    CONST DSeg Map Table Entries

This figure illustrates the relationships between Type Descriptors and other structures in the dictionary. There are several variant forms of these descriptors (depending on base type), so not all the links shown can exist in a single Type Descriptor. In combination, however, these linkages are feasible. In addition to links, a great deal of data peculiar to a given type declaration is stored.

Descriptors can be, and are, shared. Indeed, they were designed with that in mind. Once a NAMED type is declared, all entities that reference it are linked to it in some way (usually by an LG).

Almost every form of type descriptor is found in the SYSTEM unit, and this fact is used to advantage. When un-typed constants are declared, a built-in type descriptor is referenced (via an LG). That descriptor provides the information needed to keep the dictionary structure orderly. When a named type is declared, it is almost always decomposed into an expression based on the built-in types of Turbo Pascal. These built-in types are found in the SYSTEM unit with the aid of an LG.

The semantics underlying the idea of the Unit mandate this very approach. Program modules of any class that refer to units for definitions use those definitions as implemented by the unit that contains them. Redefining the unit, or any of its defined types, therefore requires recompiling the program modules that rely on the unit for definitions.
The impact is fundamental, since the storage representation of a unit-defined named type can change in quite radical ways.

4. UNIT HEADER

The Unit Header comprises the first 64 bytes of the .TPU file. It contains LL's that effectively locate all other sections of the .TPU file, plus statistics that enable a little cross-checking to be performed. Some parts of the Unit Header appear to be reserved for future use, since no unit examined by this author has ever contained non-zero data in these fields.

4.1 DESCRIPTION

The Unit Header provides a high-level locator table through which each major structure in the unit file can be addressed. The following Pascal-like declaration shows the layout of the header. A narrative discussion of the individual fields follows it.
     Type
        HdrAry = Array[0..3] of Char;

        UnitHeader = Record
           UHEYE : HdrAry;               { +00 : = 'TPU9'                      }
           UHxxx : HdrAry;               { +04 : = $00000000                   }
           UHUDH : LL;                   { +08 : to Name Entry for This Unit   }
           UHIHT : LL;                   { +0A : to Hash Table (INTERFACE)     }
           UHPMT : LL;                   { +0C : to PROC Map                   }
           UHCMT : LL;                   { +0E : to CSeg Map                   }
           UHTMT : LL;                   { +10 : to DSeg Map-Typed CONST's     }
           UHDMT : LL;                   { +12 : to DSeg Map-GLOBAL Variables  }
           UHDLL : LL;                   { +14 : to DLL List (Windows Only)    }
           UHLDU : LL;                   { +16 : to Donor Unit List            }
           UHLSF : LL;                   { +18 : to Source file List           }
           UHDBT : LL;                   { +1A : to Debug Trace Step Controls  }
           UHENC : LL;                   { +1C : Size of Dictionary Area       }
           UHZCS : Word;                 { +1E : Size of CODE Group            }
           UHZDT : Word;                 { +20 : Size of Typed CONST Group     }
           UHZFA : Word;                 { +22 : Fix-Up Bytes (CODE Group)     }
           UHZFT : Word;                 { +24 : Fix-Up Bytes (Typed CONST's)  }
           UHZFV : Word;                 { +26 : Size of GLOBAL VAR Data       }
           UHDHT : LL;                   { +28 : to Hash Table (DEBUG)         }
           UHSOV : Word;                 { +2A : Flags - Mostly Unknown        }
           UHPad : Array[0..9] of Word;  { +2C : Reserved for Future Expansion }
        End; { UnitHeader }

UHEYE contains the characters "TPU9", in that order. This is clear evidence that the unit was compiled by Turbo Pascal Version 6.0 or by Turbo Pascal for Windows Version 1.0.

UHxxx is apparently reserved and contains binary zeros.

UHUDH contains an LL (WORD) which points to the Name Entry in which the name of this unit is found.

UHIHT contains an LL (WORD) which points to a hash table that is the root of the INTERFACE Dictionary graph.

UHPMT contains an LL (WORD) which points to the PROC Map for this unit. The PROC Map contains an entry for each Procedure or Function declared in the unit (except for INLINE types), plus an entry for the Unit Initialization section.
The length of the PROC Map (in bytes) is obtained by subtracting UHPMT from UHCMT.

UHCMT contains an LL (WORD) which points to the CSeg (CODE Group) Map for this unit. The CSeg Map contains an entry for each CODE Segment produced by the compiler. It also contains an entry for each CODE Segment included via the {$L filename.OBJ} compiler directive. The length of this Map (in bytes) is obtained by subtracting UHCMT from UHTMT. The result may be zero, in which case the CSeg Map is empty.

UHTMT contains an LL (WORD) which points to the DSeg (DATA Segment) Map for typed constants. This map covers the initializing data for Typed CONST items. It also covers the templates for the VMT's (Virtual Method Tables) and DMT's (Windows Dynamic Method Tables) associated with OBJECTs which employ Virtual Methods. The length of this Map (in bytes) is obtained by subtracting UHTMT from UHDMT. The result may be zero, in which case this DSeg Map is empty.

UHDMT contains an LL (WORD) which points to the DSeg (DATA Segment) Map for global variables. This map specifies the DSeg storage required by VARiables whose scope is GLOBAL. The length of this Map (in bytes) is obtained by subtracting UHDMT from UHDLL. The result may be zero, in which case this DSeg Map is empty.

UHDLL contains an LL (WORD) which points to the DLL List (Windows only). In Version 6.0 this is always zero, so the subtraction above can only be applied to Windows units. For Version 6.0 units, the end of the VAR DSeg Map is presumably marked by UHLDU instead.

UHLDU contains an LL (WORD) which points to the "Donor Unit Table". This table lists the units which contribute either CODE or DATA Segments to the .EXE file of a program using this Unit. Its length (in bytes) is obtained by subtracting UHLDU from UHLSF. The result may be zero, in which case this table is empty.

UHLSF contains an LL (WORD) which points to a list of "source" files, i.e. the files used as sources during compilation. Examples are the Pascal source for the Unit itself, plus the .OBJ files linked via the {$L filename.OBJ} compiler directive.
The length of this list (in bytes) is obtained by subtracting UHLSF from UHDBT. There should be at least one entry in this list.

UHDBT contains an LL (WORD) which points to a Trace Table. The DEBUGGER uses this table for "stepping" through a Function or Procedure contained in this Unit. The length of this table (in bytes) is obtained by subtracting UHDBT from UHENC. The result may be zero, in which case this table is empty.

UHZDA (the word declared as UHENC at +1C in the record above) contains the total byte count of the Dictionary Area for this unit. All bytes up to and including the Trace Table are included in this count.

UHZCS is a WORD that contains the total byte count of all CODE Segments compiled into this Unit.

UHZDT is a WORD that contains the total byte count of all Typed CONST, DMT and VMT DATA Segments compiled into this unit.

UHZFA is a WORD that contains the total byte count of this unit's Fix-Up Data Table for CODE (CSegs).

UHZFT is a WORD that contains the total byte count of the Fix-Up Data Table for Typed CONST's. A non-zero value usually implies that a VMT or DMT is having its pointers relocated.

UHZFV is a WORD that contains the total byte count of all GLOBAL VAR DATA Segments compiled into this unit.

UHDHT contains an LL (WORD) which points to a hash table that is the root of the DEBUGGER Dictionary. If Local Symbols were generated by the compiler (directive {$L+}), then ALL symbols declared in the unit can be accessed from this hash table. If Local Symbols were suppressed, there is no such Dictionary, and the LL stored here points to the INTERFACE Dictionary.

UHSOV is a WORD that contains flags. I have only been able to expose a few of the values with any real confidence. Here's what I know so far, by bit number (15..0):

     15..13: always zero?
     12:     always zero for the Version 6.0 (DOS) compiler?
             1 = DISCARDABLE, 0 = PERMANENT Windows segment.
     11..7:  always zero?
     6:      always zero for the Version 6.0 (DOS) compiler?
             1 = PRELOAD, 0 = DEMANDLOAD Windows segment.
     5:      always zero?
     4:      always zero for the Version 6.0 (DOS) compiler?
             1 = MOVEABLE, 0 = FIXED Windows segment.
     3:      always zero?
     2:      0 = DOS compiler, 1 = WINDOWS compiler?
     1:      1 = DOS compiler with {$O+}, else zero?
     0:      Unclear. This bit seems to imply that either this unit or
             one that it references requires emulation support, but
             this is only a guess.

UHPad begins a series of ten (10) words that are apparently reserved for future use. Nothing but zeros has ever been seen here by this author.

4.2 UNIT SIZE

An independent check on the size of the .TPU file is available using information contained in the Unit Header. This is also important for the organization of .TPL (Unit Library) files. To compute the file size, take the five (5) words UHZDA, UHZCS, UHZDT, UHZFA and UHZFT. Round each of them up to the nearest multiple of 16; a value that is already a multiple of 16 is left unchanged. Then sum the rounded values. The result is the .TPU file size in bytes (a LongInt result).

A Unit MAY be larger than 64K bytes. I finally tumbled to this when I began to analyze the Windows unit "WOBJECTS". I now believe that each of the sections whose sizes are given above may be up to 64K bytes long. This implies an upper limit for unit size of around 320K bytes (5 x 64K). My face is actually quite red over this. A Unit has always been capable of producing a 64K Code Segment, not to mention a Data Segment of nearly the same size. I can't explain why the significance of these "size" words didn't dawn on me sooner.

5. SYMBOL DICTIONARIES

This area contains all available documentation of the symbols declared and the procedure blocks defined within the unit. Its content depends on the compiler options in effect when the unit was compiled. At a minimum it contains the INTERFACE declarations and, at a maximum, ALL declarations. The information stored in the dictionary is highly dependent on the context of the symbol declared; further explanation is deferred to the appropriate sections which follow.

5.1 ORGANIZATION

A dictionary is organized with a hash table as its root; the hash table provides rapid access to identifiers. A dictionary may be thought of as a directed graph in which each subgraph is rooted in a hash table. There may be a great many hash tables in a given unit. Their number depends on the unit's complexity as well as on the options chosen when the unit was compiled. Use of the {$L+} directive produces the largest dictionaries. The hash tables are explained in detail a few sections further on.

Hash tables point to Name Entries. When two or more symbols produce the same hash function result, a "collision" is said to occur. Collisions are resolved by the time-honored method of chaining together the Name Entries of the symbols that share a hash result. Dictionary supersetting is accomplished using these chains.

5.2 INTERFACE DICTIONARY

The INTERFACE dictionary contains all symbols, and the necessary explanatory data, for the INTERFACE section of a Unit. Symbols are added to the Unit at increasing storage addresses until the IMPLEMENTATION section is encountered.

5.3 DEBUG DICTIONARY

The Debug dictionary (if present) is a superset of the INTERFACE dictionary. It is used by the Turbo Debugger to support its many features when tracing through a unit.
If present, this dictionary is rooted in its own hash table. That hash table is effectively initialized when the compiler processes the IMPLEMENTATION keyword. It starts out as an unmodified copy of the INTERFACE hash table, to which symbols are then added in the usual fashion. Thus the hash chains constructed or extended at this time lead naturally into the INTERFACE chains, and this is how the superset is effectively implemented.

5.4 DICTIONARY ELEMENTS

The dictionary contains four major elements: hash tables, Name Entries, Name Stubs and Type Descriptors.

The distinction between Name Entries and Name Stubs might appear to be rather arbitrary. They might just as easily be regarded as a single element (a "symbol entry", say). However, the case for treating them as separate entities is strong, since Stubs are DIRECTLY addressed via LG's and, more to the point, ONLY by LG's. It therefore seems reasonable to regard the Stub as a separate and very important structure, at least in the minds of the architects at Borland.

5.4.1 HASH TABLES

As has been intimated, Hash Tables are the glue that binds the dictionary entries together and gives the dictionary its "shape". They effectively implement the scope rules of the language and speed access to essential information.

Each Hash Table begins with a 2-byte size descriptor containing the number of bytes in the table proper, less 2. Thus the descriptor directly points to the last bucket in the hash table. For example, for a hash table of 128 bytes the size descriptor contains 126. The first bucket in the table immediately follows the size descriptor.

5.4.1.1 SIZE

So far, three different hash table sizes have been observed.
The INTERFACE and DEBUG hash tables are usually 128 bytes (64 entries) in size, plus 2 bytes of size descriptor. The SYSTEM.TPU unit is a special case, containing only 16 entries. Hash tables which anchor subgraphs whose scope is relatively local usually contain four (4) entries (8 bytes). Graphically, a Hash Table with four slots has the following layout:

        ┌────────────────────┐
        │       0006h        │  Size Descriptor
        ╞════════════════════╡
        │       slot 0       │  an LL or zero
        ├────────────────────┤
        │       slot 1       │  an LL or zero
        ├────────────────────┤
        │       slot 2       │  an LL or zero
        ├────────────────────┤
        │       slot 3       │  an LL or zero
        └────────────────────┘

It should be noted that the Size Descriptor furnishes an upper bound for the hash function itself. Thus it seems possible that a single hash function is used for all hash tables, and that its result is ANDed with the Size Descriptor to get the final result: an even byte offset selecting one slot. This is feasible because the slot counts are powers of 2. Note that in the above example 6 = 2 * (n - 1), where n = 4 is the slot count. All of the hash tables observed so far have this property.

One final note on this subject. Given these properties, "folding" a sparse hash table into a smaller one is a rather trivial exercise, so long as the new hash table also contains a number of slots that is a power of 2. This point is intriguing when one recalls that the SYSTEM.TPU hash table has only 16 slots rather than the usual 64.

5.4.1.2 SCOPE

The INTERFACE and Debug dictionary hash tables are Global in scope, even though the symbols accessed directly via either hash table may be private. All other hash tables, on the other hand, are purely local in scope.
For example, the fields declared within a record are reached via a small local hash table, as are the arguments and local variables declared within procedures and functions. Even OBJECTs use this technique to provide access to Methods and Object Fields. Access to such local-scope fields and methods requires the use of qualified names, which ensures conformity to the Pascal scope rules. The method is truly simple and elegant.

5.4.1.3 SPECIAL CASES

The SYSTEM.TPU unit is a special case. Its INTERFACE hash table has apparently been "hand-tuned" for small size, and it contains only sixteen (16) entries. I have always felt that "hand-coding" must have been used to produce the SYSTEM unit. The implications of the file "SYSTEM.TPS", which is required for compilation of the SYSTEM unit, seem to support this opinion. Certainly there are aspects of this unit that appear conventional, but there is much that is unique and apparently not the result of Pascal coding. Library sources should help clarify this. (See 2.2 SYSTEM UNIT on page 8.)

5.4.2 NAME ENTRIES

This is the structure that anchors all information known by the compiler about any symbol. The format is as follows:

      DNameRec = RECORD
        HLink : LL;          { Hash Chain Link; Resolves Collisions }
        DForm : Char;        { Symbol Class }
        DSymb : STRING[63];  { Text of Symbol (UPPER-CASE) }
      END;

   HLink: An LL which points to the next (previously entered) symbol in the same unit that had the same hash function result.

   DForm: A character that defines the class the symbol belongs to, and hence the format of the Name Stub which follows the Name Entry. If the symbol is declared in the component list of the "private" part of an Object declaration, this character is modified by adding $80 to its ordinal value.
   Thus an ordinary Function, Procedure or Method is of category "S", while a private Method is of category Chr(Ord('S')+$80).

   DSymb: A string (in the Pascal sense) of variable size that contains the text of the symbol, in UPPER-CASE letters only. SizeOf cannot be applied to these strings in the usual way, since each one is truncated to match the length of its symbol. The equivalent of SizeOf is obtained by adding 1 to the length byte at the front of the string, i.e. Ord(Symbol[0])+1. Turbo Pascal allows a symbol of relatively arbitrary length, but only its first 63 characters are significant and stored in the dictionary. We therefore conclude that the maximum size of such a string is 64 bytes.

5.4.3 NAME STUBS

Name Stubs immediately follow their respective Name Entries, and their format is determined by the class code in the Name Entry. The function of the stub is to organize the information appropriate to the symbol and to provide a means of accessing additional information such as type descriptors, constant values, parameter lists and nested scopes. The format of each Stub is presented in the following sub-sections.

5.4.3.1 LABEL DECLARATIVES ("O")

This Stub consists of a WORD whose function is (as yet) unknown.

5.4.3.2 UN-TYPED CONSTANTS ("P")

The format is as follows (CASE fragment):

      'P': (                    { --- For Untyped Constants ---   }
        sPTD : LG;              { to type descriptor              }
        sPV1 : LongInt          { constant value - size variable  }
      );

   sPTD: An LG which points to a Type Descriptor (usually in SYSTEM.TPU). This establishes the minimum storage requirement for the constant. The rules vary with the type, but the size of the constant data field (which follows) is defined using the Type Descriptor(s).

   sPV1: The value of the constant.
   For ordinal types, this value is stored as a LONGINT (size = 4 bytes). For Floating-Point types, the size is implicit in the type itself. For String types, the size is determined from the length of the string, which is stored in the initial byte of the constant.

5.4.3.3 NAMED TYPES ("Q")

This Stub consists of an LG (4 bytes) that points to the Type Descriptor for this symbol.

5.4.3.4 VARIABLES, FIELDS, TYPED CONS ("R")

This Stub contains the information required to allocate and describe these kinds of entities. Its format and content are as follows (CASE fragment; the two words of sRVF are interpreted according to sRAM):

      'R': (                          { -- Variable, Field, Object --   }
        sRAM : Byte;                  { allocation method code (below)  }
        sRVF : RECORD CASE Byte OF    { two words, selected by sRAM     }
          $02,$06,$22,$26:
               (ROfs : Word;          { allocation offset (BP)          }
                ROB  : Word);         { to Parent Scope, or zero        }
          $00,$01:
               (TOfs : Word;          { offset within allocation area   }
                TOB  : LL);           { offset in VAR/CONST Map         }
          $03: (AFar : Pointer);      { FAR Pointer to location         }
          $08: (BOfs : Word;          { offset, Record/Object relative  }
                RChn : LL);           { to next Field/Method            }
          $10: (QLG  : LG);           { to Stub of allocator            }
        END;
        sRTD : LG                     { to Type Descriptor              }
      );

   sRAM: A one-byte flag that precisely identifies the class of the item being described. The known values and their apparent meanings follow:

         $00 -> Global Variables (allocated in DS);
         $01 -> Typed Constants (allocated in DS);
         $02 -> Procedure LOCAL Variables on the STACK;
         $03 -> Variables at Absolute Addresses;
         $06 -> ADDRESS Arguments allocated on the STACK (now used only for SELF in Method calls);
         $08 -> Fields sub-allocated in RECORDS and OBJECTS, plus METHODS declared for OBJECTS;
         $10 -> Variable Equivalenced to another via the ABSOLUTE clause;
         $22 -> Arguments whose VALUEs are passed on the stack;
         $26 -> Arguments whose ADDRESSes are passed on the stack.

   sRVF: Two words whose content varies with sRAM above.
   These are shown as case variants in the following:

      $02,$06,$22,$26: {stack locals and arguments}
         sRVF.ROfs (Word): Offset relative to either DS or BP;
         sRVF.ROB (Word): LL to the Dict Header of the Parent Scope, or zero.
      $00,$01: {VAR's or typed CONSTs}
         sRVF.TOfs (Word): Offset relative to the allocation area origin;
         sRVF.TOB (Word): Offset to the entry in the VAR/CONST Map for the item's allocation.
      $03: {Absolute Address Variable}
         sRVF.AFar (POINTER): FAR Pointer to the absolute memory address.
      $08: {Record/Object Fields/Methods}
         sRVF.BOfs (Word): Allocation offset within the Record/Object;
         sRVF.RChn (Word): LL to the next Field/Method.
      $10: {Absolute Equivalences}
         sRVF.QLG (LG): LG to the STUB of the variable/parameter declaration that actually establishes the allocation.

   sRTD: An LG that locates the proper Type Descriptor for this symbol.

5.4.3.5 SUBPROGRAMS & METHODS ("S")

Subprograms (PROC's) have a rather involved stub, especially now that Object Methods are supported. Its format is as follows:

      'S': (                    { ------ User Subprograms -----    }
        sSTp : Byte;            { BIT Encoded Flags                }
        sSxx : Byte;            { More Attribute Flags?            }
        sSPM : Word;            { Code byte count if INLINE,       }
                                { else, offset to PROC Map         }
        sSPS : LL;              { to containing scope or zero      }
        sSHT : LL;              { to local scope hash table        }
        sSVM : Word             { VMT Offset-VIRTUAL Method PTR    }
      );

   sSTp: A byte containing bit-switches that seem to describe the Call Model and imply the size of this stub. These switches determine what kind of code (if any) is generated when the PROC is referenced.
   The observed values are as follows:

         xxxxx001 -> PROC uses FAR Call Model;
         xxxx0010 -> PROC uses INLINE Model (no Call);
         xxxx0100 -> PROC uses INTERRUPT Model (no Call);
         xxxx100x -> PROC has EXTERNAL attribute;
         xxx1xxxx -> PROC uses METHOD Call Model;
         x011xxxx -> PROC is a CONSTRUCTOR Method;
         x101xxxx -> PROC is a DESTRUCTOR Method;
         1xxxxxxx -> PROC has ASSEMBLER directive.

   sSxx: A byte whose function is not yet fully known. In the Windows compiler it is copied into the PROC Map, presumably for use by the linker or debugger. The bit assignments observed so far are as follows (7..0). Question marks indicate uncertainty:

         7-6: always zero?
         5:   ????
         4:   Dynamic Call Model using DMT
         3-2: 11 = DLL PROC referenced by NAME
              01 = DLL PROC referenced by INDEX
         1:   ????
         0:   always zero???

   sSPM: A Word whose interpretation depends on whether or not this is an INLINE Declarative Subprogram. If it is, this word contains the byte-count of the INLINE code text at the end of this stub. Otherwise, this word is the offset within the PROC Map that locates the object code for this Subprogram.

   sSPS: A Word that contains an LL which locates the containing scope in the dictionary, or zero if there is none.

   sSHT: A Word that contains an LL which locates the local Hash Table for this scope. A local hash table provides access to all formal parameters of the Subprogram, as well as to all Symbols whose declarations are local to the scope of this Subprogram.

   sSVM: A Word that is zero unless the symbol is a Virtual Method. In that case, the content is the offset within the VMT of the owning object at which the FAR POINTER to this Virtual Method is stored.

   +0A: A complete Type Descriptor for this Subprogram. The length is variable and depends upon the number of Formal Parameters declared in the header.
   (See 5.4.4.3.5 on page 34.)

   +??: If this Symbol represents an INLINE Declarative Subprogram, then the object-code text begins here. The byte-count of the text is stored in sSPM in this stub.

5.4.3.6 TURBO STD PROCEDURES ("T")

This Stub consists of two bytes. The first is unique for each procedure and increments by 4. I have found nothing in the SYSTEM unit (which is where this entry appears) to which it seems directly related. The second byte is always zero.

5.4.3.7 TURBO STD FUNCTIONS ("U")

This Stub consists of two bytes. The first is unique for each function and increments by 4. Again, I have found nothing in the SYSTEM unit (which is where this entry appears) to which it seems directly related. I wouldn't be surprised if this byte were an index into a TURBO compiler table that points to specialized parse tables or action routines for handling these functions and their non-standard parameter lists. The second byte seems to be a flag taking the values $00, $40 and $C0. I strongly suspect that the flag $C0 marks exactly those functions which may be evaluated at compile time. The meaning of the other values is not known to me.

5.4.3.8 TURBO STD "NEW" ROUTINE ("V")

This Stub consists of a WORD whose function is (as yet) unknown. NEW is the only standard Turbo routine that can behave as a procedure as well as a function (returning a pointer value).

5.4.3.9 TURBO STD PORT ARRAYS ("W")

This Stub consists of a byte whose value is 0 for byte arrays and 1 for word arrays.

5.4.3.10 TURBO STD EXTERNAL VARIABLES ("X")

This Stub consists of an LG (4 bytes) that points to the Type Descriptor for this symbol. (These are used for the arrays MEM, MEMW and MEML.)
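The Name Entry layout of 5.4.2 and the class letters of 5.4.3 are enough to step from a Name Entry to the start of its Stub. What follows is a minimal sketch in Turbo Pascal style (written to compile under Free Pascal's default mode). It is not code from the program supplied with this report. It assumes little-endian words, treats the dictionary as a plain byte buffer, and uses hypothetical names (TNameEntry, DecodeNameEntry, FormName) and invented sample bytes.

```pascal
program NameEntryDemo;
{ Sketch: decode one Name Entry (section 5.4.2) from a byte buffer.
  Assumptions: little-endian words; layout HLink(2) DForm(1) DSymb(1+n);
  the sample bytes in the main block are invented for illustration. }

type
  TBuf = array[0..255] of Byte;

  TNameEntry = record
    HLink     : Word;     { LL to the next entry on the same hash chain }
    Form      : Char;     { class letter with the $80 "private" bit removed }
    IsPrivate : Boolean;  { True when DForm had $80 added }
    Symbol    : string;   { upper-case symbol text }
    StubOfs   : Word;     { buffer offset at which the Name Stub begins }
  end;

function GetWord(const B: TBuf; Ofs: Word): Word;
begin
  { low byte first (little-endian) }
  GetWord := B[Ofs] or (Word(B[Ofs + 1]) shl 8);
end;

function FormName(C: Char): string;
begin
  case C of
    'O': FormName := 'label';
    'P': FormName := 'untyped constant';
    'Q': FormName := 'named type';
    'R': FormName := 'variable, field or typed constant';
    'S': FormName := 'subprogram or method';
    'T': FormName := 'standard procedure';
    'U': FormName := 'standard function';
    'V': FormName := 'standard NEW routine';
    'W': FormName := 'port array';
    'X': FormName := 'standard external variable';
    'Y': FormName := 'unit';
  else
    FormName := 'unknown';
  end;
end;

procedure DecodeNameEntry(const B: TBuf; Ofs: Word; var E: TNameEntry);
var
  FormByte, Len, I: Byte;
begin
  E.HLink := GetWord(B, Ofs);
  FormByte := B[Ofs + 2];
  E.IsPrivate := (FormByte and $80) <> 0;
  E.Form := Chr(FormByte and $7F);
  Len := B[Ofs + 3];                  { length byte, i.e. Ord(DSymb[0]) }
  E.Symbol := '';
  for I := 1 to Len do
    E.Symbol := E.Symbol + Chr(B[Ofs + 3 + I]);
  { the Stub follows HLink (2) + DForm (1) + DSymb (Len + 1) }
  E.StubOfs := Ofs + 4 + Len;
end;

var
  Buf: TBuf;
  E: TNameEntry;
begin
  FillChar(Buf, SizeOf(Buf), 0);
  { invented entry at offset 0: HLink = $0123, private method 'INIT' }
  Buf[0] := $23;
  Buf[1] := $01;
  Buf[2] := Ord('S') + $80;
  Buf[3] := 4;
  Buf[4] := Ord('I');
  Buf[5] := Ord('N');
  Buf[6] := Ord('I');
  Buf[7] := Ord('T');
  DecodeNameEntry(Buf, 0, E);
  WriteLn(E.Symbol, ': ', FormName(E.Form), ', private=', E.IsPrivate,
          ', stub at offset ', E.StubOfs);
end.
```

Because each Stub's format depends on the class letter, a full dump program would dispatch on E.Form at E.StubOfs. The sketch stops once it has located the Stub.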
5.4.3.11 UNITS ("Y")

Unit Stubs have the following content:

   +00: A Word that is apparently reserved for use by the Compiler or Linker.

   +02: A Word that seems to contain some kind of "signature" used to detect inconsistent Unit versions. Borland calls this a "unit version number, which is basically a checksum of the interface part." I have seen a thread on CIS (CompuServe) which says that it is a CRC value. Food for thought?

   +04: A Word that contains an LL which locates the Successor Unit in the "Uses" list. In fact, this Word merges the "Uses" lists of both the INTERFACE and IMPLEMENTATION sections of the Unit into a single list. A value of zero indicates no successor.

   +06: A Word that contains an LL which locates the Predecessor Unit in the "Uses" list. For the SYSTEM unit entry this value is always zero, indicating no predecessor. For the Unit being compiled, this LL locates the final Unit in the combined "Uses" list.

In effect, the two LL's at offsets +04 and +06 organize the units into both forward and backward linked chains. The entry for the unit being compiled is effectively the head of both chains. The final unit in the merged "Uses" list is the tail of the forward chain, and the SYSTEM unit is the tail of the backward chain.

5.4.4 TYPE DESCRIPTORS

Type Descriptors store much of the semantic information that applies to the symbols declared in the unit. Implementation details can thus be managed using high-level abstractions, and these abstractions can be shared.

5.4.4.1 SCOPE

Type Descriptor sharing can occur across the boundaries which are implicit in unit modules. Thus, a type defined in one unit may be "imported" by some other module.
Also, the pre-defined Pascal types (plus the Turbo Pascal extensions) are defined in the SYSTEM.TPU unit, and there needs to be a means of "importing" such Type Descriptors during compilation. This is precisely the objective of the LG locator (see Section 3.2 on page 9).

Type Descriptors are NEVER copied between units. The binding always occurs by reference at compile time. This helps support the practice of modifying a unit, compiling it to a .TPU file, and then re-compiling all units and programs that "USE" it.

Type Descriptors have many roles, so their format varies. We have divided these structures into two parts. The PREFIX Part is always present and has a fairly constant format. The SUFFIX Part has content and format that depend on the attributes that are part of the type definition.

5.4.4.2 PREFIX PART

The Prefix Part of every Type Descriptor consists of six (6) bytes. The usage is consistent for all types observed by this author, and the format is as follows:

   +00: A Byte that identifies the format of the Suffix Part. This is essentially based on several high-level categories which the Suffix Parts support directly. The observed set of values is as follows:

         00h -> an un-typed entity;
         01h -> an ARRAY type;
         02h -> a RECORD type;
         03h -> an OBJECT type;
         04h -> a FILE type (other than TEXT);
         05h -> a TEXT File type;
         06h -> a SUBPROGRAM type;
         07h -> a SET type;
         08h -> a POINTER type;
         09h -> a STRING type;
         0Ah -> an 8087 Floating-Point type;
         0Bh -> a REAL type;
         0Ch -> a Fixed-Point ordinal type;
         0Dh -> a BOOLEAN type;
         0Eh -> a CHAR type;
         0Fh -> an Enumerated ordinal type.

   +01: A Byte used as a modifier. The above scheme is too general for machine-dependent details such as storage width and sign control, so this modifier byte supplies additional data.
        The author has identified several cases in which this information is vital, but has not spent very much time on the subject. The chief areas of importance seem to be the 8087 Floating-Point types and the Fixed-Point ordinal types. The semantics seem to be as follows:

         0A 00 -> The type "SINGLE"
         0A 02 -> The type "EXTENDED"
         0A 04 -> The type "DOUBLE"
         0A 06 -> The type "COMP"
         0C 00 -> an un-named BYTE integer
         0C 01 -> The type "SHORTINT"
         0C 02 -> The type "BYTE"
         0C 04 -> an un-named WORD integer
         0C 05 -> The type "INTEGER"
         0C 06 -> The type "WORD"
         0C 0C -> an un-named double-word integer
         0C 0D -> The type "LONGINT"

   +02: A Word that contains the number of bytes of storage required to contain an object/entity of this type. For types that represent variable-length objects/entities, such as strings, this word may define the value returned by the SIZEOF function as applied to the type. This word is probably of value during compilation of un-typed CONST's, since the size of their Stubs depends on this field. For STRING types, however, the length descriptor is part of the string itself.

   +04: A Word that is zero (for DOS units) unless the descriptor is for an Object Method. In that case, the content is an LL to the Name Entry of the SUCCEEDING Method for the Object, in order of declaration, or zero if there is none. Some Windows units (e.g., SYSTEM) have non-zero values here whose function is not known.

5.4.4.3 SUFFIX PARTS

Suffix Parts further refine the implementation details of the type and also provide subrange constraints where appropriate. In some cases the Suffix Part is empty, since all semantic data for the type is contained in the Prefix Part.

5.4.4.3.1 UN-TYPED

This Suffix Part is empty. Nothing is known about an un-typed entity.
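The Prefix Part described in 5.4.4.2 can be decoded with two small lookup tables. The following sketch (Turbo Pascal style, written for Free Pascal's default mode) names only the kind and modifier values tabulated above and reports anything else by its general category or as unknown. The routine names (PrefixWord, KindName, QualifiedName, Show) and the two sample prefixes are illustrative assumptions. Little-endian words are assumed.

```pascal
program TypePrefixDemo;
{ Sketch: decode the 6-byte Type Descriptor prefix (section 5.4.4.2).
  Only the kind/modifier values listed in the text are named; words are
  assumed little-endian, and the sample prefixes are invented. }

type
  TPrefix = array[0..5] of Byte;

const
  { +00 = 0Ch (fixed-point ordinal), +01 = 05h (INTEGER), +02 = size 2 }
  IntegerPrefix: TPrefix = ($0C, $05, $02, $00, $00, $00);
  { +00 = 0Ah (8087 float), +01 = 04h (DOUBLE), +02 = size 8 }
  DoublePrefix: TPrefix = ($0A, $04, $08, $00, $00, $00);

function PrefixWord(const P: TPrefix; Ofs: Byte): Word;
begin
  PrefixWord := P[Ofs] or (Word(P[Ofs + 1]) shl 8);
end;

function KindName(Kind: Byte): string;
const
  Names: array[$00..$0F] of string[20] = (
    'untyped', 'ARRAY', 'RECORD', 'OBJECT', 'FILE', 'TEXT',
    'SUBPROGRAM', 'SET', 'POINTER', 'STRING', '8087 floating-point',
    'REAL', 'fixed-point ordinal', 'BOOLEAN', 'CHAR', 'enumeration');
begin
  if Kind <= $0F then
    KindName := Names[Kind]
  else
    KindName := 'unknown';
end;

{ Refine the kind using the +01 modifier byte; unlisted modifiers
  fall back to the general category name. }
function QualifiedName(Kind, Modifier: Byte): string;
var
  S: string;
begin
  S := KindName(Kind);
  if Kind = $0A then
    case Modifier of
      $00: S := 'SINGLE';
      $02: S := 'EXTENDED';
      $04: S := 'DOUBLE';
      $06: S := 'COMP';
    end
  else if Kind = $0C then
    case Modifier of
      $00: S := 'un-named BYTE integer';
      $01: S := 'SHORTINT';
      $02: S := 'BYTE';
      $04: S := 'un-named WORD integer';
      $05: S := 'INTEGER';
      $06: S := 'WORD';
      $0C: S := 'un-named double-word integer';
      $0D: S := 'LONGINT';
    end;
  QualifiedName := S;
end;

procedure Show(const P: TPrefix);
begin
  WriteLn(KindName(P[0]), ' / ', QualifiedName(P[0], P[1]),
          ', size ', PrefixWord(P, 2), ', +04 = ', PrefixWord(P, 4));
end;

begin
  Show(IntegerPrefix);
  Show(DoublePrefix);
end.
```

Keeping the modifier table separate from the kind table mirrors the text's observation that only the floating-point and fixed-point kinds have been seen to use the modifier in a meaningful way.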
5.4.4.3.2 STRUCTURED TYPES

The structured types represent aggregates of lower-level types. We include the ARRAY, RECORD, OBJECT, FILE, TEXT, SET, POINTER and STRING types in this category.

5.4.4.3.2.1 ARRAY TYPES

The Suffix Part of the ARRAY type is constructed so as to support recursive or nested definition of arrays. The suffix format is as follows:

   +00: An LG that locates the Type Descriptor for the "base-type" of the array. This is the type of the entity being arrayed (which may itself be an array).

   +04: An LG that locates the Type Descriptor for the array bounds, which is a constrained ordinal type or subrange.

5.4.4.3.2.2 RECORD TYPES

RECORD types have nested scopes. The Suffix Part provides a base structure by which to locate the fields local to the scope of the Record type itself. The format is as follows:

   +00: A Word containing an LL which locates the local Hash Table that provides access to the fields in the nested scope.

   +02: A Word containing an LL which locates the Name Entry of the initial field in the nested scope. This supports a "left-to-right" traversal of the fields in a record.

5.4.4.3.2.3 OBJECT TYPES

OBJECT types also have nested scopes. The Suffix Part provides a base structure by which to locate the fields and METHODS local to the scope of the OBJECT type itself. In addition, inheritance and VMT particulars are stored. The format is as follows:

   +00: A Word containing an LL which locates the local Hash Table that provides access to the fields and METHODS local to the nested scope.
   +02: A Word containing an LL which locates the Name Entry of the initial field or METHOD in the nested scope. This supports a "left-to-right" traversal of the fields and METHODS in an OBJECT.

   +04: An LG which locates the Type Descriptor of the Parent Object. This field is zero if there is no such Parent.

   +08: A Word which contains the size in bytes of the VMT for this Object. This field is zero if the object employs no Virtual Methods, Constructors or Destructors.

   +0A: A Word which contains the offset within the CONST DSeg Map that locates the VMT skeleton or template segment. This field equals FFFFh if the object employs no Virtual Methods, Constructors or Destructors.

   +0C: A Word which contains the offset within an Object instance at which the NEAR POINTER to the object's VMT (which resides in the DATA SEGMENT) is stored. This field equals FFFFh if the object employs no Virtual Methods, Constructors or Destructors.

   +0E: A Word which contains an LL which locates the Name Entry for the name of the OBJECT itself.

   +10: A Word containing $FFFF in DOS units. In WINDOWS units this word contains the offset within the CONST DSeg Map that locates the DMT skeleton or template segment. This field equals FFFFh if the object employs no Dynamic Methods.

   +12: Three Words (not yet understood) containing zeroes.

5.4.4.3.2.4 FILE (NON-TEXT) TYPES

This Suffix consists of an LG that locates the Type Descriptor of the base type of the file. Note that this may be the Type Descriptor of an un-typed entity (for un-typed files).

5.4.4.3.2.5 TEXT FILE TYPES

This Suffix consists of an LG that locates the Type Descriptor of the base type of the file, in this case SYSTEM.CHAR.

5.4.4.3.2.6 SET TYPES

This Suffix consists of an LG that locates the base type of the set itself.
Turbo Pascal limits this base type to an ordinal type with at most 256 possible values, whose ordinal values lie within 0..255.

5.4.4.3.2.7 POINTER TYPES

This Suffix consists of an LG that locates the base type of the entity pointed at.

5.4.4.3.2.8 STRING TYPES

This is a special case of an ARRAY type. The format is as follows:

   +00: An LG to the Type Descriptor SYSTEM.CHAR, which is the base type of all Turbo Pascal strings.

   +04: An LG to the Type Descriptor for the array bounds constraints of the string. When the unconstrained STRING type is used, this points to SYSTEM.BYTE, which is defined as the subrange 0..255.

5.4.4.3.3 FLOATING-POINT TYPES

The Suffix Part for all Floating-Point types is EMPTY. All data needed to specify these approximate-number types is contained in the Prefix Part. The types included in this class are SINGLE, DOUBLE, EXTENDED, COMP and REAL.

5.4.4.3.4 ORDINAL TYPES

The Ordinal Types consist of the various "integer" types plus the BOOLEAN, CHAR and Enumerated types.

5.4.4.3.4.1 "INTEGERS"

These types include BYTE, SHORTINT, WORD, INTEGER and LONGINT. Their Suffix Parts are identical in format:

   +00: A double-word containing the LOWER bound of the subrange constraint on the type;

   +04: A double-word containing the UPPER bound of the subrange constraint on the type;

   +08: An LG that locates the Type Descriptor of the largest upward-compatible type. This is the Type Descriptor used to control the width of an un-typed constant in the dictionary stub. For the "integer" types, this is an LG to SYSTEM.LONGINT.

5.4.4.3.4.2 BOOLEANS

This type Suffix has the following format:

   +00: A double-word containing the LOWER bound of the subrange constraint on the type;

   +04: A double-word containing the UPPER bound of the subrange constraint on the type;

   +08: An LG that locates the Type Descriptor SYSTEM.BOOLEAN.
        There is no "upward compatible" type.

5.4.4.3.4.3 CHARS

This type Suffix has the following format:

   +00: A double-word containing the LOWER bound of the subrange constraint on the type;

   +04: A double-word containing the UPPER bound of the subrange constraint on the type;

   +08: An LG that locates the Type Descriptor SYSTEM.CHAR. There is no "upward compatible" type.

5.4.4.3.4.4 ENUMERATIONS

This type Suffix is unusual and has the following format:

   +00: A double-word containing the LOWER bound of the subrange constraint on the type;

   +04: A double-word containing the UPPER bound of the subrange constraint on the type;

   +08: An LG that locates the Prefix of the current Type Descriptor, so the descriptor refers to itself. There is no upward-compatible type.

Immediately following this Suffix is a full-fledged SET Type Descriptor whose base type is the Type Descriptor of the Enumerated Type itself. The author has not yet discovered the reason for this. At least one case has been observed in which a SET Type Descriptor is followed by a word containing zero, for which I know of no explanation. Could this be a (shudder) BUG in Turbo?

5.4.4.3.5 SUBPROGRAM TYPES

The length of this Suffix is variable. The format is as follows:

   +00: An LG that locates the Type Descriptor of the FUNCTION result returned by the Subprogram. This field is zero if the Subprogram is a PROCEDURE.

   +04: A Word that contains the number of Formal Parameters in the Function/Procedure header. If non-zero, this word is followed by the parameter list itself, as a simple array of parameter descriptors.
        The format of a parameter descriptor is as follows:

         +00: An LG that locates the Type Descriptor of the corresponding parameter;
         +04: A Byte that identifies the parameter passing mechanism used for this entry, as follows:
                 02h -> VALUE of parameter is passed on STACK;
                 06h -> ADDRESS of parameter is passed on STACK.

6. MAPS AND LISTS

The "MAPS and LISTS" are not part of the symbol dictionary. Rather, these structures provide access to the Code and Data Segments produced by the compiler or included via the {$L name.OBJ} directive. The format and purpose (as understood by this author) of each of these tables is explained in the following sections.

6.1 PROC MAP

The PROC Map provides a means of associating the various Function and Procedure declarations with Code Segments and DLL's. There is some evidence that the Compiler produces CODE (and DATA) Segments for EACH of the Subprograms defined in the Unit, as well as for the un-named Unit Initialization code block.

There is also evidence that EXTERNAL PROCs must be assembled separately in order to exploit the Turbo "Smart Linker" fully. Turbo Pascal places some significant restrictions on EXTERNAL routines in the area of Segment Names and Types. Specifically, only code segments named "CODE" and data segments named "DATA" or "CONST" will be used by the "Smart Linker" as sources of code and data for inclusion in a Turbo Pascal .EXE file. (Turbo 6.0 relaxed the name constraints, but the limit of one code segment per .OBJ file remains.)

The first entry in the PROC Map is reserved for the Unit Initialization block. If there is no Unit Initialization block, this entry is marked with $FFFF. In addition, each and every PROC in the Unit (except for INLINE procs) has an entry in this table.
If an EXTERNAL routine is included, then ALL PUBLIC PROC definitions in that routine must be declared in the Unit source code with the EXTERNAL attribute.

The size of the PROC Map Table (in bytes) is implied in the Unit Header by the LL's named UHPMT and UNCMT. The format of a single PROC Map Entry is as follows:

   +00: A Word presumably reserved as a work area; always zero.

   +02: A Word which contains Flags copied from sSxx in the Stub for the Subprogram. This word is always zero for the DOS compiler. (See 5.4.3.5, page 24.)

   +04: A Word that contains an offset within the CSeg Map, used to locate the code segment containing the PROC. If the PROC is found in a DLL, then this word is an offset within the DLL List to the DLL name (i.e., the file with the .DLL extension).

   +06: A Word whose meaning depends on where the PROC resides. If the PROC is local to this unit, it is the offset of the PROC entry point relative to the load point of the referenced CODE Segment. For DLL PROCS referenced by "INDEX", it is the procedure "INDEX" number within the DLL. For DLL PROCS referenced by "NAME", it is an offset to that name, which is stored in the DLL List.

6.2 CSEG MAP

The CSeg Map provides a convenient descriptor table for each CODE Segment present in the Unit, and serves to relate these segments to the Segment Relocation Data and the Segment Trace Table. It seems reasonable to infer that the "Smart Linker" is able to include or exclude code and data at the SEGMENT level only.

The CSeg Map is an array of fixed-length records whose format is as follows:

   +00: A Word apparently reserved for use by TURBO.

   +02: A Word that contains the Segment Length (in bytes).

   +04: A Word that contains the Length of the Fix-Up Data Table for this Code Segment (in bytes).
+06: A Word that contains the offset of the Trace Table Entry for this Segment (if it was compiled with DEBUG support). If there is no Trace Table entry for this segment, then this Word contains FFFFh.

6.3 TYPED CONST DSEG MAP

The CONST DSeg Map provides a convenient descriptor table for each DATA Segment spawned by the presence of Typed Constants or VMTs in the Pascal code. It serves to relate these segments to the Segment Fix-Up (relocation) Data and to the Code Segments that refer to these DATA elements. One entry is present for each CONST declaration part containing typed constants and for each CONST segment linked from an ".OBJ" file.

The CONST DSeg Map is an array of fixed-length records whose format is as follows:

+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes).
+04: A Word that contains the Length of the Fix-Up Data Table for this DATA Segment (in bytes).
+06: A Word that contains an LL which locates the OBJECT that owns this VMT or DMT template, or zero if the segment is not a VMT or DMT template.

One can determine the defining block for a Typed Constant declaration, and our program attempts to do just that: a by-product of the dictionary mapping algorithm allows the declaring block to be found and its qualified name printed. This information is also used to identify the source of fix-up data. Results will be incomplete unless the unit contains a really comprehensive dictionary.

6.4 GLOBAL VAR DSEG MAP

The VAR DSeg Map provides a convenient descriptor table for each DATA Segment present in the Unit. One entry exists for each VAR declaration part whose scope is not local to a PROC and which is therefore allocated in the DATA Segment. CODE Segments may refer to these segments in the CODE Fix-Up Data Table.
Each EXTERNAL CSeg whose .OBJ module also has a segment named DATA spawns an entry in this table as well.

The VAR DSeg Map is an array of fixed-length records whose format is as follows:

+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes). This may be zero, especially if the EXTERNAL routine contains a DATA segment whose sole purpose is to declare one or more EXTRN symbols that are defined in some DATA segment outside the assembly.
+04: A Word apparently reserved for use by TURBO.
+06: A Word apparently reserved for use by TURBO.

One can determine the defining block for a Global VARiable declaration, and our program attempts to do just that: a by-product of the dictionary mapping algorithm allows the declaring block to be found and its qualified name printed. This information is also used to identify the source of fix-up data. Results will be incomplete unless the unit contains a really comprehensive dictionary. Such DSegs can be referenced by many CSegs, and we locate only the first one. This is acceptable for Pascal code but ambiguous for assembler, since the names may be PUBLIC and referenced by more than one module.

6.5 DLL LIST

This list is present ONLY in Units compiled by the Windows version, and then only if the unit calls Dynamic Link Library (DLL) PROCs. The DLL List has the following format:

+00: Four (4) bytes of binary zeroes (reserved as a work area?).
+04: A variable-sized String that contains either the DLL MEMBER name or the PROC name (for DLL reference by NAME). As usual in a unit, the string is truncated to its actual size.

Procedures or Functions which reside in DLLs have entries in the PROC Map but NOT in the CSeg Map, since the executable code is external.
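The maps of Sections 6.1 through 6.4 are all arrays of 8-byte records made of four little-endian words, so a reader can treat them uniformly. The Python sketch below (not the supplied TWU1 code) decodes CSeg Map records and resolves a local PROC Map entry to its code segment. It assumes that the PROC Map's +04 word is a byte offset into the CSeg Map, and therefore a multiple of 8; DLL PROCs, whose +04 word points into the DLL List instead, are not handled. The class and function names are illustrative only.

```python
import struct
from typing import List, NamedTuple, Tuple

ENTRY_SIZE = 8  # every map record is four little-endian 16-bit words


class CSegMapEntry(NamedTuple):
    reserved: int      # +00: apparently reserved for TURBO
    seg_length: int    # +02: segment length in bytes
    fixup_length: int  # +04: length of this segment's fix-up data in bytes
    trace_offset: int  # +06: Trace Table offset, or 0xFFFF if none


class ProcMapEntry(NamedTuple):
    work: int          # +00: work area, always zero
    flags: int         # +02: flags copied from the subprogram stub (0 under DOS)
    cseg_offset: int   # +04: offset into the CSeg Map (DLL List for DLL procs)
    entry_offset: int  # +06: entry point within the segment (or DLL index/name)


def decode_words(table: bytes) -> List[Tuple[int, int, int, int]]:
    """Split a raw map table into 4-word records."""
    if len(table) % ENTRY_SIZE:
        raise ValueError("map table length is not a multiple of 8")
    return [struct.unpack_from("<4H", table, off)
            for off in range(0, len(table), ENTRY_SIZE)]


def cseg_for_proc(proc: ProcMapEntry, csegs: List[CSegMapEntry]) -> CSegMapEntry:
    """Resolve a local (non-DLL) PROC to the CSeg Map record holding its code."""
    index, remainder = divmod(proc.cseg_offset, ENTRY_SIZE)
    if remainder:
        raise ValueError("CSeg Map offset is not on a record boundary")
    return csegs[index]
```

Under this assumption, a PROC Map entry whose +04 word is 8 refers to the second CSeg Map record.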
6.6 DONOR UNIT LIST

This list contains an entry for each Unit (taken from the "USES" list) which MAY contribute either CODE or DATA to the executable file. Not all units make such a contribution, as some exist merely to define a collection of Types, etc. A Unit gets into this list if at least one Fix-Up Data Entry references CODE or DATA in that Unit. The list is composed of variable-sized elements whose format is as follows:

+00: A WORD apparently reserved for use by TURBO.
+02: A variable-length String containing the unit name.

6.7 SOURCE FILE LIST

This list contains an entry for each "source" file used to compile the Unit. This includes the Primary Pascal file, files containing Pascal code included by means of the {$I filename.xxx} compiler directive, and .OBJ files included by the {$L filename.OBJ} compiler directive.

The order of entries in this list is critical, since it maps the CODE segments stored in the unit. The entries appear in the following order:

  The Primary Pascal file;
  All Included Pascal files;
  All Included .OBJ files.

Mapping of CSegs to files is done as follows. Each .OBJ file contributes a SINGLE Code Segment (if any). Note that this author has not observed an .OBJ module that contains only a DATA Segment (but that seems a distinct possibility). The Primary Pascal file (augmented by all included Pascal files) contributes zero or more CODE Segments. Therefore, there are at least as many CSeg entries as there are .OBJ files contributing code. If there are more, then the excess entries (those at the front of the list) belong to the Pascal files that make up the Pascal source for the unit.
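The mapping rule just described can be sketched as follows. This is a hypothetical Python helper, assuming each .OBJ file contributes exactly one CSeg and that the Source File List has already been read into (kind, name) pairs in stored order. Since this list alone does not say which Pascal file produced which of the leading CSegs, the sketch attributes them all to the primary file; the Trace Table's +02 word (Section 6.8) would be needed to refine that.

```python
from typing import List, Optional, Tuple


def map_csegs_to_files(cseg_count: int, files: List[Tuple[str, str]]) -> List[str]:
    """Return the source file name for each CSeg index, in CSeg Map order.

    files holds (kind, name) pairs in Source File List order, where kind is
    "pas" (primary), "inc" (included Pascal) or "obj" (linked .OBJ).
    """
    obj_names = [name for kind, name in files if kind == "obj"]
    pascal_count = cseg_count - len(obj_names)  # the excess CSegs at the front
    if pascal_count < 0:
        raise ValueError("fewer CSegs than .OBJ files")
    primary: Optional[str] = next((name for kind, name in files if kind == "pas"), None)
    if pascal_count and primary is None:
        raise ValueError("Pascal CSegs present but no primary file listed")
    # Leading CSegs come from the Pascal source (attributed to the primary file);
    # the trailing ones map one-to-one, in order, onto the .OBJ files.
    return [primary] * pascal_count + obj_names
```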
The format of an entry in this list is as follows:

+00: A flag byte that indicates the type of file represented:
     04h -> the Primary Pascal Source File,
     03h -> an Included Pascal Source File,
     05h -> an .OBJ file that contains a CODE segment,
     06h -> an .RES file from {$R xxx.RES} (Windows RESOURCE).
+01: A Word apparently reserved for use by the Compiler/Linker.
+03: A Word that is zero for .OBJ files and which contains the file directory time-stamp for Pascal files.
+05: A Word that is zero for .OBJ files and which contains the file directory date-stamp for Pascal files.
+07: A variable-sized string containing the filename and extension of the file used during compilation.

6.8 DEBUG TRACE TABLE

If Debug support was selected at compile time, then all Pascal code compiled with debugging support produces entries in this table. The table entries are variable in size and have the following format:

+00: A Word which contains an LL that locates the Directory Header of the Symbol (a PROC name) this entry represents.
+02: A Word which contains the offset (within the Source File List) of the entry naming the file that generated the CSeg being traced. This allows a file included by means of the {$I filename} directive to be identified for DEBUG purposes, as well as code produced from the Primary File.
+04: A Word containing the number of bytes of data that precede the code of the BEGIN statement in the segment. For Pascal PROCs these bytes consist of literal constants, un-typed constants, and other data such as range-checking limits.
+06: A Word containing the Line Number of the BEGIN statement for the PROC.
+08: A Word containing the number of lines of Source Code to Trace in this Segment.
+0A: An array of bytes whose size is at least the number of source code lines in the PROC. Each byte contains the number of bytes of object code generated by the corresponding source line. This appears to be an array of SHORTINT: if a "line" generates more than 127 bytes, then a single $80 byte precedes the actual byte count as a sort of "escape", and the next byte records the count (up to 255 bytes) for that line. This situation has not yet been fully explored; we do not yet know what happens when a line is credited with more than 255 bytes of code.

7. CODE, DATA, FIX-UP INFO

This area begins at the start of the next free PARAGRAPH (16-byte boundary). This means that its offset from the beginning of the Unit, written in hexadecimal, ALWAYS ends in the digit zero. This area contains the CODE segments, CONST DATA segments, and the Relocation (Fix-Up) Data required for linking.

7.1 OBJECT CSEGS

Each CODE segment included in the unit appears here, as specified by the CSeg Map Table. Depending on usage, these segments may or may not be copied into the executable file. There are no filler bytes between segments.

7.2 CONST DSEGS

This section begins at the start of the first free PARAGRAPH following the end of the Object CSegs, so its offset from the beginning of the Unit also ALWAYS ends in the hexadecimal digit zero. A DATA segment fragment appears here for each CSeg that declares a typed constant, and for each OBJECT which employs Virtual Methods, Constructors or Destructors. There are no filler bytes between segments.

If local symbols were generated, there is always enough information to document the scope of each declaration and to interpret the data in the display, since the needed type declarations are also available. Our program merely identifies the defining block.
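The paragraph rule of Sections 7 through 7.2 amounts to rounding an offset up to the next multiple of 16, and the absence of filler bytes means each CSeg's position follows from the running total of the CSeg Map lengths. A small Python sketch with hypothetical helper names (offsets are measured from the start of the unit file; the values in the check are made up):

```python
from typing import List, Tuple

PARAGRAPH = 16  # an 8086 paragraph is 16 bytes


def align_to_paragraph(offset: int) -> int:
    """Round offset up to the next multiple of 16 (its hex form ends in 0)."""
    return (offset + PARAGRAPH - 1) // PARAGRAPH * PARAGRAPH


def lay_out_segments(area_start: int, lengths: List[int]) -> Tuple[List[int], int]:
    """Return the start offset of each segment packed back to back from
    area_start (no filler bytes), plus the offset just past the last one."""
    starts = []
    offset = area_start
    for length in lengths:
        starts.append(offset)
        offset += length
    return starts, offset
```

Under these assumptions, the CONST DSeg area would begin at align_to_paragraph(end) once the last CSeg has been placed.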
7.3 FIX-UP DATA TABLES

There are at most two Fix-Up Data Tables in any given .TPU file. The first is for the CODE Area and the second is for the CONST DSeg area. Both are paragraph-aligned and both have size information in the unit header.

Turbo Pascal for DOS and Turbo Pascal for Windows apparently use differing code-generation models where floating-point is concerned. The nub of the difference appears to lie in emulation support. In the DOS product, the 8087 emulator is included in the SYSTEM unit, while under WINDOWS a DLL (WIN87EM) furnishes floating-point emulation support for applications. This seems to be the reason for a new fix-up format and for the way floating-point options are presented in TP for Windows.

The Table consists of an array of eight (8) byte entries whose format is as follows:

+00: A Byte containing the offset, within the Donor Unit List, of the name of the Unit that this entry refers to. This can be the compiled Unit or some previously compiled external unit.
+01: A Byte of BIT switches that identify the type of reference and the size of the needed fix-up (WORD or DWORD). A lot of guess-work led to the following interpretation:

     7654   (bits 3-0 don't seem to be used)
     00--   Locate item via a PROC Map,
     01--   Locate item via a CSeg Map,
     10--   Locate item via a Global VAR DSeg Map,
     11--   Locate item via a Const DSeg Map,
     --00   WORD offset has NO effective address adjustment,
     --01   WORD offset HAS an effective address adjustment,
     --10   WORD SEGMENT-only fix-up (address of some PUBLIC segment),
     --11   DWORD (FAR) pointer; possible effective address adjustment.

+02: A Word containing the offset within the Map table selected by the above code scheme.
+04: A Word containing an offset within the target segment which will be added to the effective address.
For example, a reference via the VAR DSeg Map requires a final offset to locate the item (variable) within the DATA SEGMENT being referenced. Such an offset may also be needed for references to LITERAL DATA embedded in a CODE SEGMENT.
+06: A Word containing the offset, within the CODE or DATA segment that owns this entry, of the area to be patched with the final effective address.

In the WINDOWS environment, an additional format is possible, with the following layout:

+00: A Word containing $FFFF, which appears to serve as a format identifier.
+02: A Word containing an Emulator Fix-Up type code. After looking at many such entries in context with the object code, the following scheme seems to be operative:
     2 -> the target floating-point op has an SS: override prefix;
     3 -> the target floating-point op has a CS: override prefix;
     4 -> the target floating-point op has an ES: override prefix;
     5 -> the target floating-point op has NO override prefix;
     6 -> the target floating-point op is "FWAIT" ($909B).
+04: A Word that is probably always zero.
+06: The offset of the floating-point operation to be emulated. This operation is always prefixed with a WAIT op ($9B) unless it is an FWAIT ($909B). If an operation is not so prefixed, then no fix-up record is generated for it.

These latter fix-up records are (probably) incorporated into the .EXE file (after suitable transformations) so that the Windows Loader can see and process them. Presumably, they are simply ignored if a co-processor chip is present and working. If not, they tell the loader where the instructions to be emulated are.
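The two entry layouts can be told apart by the leading $FFFF word. The Python sketch below decodes one 8-byte entry into a dictionary according to the bit-switch interpretation above; since that interpretation is itself guess-work, so are the labels. It assumes all words are little-endian and that a standard entry never begins with the bytes $FF $FF. The names are hypothetical.

```python
import struct

MAP_KINDS = {0b00: "PROC Map", 0b01: "CSeg Map",
             0b10: "Global VAR DSeg Map", 0b11: "CONST DSeg Map"}
FIXUP_KINDS = {0b00: "WORD offset, no adjustment",
               0b01: "WORD offset with adjustment",
               0b10: "WORD segment only",
               0b11: "DWORD far pointer"}
EMULATOR_KINDS = {2: "SS: override", 3: "CS: override", 4: "ES: override",
                  5: "no override", 6: "FWAIT"}


def decode_fixup(entry: bytes) -> dict:
    """Decode one 8-byte fix-up entry (standard or Windows emulator form)."""
    if len(entry) != 8:
        raise ValueError("fix-up entries are 8 bytes long")
    (first_word,) = struct.unpack_from("<H", entry, 0)
    if first_word == 0xFFFF:  # Windows emulator fix-up
        _, emu_type, _zero, target = struct.unpack("<4H", entry)
        return {"format": "emulator",
                "kind": EMULATOR_KINDS.get(emu_type, f"unknown ({emu_type})"),
                "patch_at": target}
    # +00 donor unit offset, +01 switches, +02 map offset, +04 addend, +06 patch site
    unit_offset, switches, map_offset, addend, patch_at = struct.unpack("<BBHHH", entry)
    return {"format": "standard",
            "donor_unit_offset": unit_offset,
            "map": MAP_KINDS[switches >> 6],                   # bits 7-6
            "fixup": FIXUP_KINDS[(switches >> 4) & 0b11],      # bits 5-4
            "map_offset": map_offset,
            "addend": addend,
            "patch_at": patch_at}
```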
What the loader does with this information is pure guess-work, but it probably works something like this:

1) If the Emulator Type code in the word at +02 indicates that a segment override prefix is present (codes 2..4), replace the first three bytes of the instruction with the following: $CD $3C "xxyyyyyy", where "yyyyyy" is the least-significant six bits of the "escape" byte (originally $D8..$DF) and "xx" is the ones-complement of the two-bit segment register value (00=ES, 01=CS, 10=SS, 11=DS). This method would replace the WAIT op ($9B), the segment override prefix, and the "escape" byte with the above string at program load time. This would allow an application to run regardless of the availability of co-processor support.

2) If the Emulator Type code in the word at +02 is 5, then there is no override prefix. Replace the first two bytes of the instruction with the following: $CD $jj (where "jj" is "escape" - $A4). $jj therefore falls in the range $34..$3B.

3) If the Emulator Type code in the word at +02 is 6, then the operation to emulate is FWAIT. Replace the $90 $9B with $CD $3D.

Since $CD is the op-code for INT, if emulation were in effect we would produce INT $34..$3D whenever a floating-point operation was found that could be emulated. This approach has the advantage that we don't have to commit to emulation or non-emulation at compile time. Rather, the decision is made at load time and is transparent to the user. It's interesting to note that the DOS compiler generates such code without benefit of fix-ups whenever both 8087 and emulation support are selected, since the emulator is a component of the SYSTEM unit in DOS. In WINDOWS, we merely include a reference to WIN87EM plus the above fix-ups.
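To make the guessed load-time rewrite concrete, the Python sketch below applies rules 1) to 3) to a byte buffer. It illustrates only the hypothesis above, not documented Windows loader or WIN87EM behavior; the function name is hypothetical, and before patching it checks for the expected WAIT, NOP, escape and segment-prefix bytes ($26 = ES:, $2E = CS:, $36 = SS:).

```python
INT_OPCODE = 0xCD  # INT imm8
WAIT = 0x9B
NOP = 0x90
SEG_REG = {2: 0b10, 3: 0b01, 4: 0b00}      # type code -> SS, CS, ES register number
SEG_PREFIX = {2: 0x36, 3: 0x2E, 4: 0x26}   # SS:, CS:, ES: override prefix bytes


def patch_for_emulation(code: bytearray, offset: int, emu_type: int) -> None:
    """Rewrite the op at `offset` in place as an INT $34..$3D call (hypothesis only)."""
    if emu_type in SEG_REG:
        wait, prefix, escape = code[offset:offset + 3]
        if wait != WAIT or prefix != SEG_PREFIX[emu_type] or not 0xD8 <= escape <= 0xDF:
            raise ValueError("expected WAIT, segment prefix, escape byte")
        # bits 7-6: one's complement of the segment register; bits 5-0 from the escape
        encoded = ((~SEG_REG[emu_type] & 0b11) << 6) | (escape & 0x3F)
        code[offset:offset + 3] = bytes([INT_OPCODE, 0x3C, encoded])
    elif emu_type == 5:
        wait, escape = code[offset:offset + 2]
        if wait != WAIT or not 0xD8 <= escape <= 0xDF:
            raise ValueError("expected WAIT followed by an escape byte")
        code[offset:offset + 2] = bytes([INT_OPCODE, escape - 0xA4])  # -> $34..$3B
    elif emu_type == 6:
        if code[offset:offset + 2] != bytes([NOP, WAIT]):
            raise ValueError("expected NOP, WAIT")
        code[offset:offset + 2] = bytes([INT_OPCODE, 0x3D])
    else:
        raise ValueError(f"unknown emulator fix-up type {emu_type}")
```

Note that under this scheme an ES: override leaves the escape byte unchanged (complement bits 11), while no override at all uses the shorter two-byte form.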
The technique relies on the fact that 8087 ops are necessarily prefixed by the WAIT byte (except for the "FN..." variants). This provides enough room to perform the replacements above in situ. This approach WILL NOT work if the code contains floating-point instructions without a WAIT prefix byte. If the object code requires an 80287 or an 80387 (for example), then it would seem that Interrupt 07H will have to be serviced by WIN87EM. This is all guess-work for now; I haven't seen any literature documenting WIN87EM techniques.

8. SUPPLIED PROGRAM

In order to make the above information constructively useful, the author has designed a program that automates the process of discovery. It is not a work of art, but it does give useful results provided your PC has enough available memory. The program source code has been re-organized many times, as I simply haven't been able to resist tinkering with it. Minor changes in its output have been implemented to enhance its usefulness.

It should be obvious that the program was not designed "top-down". Rather, it just evolved as each new discovery was made. Later on, it seemed reasonable to try to document some of the relations between the various lists and tables, and the program tries to make some of these relations clear, albeit with varying degrees of success.

It may not be obvious to all readers, but the program is actually fighting a losing battle in many respects. The ".TPU" file was not designed with the intent of enabling de-compilation, disassembly or de-linking. Thus, some interesting semantic information is lost forever, since it's not needed for either compilation or debugging.
For example, it doesn't seem to be possible to determine with certainty the source file for a CONST DSeg or GLOBAL VAR DSeg when ".OBJ" files are linked into the ".TPU" file. Of course, it MAY be possible in certain cases but, in general, there is simply not enough information available to determine the source definitively. This is because one ".OBJ" file may define such a DSeg and contain a CSeg that refers to it but, if the DSeg is PUBLIC, it may also be referred to by other CSegs. Each of the CSegs that refers to the DSeg in this way views it as an EXTERNAL as far as fix-up data is concerned. Therefore, in general, it's impossible to determine which of the referencing CSegs was drawn from the same ".OBJ" file as the DSeg.

8.1 TWU1

This is the main program. It will ask for the name of the unit to be documented; reply with the unit name only. The program will append the ".TPU" extension and search for the proper file, searching the appropriate library file as well if necessary.

The program will then ask whether the unit is a DOS or WINDOWS unit and will require a "d" or "w" answer. This determines which unit library file (TURBO.TPL or TPW.TPL) is searched for the SYSTEM unit (among others).

The program will then ask whether Dis-Assembly is desired and will require a "y" or "n" answer. If the answer is "y", it also asks which CPU's code is to be handled (see 8.2.2).

The current directory will be searched first, followed by all directories in the current PATH. If the .TPU file is not found, the program will search for it in the "TURBO.TPL" or "TPW.TPL" (Turbo Pascal Library) file, as appropriate. Units in the "USES" list(s) will also be loaded to enable resolution of LG items. If the desired unit is found, the program will write a report containing its analysis, named "unitname.lst", to the current directory.
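The search order described above might be expressed as follows. This is a Python sketch, not the TWU1 code: it assumes DOS-style upper-case file names, treats the "d"/"w" answer as selecting TURBO.TPL or TPW.TPL, and leaves the library browse of Section 9.1 to the caller. The function names are hypothetical.

```python
import os
from typing import List, Optional, Tuple

LIBRARIES = {"d": "TURBO.TPL", "w": "TPW.TPL"}


def search_plan(unit_name: str, target: str, cwd: str,
                path_env: str) -> Tuple[List[str], str]:
    """Return candidate .TPU paths in search order plus the fallback library."""
    if target.lower() not in LIBRARIES:
        raise ValueError('answer must be "d" or "w"')
    filename = unit_name.upper() + ".TPU"
    # current directory first, then every non-empty PATH entry in order
    dirs = [cwd] + [d for d in path_env.split(os.pathsep) if d]
    return [os.path.join(d, filename) for d in dirs], LIBRARIES[target.lower()]


def find_unit(unit_name: str, target: str) -> Tuple[Optional[str], str]:
    """Return the first existing .TPU file (or None) and the library to fall back on."""
    candidates, library = search_plan(unit_name, target, os.getcwd(),
                                      os.environ.get("PATH", ""))
    for path in candidates:
        if os.path.isfile(path):
            return path, library
    return None, library  # caller then browses the .TPL file
```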
The format of the report is such that it may be copied to a printer if that printer supports TTY control codes with form-feeds. Be judicious in doing this, however, since there can be a lot of information. Some of the units supplied by Borland can produce almost 2 MB of report output, depending on whether it's Version 6.0 for DOS or Version 1.0 for Windows (some supplied Windows units are BIG).

8.1.1 UNIT TWU1EQU

This Unit contains constants, types and procedures of general utility that are not strictly unit- or I/O-related. One of the more powerful procedures is a general-purpose QuickSort procedure. The unit also contains a Heap Error Function that keeps track of the high-water mark of Heap Utilization of any program that uses it; this function gets installed automatically. This Unit makes SOME use of the INLINE assembler, for speed rather than out of sheer necessity. Some of the routines are INLINE Macros that provide short expansions of otherwise overhead-ridden facilities.

8.1.2 UNIT TWU1RPT

This Unit contains the text-file output routines required by the main program. This relieves the main program of some of the tedium of handling report formatting and pagination.

8.1.3 UNIT TWU1UAM

This Unit contains all Type Definitions, Structures, and primitive Functions and Procedures required by the program for ".TPU" file acquisition and analysis. All structures documented in this report are also declared in its interface by means of the TYPE mechanism. Some of the structures are difficult if not impossible to define using ISO Pascal, but Turbo Pascal provides the means for getting the job done. Some algorithms have been cast with object-orientation in mind and have potential for re-use in other contexts.
The unit computes a cover for the dictionary and deduces relationships between the dictionary, code, data and the CSeg, PROC, CONST and VAR Maps discussed in Sections 6.1 through 6.4 on Pages 35..37. This information is retrieved by the main program to drive the printing process.

This Unit also loads all units specified in the USES list of the prime unit, so that the names of externally defined types can be recovered on the report. Array bounds are also retrieved in this way. The code searches for needed units in the appropriate unit library file without intervention. Close attention is paid to Heap Management and minimal use of Heap storage: only the dictionary areas of the Units in the USES list are loaded into the Heap, since nothing but the dictionary area is of any use at this point. The name and fully-qualified file name of each unit successfully loaded are printed at the top of the listing. Unit version numbers must agree or the unit will not be loaded. Dictionary covers are computed for each loaded unit to aid in rapid LG resolution.

Lack of sufficient Heap Storage will not necessarily cause the program to fail. Heap Space MUST be available to load the primary unit and perform the necessary analyses, but the secondary or nested units are not essential; if they cannot be loaded, you merely lose some descriptive information. If Heap exhaustion occurs at a critical step, however, the program will generate RunError 215.

8.1.4 UNIT TWU1UNA

This unit is a rudimentary disassembler. The output will not assemble and may look strange to a "real" assembler programmer, since I am not well-qualified in this area. However, the basis for support of 80286, 80386, etc. processors is present, as well as coprocessor support. Perhaps of greatest interest is that it does appear to decode the emulated coprocessor instructions that are implemented via INT $34-$3D in the MS-DOS versions of Turbo Pascal. Be warned, however.
The output is not guaranteed, since this was coded by myself and I am perhaps the rankest amateur that ever approached this quite awful assembler language. For convenience, the operand coding mimics TASM "Ideal" mode. As is usual with programs of this type, error-recovery is minimal and no context checking is performed. If the operation code is found to be valid, then a valid instruction is assumed, even if invalid operands are present. The only positives that apply to this program are that it doesn't slow the CPU down (although a lot more output is produced), and that it lets one "tune" code for compactness by viewing the results of the coding directly. Also, incomplete instructions are handled as data rather than overrunning into the next proc.

8.2 NOTES ON PROGRAM LOGIC

The following sections discuss a few of the methods employed by the supplied program. There are no cutting-edge algorithms here; results counted for a lot more than technique.

8.2.1 FORMATTING THE DICTIONARY

Printing the unit dictionary area in a way that exposes its underlying semantics is no small task. The unit dictionary area itself is a rather amorphous-looking mass of data composed of hash tables, Name Entries and stubs, type descriptors, etc. In order to present all this information in a meaningful way, we have to reveal its structure, and this cannot be done by means of a sequential "browse" technique. Rather, we have to visit all nodes in the dictionary area so that each may be formatted in a way that exposes its function and meaning. This is made necessary by the fact that items are added to the dictionary as they are encountered, and no convenient ordering of entry types exists.
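Visiting every node, rather than browsing sequentially, is a graph traversal. As a rough illustration only, the Python sketch below assumes the dictionary has already been reduced to a hypothetical mapping from each structure's offset to the offsets it refers to, and performs a depth-first walk that records nesting level and parent, skipping nodes already seen so that cycles cannot cause endless recursion. The real program works on raw dictionary bytes and makes two passes (count, then fill a heap-allocated array); a Python list makes the counting pass unnecessary.

```python
from typing import Dict, List, NamedTuple, Optional


class CoverEntry(NamedTuple):
    node: int              # offset of the structure in the dictionary area
    level: int             # nesting depth below the root
    parent: Optional[int]  # offset of the structure that led us here


def build_cover(children: Dict[int, List[int]], root: int) -> List[CoverEntry]:
    """Depth-first walk recording each reachable node once, with a cycle trap."""
    cover: List[CoverEntry] = []
    visited = set()

    def visit(node: int, level: int, parent: Optional[int]) -> None:
        if node in visited:  # already mapped: a cycle or a shared descriptor
            return
        visited.add(node)
        cover.append(CoverEntry(node, level, parent))
        for child in children.get(node, []):
            visit(child, level + 1, node)

    visit(root, 0, None)
    return cover
```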
What we have here is the problem of finding a minimal "cover" for the dictionary area that properly exposes its content and structure. To do this, we scan the dictionary recursively to determine the number of structures that we need to map. Then we get heap storage for the array of records that will hold the mapping information and repeat our recursive dictionary scan, this time constructing the mapping records.

The recursive algorithm is "delicate" in that it is vulnerable to the cycles that our analysis uncovers, particularly when polymorphic objects are involved. Therefore, we have incorporated a simple little trap that tries to discover such cycles and avoid them. It is possible that the algorithm could fail for exceedingly complex units, but it handles the worst cases from Borland with ease. Prior versions of this unit accomplished this task without recursion but required too many tricky pointer manipulations that were environmentally sensitive, so recursion was adopted. Since unit dictionaries don't tend to be deeply nested, we get reasonable heap utilization coupled with stable algorithms.

The result is an array containing one entry for each structure in the unit dictionary area that is identifiable via traversal. Each entry in the array contains information about nesting level, parent scope, structure type and location. The array thus forms a set of descriptors that drive the process of formatting the dictionary area for display. The process may be likened to "painting by the numbers", or to laying tile on a flat surface using tiles of differing shapes until the floor is exactly covered.

There is one significant limitation that needs to be pointed out.
It is not always possible to determine the "parent" or "owner" of a node with certainty. The following discussion illustrates the problem of finding the "real" parent of a Type Descriptor. Almost every "type" in Turbo Pascal is actually derived from the basic types defined in the SYSTEM.TPU unit (e.g., "INTEGER", "BYTE", etc.). In addition, several of the Type Descriptors in the SYSTEM unit are referenced by more than one Name Entry. Thus, a "many-to-one" relationship may exist between Name Entries and Type Descriptors. How does one find out which entry actually gave rise to a given Type Descriptor?

The Dictionary Area of a unit has some special properties, one of which is that the Name Entries for named Types are often located quite near their primary type descriptors. The Dictionary Area seems to be treated as an upward-growing heap, with the various structures being added by Turbo as encountered. This makes it likely that the Type "Q" header which gives rise to a type descriptor occurs earlier in the Dictionary Area than any other entry which refers to the same descriptor. We use this property to allocate "ownership", but it may not be "fool-proof".

Some type descriptors are spawned by other type descriptors, especially for structured types. Further, structured named types are often accompanied by pointer types, and this results in multiple named types sharing the same type descriptor. We don't attempt to allocate "ownership" to "spawned" type descriptors, but we do try to keep track of scope information. A useful by-product of the above process is the ability to discover many of the associations between Global Variables, Typed CONSTs, VMTs and the blocks in which they are declared or defined.

8.2.2 THE DISASSEMBLER

To start with, I apologize up front for the mistakes which are bound to be present in this routine. I am not really a MASM or TASM programmer and I will not pretend otherwise.
This being the case, the formatting I have chosen for the operands may be erroneous or misleading and might (if submitted to one of the "real" assemblers) produce object code quite different from what is expected. I hope not, but I have to admit it's possible.

My intention in adding this unit was to support hand-tuning of object code. With practice and some effort, one can observe the effect on the object module caused by specific Pascal coding. Thus, where compactness or speed is an issue of paramount importance, disassembly can be of help. In some cases, a simple re-arrangement of the local variable declarations in a procedure can have a significant effect on the size of the code, if it makes the difference between 1-byte and 2-byte displacements for each instruction that references a specific local variable. Potential applications along these lines seem almost unlimited.

I adopted an operand format not unlike that of TASM "Ideal" mode, since it was more convenient to do so and looked more readable to me. I relied on several reference books for guidance in decoding the entire mess, and I found several flaws (read ERRORS) in some of them which made the job that much more difficult. I then compounded my problems by attempting to handle 80386-specific code even though Turbo Pascal does not yet generate code specific to these processors. I simply felt that writing any sort of Dis-Assembly program for Turbo Pascal units was an effort best experienced not more than once.

With all this self-flagellation out of my system once and for all, I will try to show the basic strategy of the program and to explain the limitations and some of the discoveries I made.
The routine is intended to be idiotically simple, i.e., no smarter than the DEBUG command in principle. The basic idea is: pass some text to the routine and get back ONE line derived from some prefix of that text; repeat as necessary until all text is gone. Thus, there is no attempt to check the context of the text being processed. Also, some configurations of the "modR/M" byte may be invalid for selected instructions. I don't try to screen these out, since the intent was to look at the presumably correct code produced by TURBO Pascal, not at devious assembly language. Also, this program regards WAIT operations as "stand-alone"; i.e., it doesn't check whether a coprocessor operation follows for which the WAIT might be regarded as a prefix.

One area of real difficulty was figuring out the Floating-Point emulations used by Turbo Pascal Version 6.0 for DOS that are implemented by means of interrupts $34 through $3D. I don't know if I got it right, but the results seem reasonable and consistent. In the listing, the Interrupt is produced on one line, followed by its parameters on the next line. The parameter line is given the op-code "EMU_xxxx", where "xxxx" is the coprocessor op-code I felt was being emulated. Interrupt $3C was a real puzzler, but after seeing a lot of code in context, I think that the segment override is communicated to the emulator by means of the first byte after the $3C.

Normally, in a non-emulator environment, all coprocessor operations (ignoring any WAIT and segment override prefixes) begin with $D8-$DF. What Borland (and maybe Microsoft) seem to have done here is to change the $D8-$DF so that bits 7 and 6 of this byte are replaced with the one's complement of the 2-bit segment register number found in various 8086 instructions.
Replacing bits 7 and 6 with the complemented segment number seems to be how an override for the DS register is passed to the emulator. I don't KNOW this to be the correct interpretation, but the code I have examined in context seems to work under this scheme, so the disassembler uses it to interpret the operand accordingly.

For 80x86 instructions, the problem was somewhat simpler. The disassembler takes a quick look at the first byte of the text. Almost any byte is valid as the initial byte of an instruction, but some instructions require more than one byte to hold the complete operation code. Thus, step 1 classifies bytes in several ways that lead to efficient recognition of valid operation codes. Once the instruction has been identified in this way, it is more or less easy to link to supplemental information that provides operand editing guidance, etc. The tables that embody the recognition scheme were constructed using PARADOX (another fine Borland product), and suitably coded queries were used to generate the actual Turbo Pascal code for compilation.

For those who are interested, the disassembler supports the address-size and operand-size prefixes of the 80386 as well as 32-bit operands and addresses, but remember that Turbo Pascal doesn't generate these. A trivial change is provided that allows segments which default to 32-bit mode to be handled as well. A simple mode variable, passed to the disassembler by its caller, specifies the most capable processor whose code is to be handled. Codes are provided for the 8086 (the 8088 is the same), the 80186 (the 80286 without protected-mode instructions), the 80286 (the 80186 plus protected mode), and the 80386. You are now asked which one to use. No such specifier is provided for coprocessor support; what is there is what I think an 80387 supports. I don't think this is really a problem if you don't try to use this disassembler for anything but Turbo Pascal code.

Error recovery is predictably simple.
The initial text byte is output as the operand of a DB pseudo-op, and provision is made to resume work at the next byte of text.

I hope this program is found to be useful in spite of the errors it must surely contain. I have yet to make much sense of the rules for MASM or TASM operand coding, and I found very little of value in many of the so-called "texts" on the subject. I found myself in the position of that legendary American in England watching a Cricket match for the first time ("You mean it has RULES?").

9. UNIT LIBRARIES

I have examined .TPL files and conclude that their structure is trivial. They are so easy to handle that the program now routinely examines either TURBO.TPL or TPW.TPL to resolve named types.

9.1 LIBRARY STRUCTURE

A Turbo Pascal Library (.TPL) file is a simple catenation of Turbo Pascal Unit (.TPU) files. Since the size of a Unit may be determined from the Unit Header (see Section 4.2, Page 16), one may "browse" through a .TPL file from one unit to the next, looking for an external unit such as SYSTEM.TPU. The supplied program does just that in its unit retrieval process, so the TPUMOVER utility is no longer required for processing units in either the TURBO.TPL or the TPW.TPL file.

10. INFERENCES DRAWN FROM ANALYSES

I have learned much about Turbo Pascal .EXE files from poring over the output of the supplied program. It is possible to learn how to build smaller .EXE files after contemplating the structure of Unit files.
It is also possible to avoid certain troublesome anomalies in the code if one can see just what Turbo Pascal does when certain compiler switch directives are in effect.

10.1 LINKER GRANULARITY

The Linker appears to be able to resolve any code or data fragment with a resolution that matches the granularity of the various "map" tables in the unit file. The Code Map, the CONST DSeg Map and the GLOBAL VAR Map each map things that can be included in the .EXE file if referenced; conversely, these things can also be excluded if not referenced. Turbo Pascal manuals have been just a little vague about how "smart" the "Smart Linker" actually is, but the granularity of the maps implies the extent of that "smartness".

Assuming the linker does in fact take advantage of this information and act on it, we as programmers can have a bit more control over the elements included from Unit Files. This control can extend to GLOBAL VARs that may be used in particular circumstances, or not at all in others. It seems that CONST DSeg and GLOBAL VAR Map entries are constructed for each TYPED CONST or VAR "Declaration Part" encountered in the Pascal source code. Thus, "Toolbox" type units can have their Typed CONSTs and GLOBAL VARs partitioned along usage lines, each declaration part dedicated to a small group of Procedures or Functions, so that they are included only if the corresponding Procedures or Functions are referenced, or if the variables themselves are referenced explicitly by some external program.

10.2 FLOATING-POINT EMULATION

Floating-Point emulation has some tricky cases, particularly when the In-Line Assembler is used. As noted earlier, the implementation of Floating-Point Emulation is the responsibility of the SYSTEM unit in the MS-DOS version and of WIN87EM in the WINDOWS version. The state of the {$G±} directive toggle has an impact in these cases.
It would appear that 80286 code generation changes the way floating-point instructions are generated, since the 80287 is implied as the coprocessor chip. In this case, the programmer has fine control over the timing of WAIT instructions, since 80287 instructions don't automatically get prefixed by WAIT ops. When 8087 code is being generated, WAIT instructions are produced for 8087 instructions because the 8087 requires them; this doesn't happen when the code is targeted at the 80287. So far, so good. However, EMULATION of such code gets trickier.

10.2.1 VERSION 6.0 COMPILER FOR MS-DOS

It seems that the {$E±} directive doesn't work as it did in previous versions. All code produced in 8087 mode seems to be emulated code. I haven't found a way to get 8087 code generated if the compiler runs on a machine that doesn't have a coprocessor. It may be that the directive works as documented if a coprocessor is available on the machine the compiler runs on.

10.2.2 VERSION 1.0 COMPILER FOR WINDOWS

It seems that the WIN87EM DLL in WINDOWS either needs to be able to service 80287 code via Interrupt 07H (the 80286 "processor extension not available" exception), or the application needs to be able to adapt itself to missing-coprocessor situations. This is implied by the Emulation Fix-Ups discussed earlier. These fix-ups are produced when 8087 code is being generated, since the WAIT prefix on an instruction provides space for loader patching. Since WAIT prefixes are not automatically produced for 80287 instructions (except for FWAIT), some other mechanism is needed. I don't know how this situation is handled unless WIN87EM also services Interrupt 07H.
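Why the WAIT byte matters for these fix-ups can be shown with a small model. It assumes the mapping used by the MS-DOS emulator interrupts as interpreted earlier in this report (ESC $D8-$DF becomes INT $34-$3B, and a segment-overridden form becomes INT $3C followed by a byte whose bits 7 and 6 carry the complemented segment number). Whether WIN87EM's loader fix-ups patch exactly these bytes is an assumption, and `emulation_fixup` is a hypothetical name. The point it illustrates is that WAIT plus ESC occupy the same two bytes as INT nn, so the patch fits in place, whereas an 80287-style instruction without WAIT leaves no room for it.

```python
# Hypothetical model of an emulation fix-up: WAIT + ESC (two bytes) is overwritten with
# INT nn (two bytes), so the instruction length does not change.
# Assumed mapping: ESC $D8-$DF -> INT $34-$3B; WAIT + segment prefix + ESC -> INT $3C plus
# a byte whose bits 7-6 hold the one's complement of the segment register number.

WAIT, INT = 0x9B, 0xCD
SEG_PREFIX = {0x26: 0, 0x2E: 1, 0x36: 2, 0x3E: 3}  # ES:, CS:, SS:, DS: -> segment number


def is_esc(b):
    """True for the coprocessor escape opcodes $D8-$DF."""
    return 0xD8 <= b <= 0xDF


def emulation_fixup(code, offset):
    """Patch the coprocessor instruction at code[offset] in place for the emulator."""
    if offset + 1 >= len(code) or code[offset] != WAIT:
        raise ValueError("no WAIT byte here: no room to patch in an INT")
    nxt = code[offset + 1]
    if is_esc(nxt):
        code[offset:offset + 2] = bytes([INT, 0x34 + (nxt - 0xD8)])
    elif nxt in SEG_PREFIX and offset + 2 < len(code) and is_esc(code[offset + 2]):
        seg, esc = SEG_PREFIX[nxt], code[offset + 2]
        code[offset:offset + 3] = bytes([INT, 0x3C, (esc & 0x3F) | ((~seg & 3) << 6)])
    else:
        raise ValueError("WAIT is not followed by a coprocessor instruction")


plain = bytearray([0x9B, 0xD9, 0x46, 0xFE])  # 8087 style: WAIT, ESC D9, modR/M, disp8
emulation_fixup(plain, 0)
print(plain.hex(" "))  # cd 35 46 fe

with_ds = bytearray([0x9B, 0x3E, 0xDE, 0x06, 0x34, 0x12])  # WAIT, DS:, ESC DE, modR/M, disp16
emulation_fixup(with_ds, 0)
print(with_ds.hex(" "))  # cd 3c 1e 06 34 12

try:
    emulation_fixup(bytearray([0xD9, 0x46, 0xFE]), 0)  # 80287 style: no WAIT to overwrite
except ValueError as err:
    print("cannot patch:", err)
```

The last case is the situation described above: without a WAIT byte there is nothing to overwrite, so some other mechanism (such as servicing Interrupt 07H) would be needed.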
11. APPLICATION NOTES

One of the more obvious applications of this information would seem to be a Cross-Reference Generator. There is a very fine example of such a program in the public domain, called "PXL", written by Mr. R. N. Wisan. This program has been around since the days of Turbo Pascal Version 1 and has been continually enhanced by its author, both in features and in support for the newer Turbo Pascal versions. It does not, however, solve the problem of telling one which unit contains the definition of a given symbol. In fairness to "PXL", this is no small problem, since the format of .TPU files keeps changing (Turbo 6.0 Units are not object-code compatible with Turbo 5.x Units, and so on), and Mr. Wisan probably has more than enough other projects to keep himself occupied. However, for the user who is willing to work a little (maybe a lot?), this document would seem to provide the information needed to add such a function to his own pet cross-reference generator.

Further, with SIGNIFICANTLY more effort, it should be possible to do much of the job of de-compilation, provided the DEBUG dictionary is present. At the very least, most declarations should be recoverable. It's another thing entirely to try to reconstruct plausible TURBO Pascal code from the CSegs. This would be a formidable task, and lots of knowledge about TURBO's code generators would have to be acquired. At present, the only way I know to get this information is to have the run-time library source code and then work-work-work at testing the code produced by the compiler for a huge number of test-case units. You have to want to do this really badly in order to invest the time. I am not that tired of living.
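For the cross-reference function suggested above, the final lookup step might look like the following sketch. It assumes the interface symbols of each unit have already been extracted into sets, which is the hard part and would rely on the dictionary formats described in Section 5. It also assumes Turbo Pascal's scoping rule that a unit named later in the USES clause hides a same-named identifier from an earlier one, with SYSTEM searched last. The unit MYWIN and the function name `defining_unit` are hypothetical.

```python
# Hypothetical sketch: answer "which unit defines this symbol?" for a cross-reference tool.
# `interfaces` stands in for data extracted from each unit's Interface Dictionary;
# building it (hash tables, name entries) is not shown here.


def defining_unit(symbol, uses, interfaces):
    """Return the name of the unit whose interface supplies `symbol`, or None.

    Assumes a unit named later in the USES clause hides a same-named identifier
    from an earlier unit, with SYSTEM searched last. Pascal identifiers are
    case-insensitive, so names are compared in upper case.
    """
    key = symbol.upper()
    search_order = [unit.upper() for unit in reversed(uses)] + ["SYSTEM"]
    for unit in search_order:
        if key in interfaces.get(unit, set()):
            return unit
    return None


interfaces = {
    "SYSTEM": {"EXIT", "HALT"},
    "CRT": {"CLRSCR", "WINDOW"},
    "MYWIN": {"WINDOW"},  # hypothetical user unit that redeclares Window
}
uses = ["Crt", "MyWin"]
for name in ("Window", "ClrScr", "Halt", "Nothing"):
    print(name, "->", defining_unit(name, uses, interfaces))
```

Reversing the USES order is the only scoping logic here; a real tool would also have to consider the program's own declarations, which hide everything imported from units.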
Finally, code-tuning is not really so tedious an exercise as one might imagine. The disassembler makes it possible to experiment with many variants of specific source code at the unit level and to observe the effect on the object code generated. With practice, one learns certain coding practices to avoid, such as indiscriminate use of the "WITH" statement in Pascal (which generates extra pointers and stack usage). A really simple way of checking a code proposal is to create a small test unit and fill it with sample coding. Disassembly of that unit will show what code is produced. This can be a rewarding exercise!

12. ACKNOWLEDGEMENTS

This project would have been totally infeasible without the aid of some very fine tools. As it was, several hundred man-hours have been expended on it and, as you can see, there are a few unresolved issues that have been (graciously) left for others to address. The tools used by this author consisted of:

   Turbo Pascal for Windows by Borland International
   Turbo Pascal 6.0 Professional by Borland International
   Microsoft WORD (version 5.5)
   LIST (version 7.5) by Vernon D. Buerg
   the DEBUG utility in MS-DOS Version 3.3
   PARADOX 3.5 by Borland International
   QUATTRO PRO Version 2.0 by Borland International
   TURBO ASSEMBLER 2.0 by Borland International

(PARADOX and QUATTRO PRO were used for data collection and analysis in the course of coding the recognizer tables for the disassembler unit.)

The references listed were of great value in this project. [Intel85] was a valuable source of information about coprocessor instructions as well as offering hints about the differences between the 8086/8088 and the 80286. The [Borland] TASM manuals offered further info on the 80186.
[Nelson] provided presentations of well-organized data directed at the problem of disassembly, but the tables were flawed by a number of errors which crept into my databases and caused much of the extra debugging effort. [Intel89] offered valuable insights on the 80386 addressing schemes as well as the 32-bit data extensions. [Brown] provided valuable clues on the Floating-Point emulators used by Borland (and Microsoft?). As you can see, the amount of hard information available to me on this project was quite limited, since I am unaware of any other existing body of literature on this subject.

Finally, I am grateful to Mr. Anders Hejlsberg (Borland's Principal Architect for TURBO PASCAL) for the time he spent discussing "cabbages and kings" with me. TURBO PASCAL owes much of its syntactic style and elegance to his efforts and good judgement.

13. REFERENCES

[Borland], TURBO PASCAL FOR WINDOWS Programmer's Guide, Borland International, 1991.
[Borland], TURBO ASSEMBLER REFERENCE GUIDE, Borland International, 1988.
[Borland], TURBO ASSEMBLER USER'S GUIDE, Borland International, 1988.
[Borland], TURBO PASCAL 6.0 PROGRAMMING GUIDE, Borland International, 1990.
[Borland], TURBO PASCAL LIBRARY REFERENCE Version 6.0, Borland International, 1990.
[Borland], TURBO PASCAL USER'S GUIDE Version 6.0, Borland International, 1990.
[Brown], INTER191.ARC, Ralf Brown, 1991.
[Intel85], iAPX 286 PROGRAMMER'S REFERENCE MANUAL INCLUDING THE iAPX 286 NUMERIC SUPPLEMENT, Intel Corporation, 1985 (order number 210498-003).
[Intel89], 386 SX MICROPROCESSOR PROGRAMMER'S REFERENCE MANUAL, Intel Corporation, 1989 (order number 240331-001).
[Nelson], THE 80386 BOOK: ASSEMBLY LANGUAGE PROGRAMMER'S GUIDE FOR THE 80386, Ross P. Nelson, Microsoft Press, 1988.
[Scanlon], 80286 ASSEMBLY LANGUAGE ON MS-DOS COMPUTERS, Leo J. Scanlon, Brady, 1986.

14. INDEX

.OBJ file........14, 35, 37, 39
.RES file........39
.TPL file........8, 16, 45, 46, 52
.TPU
  file...........7, 9, 13, 16, 27, 45, 46, 52, 55
    size.........16
  SYSTEM.........8, 18, 19, 21, 27, 49, 52
{$E±}............54
{$G±}............53
80286............54
80287............44, 54
80387............44
8087.............42, 44, 54
Attribute
  ABSOLUTE.......9
  EXTERNAL.......24, 35
Call Model
  ASSEMBLER......24
  FAR............24
  INLINE.........24
  INTERRUPT......24
CONST............7, 13, 14, 15, 22, 31, 37, 40, 41, 42, 47
Constraint.......33, 34
CSeg.............7, 13, 14, 35, 36, 37, 39, 40, 41, 42, 47
Defining block...37, 38
Directive........14, 15, 16, 24, 35, 39, 40
DLL..............7, 13, 38, 42
DMT..............14, 15, 24, 31, 37
Emulation........53
Emulator.........42, 43
External.........9, 35, 37, 42, 52
FWAIT............54
Granularity......53
Hash.............13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 31, 48
Include..........39, 40
Interface........7, 13, 14, 15, 16, 17, 18, 19, 26
Interrupt 07H....54
Library..........45
Locator
  LG.............9, 12, 21, 23, 26, 27, 30, 31, 32, 33, 34
  LL.............9, 13, 18, 26, 35
  offset.........9, 11, 12, 22, 24, 25, 31, 35, 36, 40, 41, 42
Method...........24
  CONSTRUCTOR....24
  DESTRUCTOR.....24
  Self...........22
  Dynamic........24
Operand offset...42
Parameter........20, 23, 25, 34
PROC.............7, 13, 14, 24, 35, 36, 40, 42, 47
RunError.........47
SEGMENT..........42
Signature........7, 26
Stub.............9, 20, 23
  sSxx...........24
SYSTEM.TPS.......8, 19
TPW..............45, 52
TURBO............45, 52
Type Descriptor..21, 23, 26, 27, 28, 30, 31, 32, 33, 34, 49
VAR..............38, 47
VMT..............14, 15, 25, 31, 37
WIN87EM..........42, 44, 53, 54