SOFWIN LABORATORIES
Super Scalar Technologies

CORSPEED version 6.1
PC COMPUTING SPEED MEASUREMENT TOOL

Sofwin Laboratories
613 Old Farm Road
Columbus, Ohio 43213

Telephone: (614) 866-9966
BBS/FAX (614) 866-9960
CompuServe 74431,1071

This file updated through: May 29, 1994
Copyright 1994 Sofwin Laboratories

============================================================================

DESCRIPTION

This high technology computer performance measurement tool is part of Sofwin Laboratories' new SuperScalar technology package. This is a full scale tool suitable for professional, engineering, and personal use in the assessment of PC Core Engine ( CPU <-> Memory ) performance.

SHAREWARE REGISTRATION $29

While CORSPEED is being offered as shareware, it is neither free nor a toy. This is a fully functional, professional grade measurement tool. If you want to use this tool for more than 90 days, want to receive full professional documentation, or want to be included in Sofwin's Update Notification Program, you need to register this software.

Registration of this shareware version is only $29. Send your check in the amount of $29 to CORSPEED Registration, Sofwin Laboratories, 613 Old Farm Road, Columbus, Ohio 43213, or call (800) 339-2579 to use your credit card.

PROFESSIONAL VERSION WITH FULL SOFWIN DOCUMENTATION $39

If you'd prefer to have the full professional ( serialized ) version of CORSPEED delivered to your door on a 1.44 MB 3.5 inch floppy disk, including both printable and browsable documentation for all current Sofwin measurement tools, the price is only $39. Ask for PROFESSIONAL CORSPEED PACKAGE.

PRINTING INFORMATION

This document has been prepared with form feeds embedded for easy printing on any standard printer. Including the header and footer, there are 55 lines per page. A simple DOS command may be used to print this file.

   C:>COPY CSP.DOC PRN

CORSPEED DOCUMENTATION Page 1
Sofwin Laboratories SuperScalar Measurement Tools

TABLE OF CONTENTS

SECTION I -- OVERVIEW
   TECHNOLOGY
   WHAT CORSPEED DOES
   HOW CORSPEED IS DIFFERENT
   PROCESSORS SUPPORTED
   PROCESSOR FAMILY EXPANSION

SECTION II -- OPERATING INFORMATION
   COMMAND LINE OPTIONS
   KEYBOARD ASSIGNMENTS
      FUNCTION KEYS
   SYSTEM CONFIGURATION
   PROCESSOR INFORMATION
   MEMORY CONFIGURATION
   SYSTEM INFORMATION
   FOR BEST RESULTS
   LOW SPEED LIMITATIONS
   EFFECTIVE RUNNING SPEEDS
   MEMORY WINDOWS
   UNDERSTANDING EFFECTIVE DATA WIDTH
   HIGH PERFORMANCE MEMORY SYSTEMS
   HOW CORSPEED DIFFERS FROM BENCHMARKS
   HOW TO ESTIMATE HOW FAST MULTIPLE TASKS WILL RUN
   CORE ENGINE PERFORMANCE IS NOT THE WHOLE STORY
   THE IMPORTANCE OF SUPERSCALAR TECHNOLOGY
   SOFWIN FORUM MESSAGE SUPPORT
   FILE NAMES
   RECORDING
      USING THE SOFWIN COMMENTS EDITOR
         KEYBOARD
   PRINTER OUTPUT

SECTION III -- HOW VERSION 6.1 DIFFERS FROM 6.0x releases
   CHANGES TO EXISTING FEATURES
   NEW FEATURES
      EFFECTIVE READ-WRITE DATA WIDTH DISPLAY
      UNDERSTANDING EFFECTIVE DATA WIDTH
         HISTOGRAM COLOR SCHEME
   SOFWIN FORUM MESSAGE SUPPORT
      FORUM RECORDING PROCEDURES
      RECORDING
      COMMENT MANAGEMENT
   SAMPLE FORUM RECORDING
   HOW TO COPY CORSPEED RECORDING TO A FORUM MESSAGE
      USING COMPUSERVE'S WINCIM
      USING A FORUM NAVIGATOR

SECTION IV -- OPERATING PRINCIPLES
   MEMORY CONFIGURATION
   SYSTEM INFORMATION
   HIGH PERFORMANCE MEMORY SYSTEMS
   HOW TO ESTIMATE HOW FAST MULTIPLE TASKS WILL RUN
   CORE ENGINE PERFORMANCE IS NOT THE WHOLE STORY
   THE IMPORTANCE OF SUPERSCALAR TECHNOLOGY
   PROCESSOR FAMILIES

SECTION V -- ABOUT SOFWIN LABORATORIES
   WANT TO KNOW MORE?
      SOFWIN REPORTS
      TOOL CATALOG

SECTION VI -- SHAREWARE REGISTRATION

SECTION I -- OVERVIEW

This high technology computer performance measurement tool is part of Sofwin Laboratories' new SuperScalar technology package. This is a full scale tool suitable for professional, engineering, and personal use in the assessment of PC Core Engine ( CPU <-> Memory ) performance.

TECHNOLOGY

Recent advances in the computer industry which are applicable to processor and/or memory system technology are increasingly incompatible with prior art in this field. While the future of PC technology, as it combines and merges with video, mini-computer technology, distributed processing, and high powered engineering workstations, lies in diverse processor and memory technologies ( such as the DEC Alpha, MIPS 6000, Cyrix M5, NexGen 586 and other processors ), there is no backwards compatibility from Intel's x86 architecture, for example, to Digital's new Alpha processor -- or for that matter to the PowerPC or MIPS RISC processors. New technology is required which is not only capable of assessing hardware performance on the new processor families, but more importantly which can make direct comparisons between systems possible.

Where single thread, inline, segmented, real memory operations were once the only game in town in the PC business, the introduction of new processor technologies, including Intel's PENTIUM, goes beyond the pipelined abilities of the 486 series processors.

The new processor families are not all x86 compatible. While most of the new processors, including Intel's Pentium, are fully x86 compatible in one operating mode, many provide one or more additional modes which may not be x86 compatible. The Pentium, for example, can make better use of its superscalar capabilities when running recompiled code that is optimized for dual pipeline execution.
So, while the native operating mode of Cyrix' new M5 is x86 based, other processor families such as the PowerPC run x86 instructions in compatibility modes which are not endemic to the x86 instruction set.

Tools which are to work across multiple processor families and operating system environments must meet stringent operating requirements that assure compatibility and consistency from one processor family to another.

As part of Sofwin Laboratories' completely new SuperScalar technology, CORSPEED is at once a continuation of the past and a major shift in direction to focus on the future. Superscalar, multi pipelined instruction execution is a new ballgame and CORSPEED is designed, wherever there is any conflict, to adhere to the emerging standards.

WHAT CORSPEED DOES

The performance of variable logic memory systems ( cached memory ) degrades as the magnitude of active memory addresses is increased. Variable logic memory performance drops off as the active address area increases due to the increasing need for lower memory information to be located and copied into higher levels of cache.

While some PC systems are efficient, capable of uploading a cache line in as little as 6 clock cycles at 33 MHz, other designs may require as many as 46 clock cycles to upload one 16 byte line of data or instructions. Since the processor is halted while cache housekeeping is underway, the system that can upload memory into the cache in 6 cycles is going to appear to be running faster than one that takes 46 clocks to do the same work.

CORSPEED makes comparing PC system performance straightforward and easy by computing the EFFECTIVE RUNNING speed you can expect in any of four major operating system environments.
Since memory footprint size ( the extent of active memory area required by an application ), the number of running tasks, and even the memory dispersion characteristics of the operating system directly impact memory - cache management, it's useful to know how applications of different sizes will be impacted by the core engine performance curves.

The fundamental principle behind CORSPEED is Effective Running Speed. A 50 MHz 486 that never has a cache miss ( hence no cache updating or housekeeping ) runs your applications at its full 50 MHz speed. But when cache misses increase, things can start to slow down very quickly. For example, a 50 MHz 486 that is stopped 50% of the time for cache system housekeeping may be running at 50 MHz while it is running, but since it is stopped half the time, its effective running speed is only 25 MHz.

HOW CORSPEED IS DIFFERENT

CORSPEED is vastly different in technology, architecture, operation and appearance from prior Sofwin measurement tools. Here's how:

1) Sofwin's new SuperScalar, parallel processing technology can measure and evaluate single, dual and quad pipelines in a single processor. This development was required for Pentium ( dual pipeline capability ) qualification as well as to permit forward technology development to radically different processors such as the DEC Alpha, MIPS 6000, Cyrix M5 series, and others.

CORSPEED directly monitors the performance impact of the Pentium's U and V pipes, where the U pipe can execute all instructions, and the V pipe is limited to simple integer and floating point instructions. CORSPEED is required to recognize conflicts between these two unequal channels, so new in-line assembly code was required to assure control of which pipe is executing and whether the V pipe is held off due to sequencing conflict(s).
Sofwin's SP/ESP technology can measure, track and analyze processor operations which are pipelined and simultaneously executed by the processor, regardless of whether they are in sequence or out of sequence at time of execution. Where required, SP/ESP modifies the binary instruction set to measure operations as fully Pentium optimized.

SP/ESP measurement technology is backwards compatible with prior versions of the x86 instruction set only to the 486 series. Support for all prior versions of the x86 architecture is incompatible with SP/ESP and measurement of those systems has been abandoned in this product.

SP/ESP has also been modified to properly handle simultaneous floating point operations, whether in parallel with integer operations, or in parallel with compatible floating point operations. While simultaneous executions may require recompilation of applications code, CORSPEED is now fully qualified for multiple simultaneous operations/measurements.

This implies that CORSPEED's integer and floating point measurements represent the execution timing of fully qualified PENTIUM binaries. Standard integer and floating point code ( not PENTIUM optimized ) will execute slower, but the exact degree to which it executes slower is determined by the random placement of compatible and incompatible opcodes in the data stream. CORSPEED cannot predict random, unqualified opcode performance levels, so keep in mind that when measuring PENTIUM based systems, the performance indicators are for PENTIUM optimized binaries.

2) SP/ESP is configured for multiple cache facilities, where the number of parallel caching systems may be 1, 2, or 4. The Pentium uses only the dual cache facility with its parallel 8 kb data and 8 kb opcode cache units. Cache line sizes have been widened from 16 bits in prior versions of CORSPEED to 64 bits in this version.
Cache line size limits have been increased from 16 bytes in prior versions to 128 bytes in this version.

3) State machine level analysis confirms that the resulting opcode stream generated by CORSPEED fully complies with Intel's Branch Prediction rules so that no branch events ( in the primitive routines ) are unaccounted for at runtime.

PROCESSORS SUPPORTED IN THIS VERSION

   INTEL    486sx, 486dx, 486dx2, Intel4DX, Pentium, 486SL, etc.
   IBM      486sx, 486dx, 486dx2, 486dx3, etc.
   AMD      486sx, 486dx, 486dx2, etc.
   Cyrix    Cx486SLC, Cx486DLC, Cx486SL2, Cx486DL2, Cx486SR, Cx486DR, Cx486SR2, Cx486DR2, Cx486S, Cx486S2, Cx486SE, Cx486S2E, Cx486DX, Cx486DX2, CxM1

PROCESSOR FAMILY EXPANSION

As demand warrants, Sofwin tool processor support will be expanded to new processor families. Here are the processor families currently being considered for support:

   NexGen           586 series
   PowerPC          601, 602, 620
   Digital Equip    Alpha series
   MIPS             6000 series

SECTION II -- OPERATING INFORMATION

CORSPEED is a high performance tool for performance assessment. As a result, CORSPEED MAY NOT OPERATE PROPERLY ON LOW PERFORMANCE PC SYSTEMS. In general, use of this tool should be restricted to High Performance PC systems running at 33 MHz and higher that are equipped with floating point processors. Equipment whose performance falls below the design operating envelope or which is not equipped with a high speed floating point coprocessor facility may cause unexpected program termination.

COMMAND LINE OPTIONS

   CORSPEED [ opt1 ] [ optn ] < ENTER >

Where:

   -b          Forces the automatic color display into black and white mode.
   -c xxx      Specifies L2 cache size from 32 kb to real memory limit [ in kb ]. Use this option if the L2 cache is larger than 256 kb.
   -e          Use this option if high performance memory such as EDRAM is installed, or if you want to simulate how it would affect system performance.
   -t xx       Use this option if you want the results scaled to a per task basis.
   -m xxxx     Use this option if main memory is larger than 64 mbytes.
   -s          Enables the optional sound effects system.
   -o          Disables odometer display feature.
   -h          Invokes this help display directly from the command line.
   -$ xx.xx    Use this to input cost for cost/performance computation.

KEYBOARD ASSIGNMENTS

FUNCTION KEYS

   F1            Invokes HELP system.
   SHIFT + F1    Displays serial number and licensee information.
   F3            Records measurement data in Sofwin Forum format.
   F4            Create or Edit Forum record comments.
   F7            Prints screen image to LPT1.
   F8            Appends screen image to CORSPEED.SCR file.
   ESC           Quit CORSPEED and return to DOS.
   ALT + S       Toggle the SOUND and VIDEO EFFECTS on/off.
   ALT + X       Unconditional return to DOS.

SYSTEM CONFIGURATION

The boxed area at the top of the screen is filled in by CORSPEED during hardware and software analysis operations. The three system configuration areas which are critical to proper operation of CORSPEED are Processor, Memory, and Operating System.

PROCESSOR INFORMATION

Processor type and extension give you important information about potential computing capacity. While CORSPEED will show about the same level of Core Engine performance for either SX or DX class processors, DX processors are required for optimum floating point performance in graphics, drawing, and spreadsheet applications.

Use the CLOCK information to confirm that the PC is operating at the proper speed. While CORSPEED will operate in all processor modes, REAL mode generally provides the most accurate measurements. If the processor is doubly or triply clocked, a number may appear at the end of the processor type. A '2' means double clocking, etc.
MEMORY CONFIGURATION

The memory cache levels used in 486 based PCs are numbered L1 and L2, according to their proximity to the processor. The on-chip cache is called L1, while a secondary cache [ logically residing between the processor and main memory ] is called L2. CORSPEED measures and analyzes the organization of each cache level and its size is displayed in kilobytes.

Cache size, speed, and organization are instrumental in determining program operating speeds. In general, memory caching systems provide the biggest benefit when program code and data are frequently accessed from the cache. That's because the CPU must wait whenever new data is uploaded from main memory -- thereby yielding some of its speed advantage.

In general, the more the software's memory address span exceeds cache size, the less effective the cache. For complex, multi-tasking operating systems, or very large memory spans, cache miss penalties can effectively cancel some or all of the benefits of cache memory.

SYSTEM INFORMATION

Accurate performance measurement requires that there be nothing else going on that might interfere with the measurement routines in CORSPEED. CORSPEED derives its information by directly manipulating all of the principal hardware systems. To do its job, CORSPEED must be operated at the equivalent of Intel's PL-0. While operating at the O/S kernel level, and actively interleaving with and preempting DOS interrupts, it is possible for the measurement primitives to accurately measure hardware operations at the electrical event level.

FOR BEST RESULTS

USE THIS TOOL ONLY ON DOS BASED PCs

CORSPEED is certified to operate with standard MS-DOS, versions 3.1 and up. Measurement activity will be adversely impacted in any operating environment that preempts PL-0, loads CORSPEED at an unpredictable address, or interferes with interrupt management.
This means that CORSPEED will not provide satisfactory results under other operating systems including Windows [any version] or OS/2.

LOW SPEED LIMITATIONS

CORSPEED MAY NOT OPERATE PROPERLY ON LOW PERFORMANCE PC SYSTEMS. In general, use of this tool should be restricted to High Performance PC systems running at 33 MHz and higher that are equipped with floating point processors. Equipment whose performance falls below the design operating envelope or which is not equipped with a high speed floating point coprocessor facility may cause unexpected program termination.

EFFECTIVE RUNNING SPEEDS

Your CPU runs at its advertised clocking rate except when it is stopped to wait for cache system updates, line fills, or main memory operations. The percentage of time the processor is stopped directly impacts how fast any application can execute. For example, a 50 MHz 486 processor stopped 50% of the time by memory system delays will execute a program at about the same speed as a 25 MHz 486 which never has to wait for memory or cache updates.

The more often, and longer, a CPU has to wait, the lower its Effective Running Speed. As cache size is decreased, or program memory requirements increase, the percentage of time the CPU is stopped increases.

Effective Running Speeds are easy to understand. When comparing two systems, the one with the higher number will execute programs proportionately faster.

Since the complexity of the application and operating system play a deterministic role in how computing resources are utilized, software which makes lesser demands on the hardware will execute faster. The more complicated the application and operating systems, and the larger their appetite for memory addresses, the higher the demand on hardware resources.
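
The 50 MHz example above can be worked through in a few lines. The following is only an illustrative sketch in Python, not CORSPEED's own code: the function name and the simple "clock rate times fraction of time running" model are assumptions drawn from the example in the text.

```python
def effective_running_speed(clock_mhz: float, stalled_fraction: float) -> float:
    """Return the effective running speed in MHz.

    Assumed model (taken from the 50 MHz / 50% example in the text):
    the CPU does useful work only while it is not stopped, so the
    effective speed is the clock rate scaled by the running fraction.
    """
    if not 0.0 <= stalled_fraction <= 1.0:
        raise ValueError("stalled_fraction must be between 0 and 1")
    return clock_mhz * (1.0 - stalled_fraction)


if __name__ == "__main__":
    # A 50 MHz 486 stopped half the time behaves like a 25 MHz part.
    print(effective_running_speed(50, 0.5))
    # With no memory stalls it runs at its full clock rate.
    print(effective_running_speed(50, 0.0))
```

Under this model, two systems can be compared directly by their effective speeds, which is the comparison the text describes.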
CORSPEED plots the software demands [ using an address range model called the memory WINDOW ] against the operating curve of the core engine to determine effective speed.

MEMORY WINDOWS

The aggregate address range required by applications software comprises the addresses where the instructions [ operating code ] and data reside. In general, the larger and more complex the application, the larger the spatial locality of code and data. Software with very limited spatial locality has a smaller memory window than software with large, complex, memory and code structures.

Memory windows are deterministic in their impact on overall program running speed [ see Memory Configuration panel for explanation ]. For example, a program whose code and data fit entirely within the confines of an L2 cache can operate faster than one whose spatial memory requirements are 10, or 100 times larger.

For a perfect PC, i.e. one with an infinite cache which never requires updating, all programs would run at the same speed [ the processor's clocking rate ]. Since cache system performance drops off sharply as memory demands increase, the memory window of each application directly impacts execution speed.

UNDERSTANDING EFFECTIVE DATA WIDTH

One of the most important qualities of memory system design is how well it accommodates data and instruction transfers between processor and memory. While the electrical path is a constant, i.e. 32 bits for 486 and Pentium PC systems, the effective working width of the memory system is sometimes constricted by cache management mechanics.

When the time required to access 32 bit data is about the same as it is for 16 bit data, it means that the effective data width is a full 32 bits. However, if it takes twice as long to access 32 bit data ( indicating that each 32 bit element has to be moved in two 16 bit tranches ), the effective data width is only 16 bits.
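
The relationship between access times and effective width can be illustrated with a short Python sketch. This is not CORSPEED's actual algorithm, which the document does not disclose. It assumes a simple proportional model that reproduces the two cases given above (equal times give 32 bits, doubled time gives 16 bits), and it applies the color bands this document gives for the read/write pictograms. The function names are hypothetical.

```python
def effective_data_width(t16: float, t32: float, bus_bits: int = 32) -> int:
    """Estimate the effective data width in bits.

    t16 and t32 are average access times (any common unit) for 16 bit
    and 32 bit data. Assumption: the width scales with t16 / t32 and is
    clamped to the range from half the bus width to the full bus width.
    """
    if t16 <= 0 or t32 <= 0:
        raise ValueError("access times must be positive")
    width = bus_bits * t16 / t32
    clamped = max(bus_bits / 2, min(float(bus_bits), width))
    return round(clamped)


def width_color(width_bits: int) -> str:
    """Map an effective width to the pictogram color bands in the text."""
    if width_bits <= 20:
        return "RED"
    if width_bits <= 27:
        return "YELLOW"
    if width_bits <= 30:
        return "CYAN"
    return "GREEN"


if __name__ == "__main__":
    for t16, t32 in [(100, 100), (100, 125), (100, 200)]:
        bits = effective_data_width(t16, t32)
        print(t16, t32, bits, width_color(bits))
```

The same calculation would be applied separately to read and write access times, since the text reports a width for each channel.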
CORSPEED analyzes both read and write memory access times and computes the effective data width for each in bits. The data width pictograms show the effective width of both the read and write channels. Read/Write pictograms are colored RED when the width is 20 bits or less, YELLOW for 21 to 27 bits, CYAN for 28 to 30 bits, and GREEN whenever the width is 31 bits or wider.

HIGH PERFORMANCE MEMORY SYSTEMS

CORSPEED can analyze core engine performance for both traditional DRAM based PC systems and the new high performance memory systems such as EDRAM. To compute core engine operating speeds for high performance memory systems, use the command line /e option. While this option is intended to provide improved accuracy for high performance memory systems, it is also useful for estimating the relative impact high performance memory would have on ordinary DRAM based PC systems.

High Performance memory systems make it possible for main memory systems to perform as if they were one huge L2 cache. The performance benefits of extremely fast read-write capabilities are only part of the story, since HPM also eliminates processor stoppages due to L2 cache management. High performance memory gains its advantage by operating at nearly the same speed as the processor -- completely eliminating the overhead associated with L2 cache loading and unloading.

High Performance memory systems will be increasingly important as CPU data bandwidth demands continue to grow.

HOW CORSPEED DIFFERS FROM BENCHMARKS

This Core Engine version of Sofwin Laboratories' new SuperScalar measurement technology is the most powerful tool of its kind ever developed. CORSPEED extends 10 years of PC performance measurement tool development into multiprocessor families, multi-tasking, and 32 bit operating systems.
CORSPEED is a fully Pentium qualified, state-of-the-art SuperScalar measurement tool that reports how fast the core engine system of a PC runs in each of the four principal operating system environments. CORSPEED is also a multiple task emulator and high performance memory estimator -- making CORSPEED the most powerful, accurate, and useful performance assessment tool ever made available to the general public.

But that's only the technical side -- CORSPEED can even compute the cost per megahertz of achieved performance for a wide range of situations ranging from a single task DOS application to multiple tasks running under true 32 bit environments such as OS/2 and Win32. Here's a tool that Tells It Like It Is.

HOW TO ESTIMATE HOW FAST MULTIPLE TASKS WILL RUN

One of the most useful aspects of CORSPEED is its ability to accurately project the effective running speed for one or several tasks or applications running concurrently. To see how multiple tasks will impact your computer's performance, use the /t xx option where xx is the total number of concurrently running tasks or applications.

CORSPEED defaults to 1 task, but it can project the impact of up to 16 applications running under OS/2 or Win32. While single foreground tasks are more typical of DOS and Win3 operations, CORSPEED projects the impact additional tasks would have in existing 16 bit operating system environments as well.

Many will be surprised to learn that when two tasks are operating, the combined operating speed of both is less than that of a single task. The reason is that multiple tasks do more than share the processor time between them. Cache churning, time slice administration, and out of sequence interrupt processing consume increasing amounts of raw computing power as more tasks are activated. Increased memory resources, especially high performance memory, are particularly important in multi-tasking situations.
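
The cost per megahertz figure mentioned under HOW CORSPEED DIFFERS FROM BENCHMARKS (fed by the -$ command line option) can be illustrated with a small Python sketch. The document does not give CORSPEED's formula, so the plain division below is an assumption, and the function name and dollar amounts are hypothetical.

```python
def cost_per_effective_mhz(system_cost: float, effective_mhz: float) -> float:
    """Return dollars per MHz of achieved (effective) performance.

    Assumption: cost/performance is simply the system cost divided by
    the effective running speed measured for the chosen environment.
    """
    if effective_mhz <= 0:
        raise ValueError("effective_mhz must be positive")
    return system_cost / effective_mhz


if __name__ == "__main__":
    # Hypothetical $2500 system: the same hardware costs more per MHz
    # when multi-tasking overhead lowers its effective running speed.
    print(cost_per_effective_mhz(2500.0, 50.0))
    print(cost_per_effective_mhz(2500.0, 25.0))
```

Because effective speed falls as tasks are added, the cost per megahertz for the same machine rises with the task count under this model.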
CORE ENGINE PERFORMANCE IS NOT THE WHOLE STORY

Computer system performance is the result of many complex variables. As important as processor type, clocking rate, cache facilities, and memory size may be, the two other principal computing theaters, video/graphics and mass storage [ disk ], are also critical to how fast programs will execute.

The reason is that, for the moment at least, the video channel is typically only 20% as fast as typical core engines, while the disk channel is very often less than 5% as fast. Since video/graphics and disk access are increasingly critical to applications programs -- due to the popularity of graphics and DLLs -- very fast core engine systems can be dramatically slowed by poor performance outside of the core engine [ processor, cache, and memory ]. For example, as the memory resource demands of large applications or multiple tasks increase, so does the probability and frequency of virtual memory swapping.

Sofwin Laboratories Professional Tools, such as PCPOWER, have built in Expert Systems that can analyze the performance of every hardware system -- making it possible to project how the target computer will perform under different loads and operating system environments.

THE IMPORTANCE OF SUPERSCALAR TECHNOLOGY

1994 marks the beginning of the SuperScalar revolution. Beginning with the Intel Pentium, the SuperScalar age is upon us. No longer will CPUs be limited to executing a single instruction at a time. In the future, we'll have a choice between a wide range of SuperScalar processors from Digital Equipment, IBM-Apple-Motorola, MIPS, Cyrix and others.

The day of the simple, one-thing-at-a-time CPU is quickly fading. SuperScalar technology, at first dual pipelined -- but soon perhaps offering many pipes -- is radically different -- requiring new ways of looking at and evaluating computing system performance.
CORSPEED is totally SuperScalar in its design and operation -- capable of instigating, managing and measuring many different operations at once. This version, for example, not only supports all existing 486 processor designs, but a wide range of processors from Cyrix, AMD, IBM, and other x86 compatible CPU families. Sofwin's Professional tools are also fully SuperScalar -- so we'll be able to add new SuperScalar processor families such as the ALPHA, MIPS and PowerPC processors in the months to come.

SOFWIN FORUM MESSAGE SUPPORT

CORSPEED is designed to record performance information in a format suitable for CompuServe forum messages. If you don't have a CompuServe account from which you can access the Sofwin Forum area, call CompuServe toll-free at ( 800 ) 524-3388 to enroll. Ask for Representative # 593 to sign up!

FILE NAMES

CORSPEED names forum records using the following system: CSP$mdd#.TXT, where:

   m  = month ( from 0 -> C )
   dd = day of month ( from 01 -> 31 )
   #  = daily index code ( from 0 -> z )

RECORDING

To record PC performance in a forum compatible format, press F3. This will generate a sequentially numbered record suitable for use with CIS interfacing software such as CompuServe's popular WinCim. To capture the data, OPEN the recording file, block the entire table and copy it to a forum or email message.

USING THE SOFWIN COMMENTS EDITOR

The SOFWIN line editor permits the input, editing, or deleting of information using the keyboard and the mouse. Cursor manipulation and keyboard input are handled in the customary DOS ways.

KEYBOARD

   Arrow Keys    Non destructive positioning of the edit cursor.
   HOME KEY      Moves cursor to first character on the line.
   END KEY       Moves cursor to the last character on the line.
   INS KEY       Toggles between INSERT and TYPEOVER input mode.
   ESCAPE KEY    Terminates edit and restores original contents.
   ENTER KEY     Terminates edit, prompts for approval if any changes.
   DEL KEY       Deletes the character at cursor position.
   BACKSPACE     Destructive cursor move over previous character.

PRINTER OUTPUT

There are several ways to cause the screen to be printed to the LPT1 device. From the keyboard, either F7 or the PrtSc key may be pressed. All screens may be printed as a group with the SHIFT F7 key.

If you have not used the printer during the current session, CORSPEED will ask you to verify that the printer is on-line and ready for work. You may then select the character set supported by your printer.

For printers supporting the IBM PC character set, select the PC8 option. Press the 'P' key to select PC8 character set output.

If your printer does not support the complete PC character set, select the ASCII option. Press the 'A' key to select the ASCII character set. This will translate those characters above 127 to alternate characters for better looking results.

Press either the 'C' key or the ESC key if the printer is not ready or you decide not to print the current screen.

SECTION III -- HOW VERSION 6.1 DIFFERS FROM 6.0x releases

This version of CORSPEED was developed specifically for the Sofwin Forum on CompuServe. While it is operationally identical to previous versions, several new features and capabilities have been added.

CHANGES TO EXISTING FEATURES

More than 100 changes have been made to correct operational problems or bugs. Some of the more obvious changes include:

   The command line separator '=' has been eliminated.

   The quick escape and termination facilities have been improved.

   The processor clock multiple, ROM remap and memory width annunciators have been eliminated to reduce user confusion when used on differing processor families.

   The image storage ( to file ) system has been updated to include date and time of screen image storage.
    Display anomalies when run in black and white mode have been corrected.

NEW FEATURES

EFFECTIVE READ-WRITE DATA WIDTH DISPLAY

This version of CORSPEED displays histograms which represent the effective reading and writing data width as seen from the processor's point of view. This information is derived from the average read and write memory access times.

UNDERSTANDING EFFECTIVE DATA WIDTH

One of the most important qualities of memory system design is how well it accommodates data and instruction transfers between processor and memory. While the electrical path remains constant, i.e. 32 bits for 486 and Pentium PC systems, the effective working width of the memory system can appear to be constricted by an imbalance between processor and memory system bandwidth or by cache management mechanics.

When the time required to access 32 bit data is about the same as it is for 16 bit data, the effective data width is a full 32 bits. However, if it takes twice as long to access 32 bit data ( indicating that each 32 bit element has to be moved in two 16 bit tranches ), then the effective data width is only 16 bits.

HISTOGRAM COLOR SCHEME

CORSPEED analyzes both read and write memory access times and computes the effective data width for each in bits. The data width pictograms show the effective width of both the read and write channels. Read/Write pictograms are colored RED when the width is 20 bits or less, YELLOW for 21 to 27 bits, CYAN for 28 to 30 bits, and GREEN whenever the width is 31 bits or wider.

SOFWIN FORUM MESSAGE SUPPORT

Sofwin Forum members found the screen image display generated by previous versions of CORSPEED to be difficult to capture and format within the confines of the CompuServe messaging and posting system. To overcome this problem, this version of CORSPEED includes two entirely new functions: forum message generation and forum comment capture.
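
As an aside on the data width display described above, the rule under UNDERSTANDING EFFECTIVE DATA WIDTH and the bands under HISTOGRAM COLOR SCHEME can be sketched in a few lines of Python. This is an illustration only, not CORSPEED's actual method. The manual gives just two reference points ( equal access times mean 32 bits; a doubled 32 bit access time means 16 bits ). The linear scaling between those points, and the function names used here, are assumptions.

```python
def effective_width_bits(t16, t32, bus_bits=32):
    """Estimate the effective data width from average access times.

    t16 and t32 are the average times to access 16 bit and 32 bit data,
    in any consistent unit. ASSUMPTION: the width scales linearly with
    t16 / t32, which reproduces the two cases the manual describes.
    """
    if t16 <= 0 or t32 <= 0:
        raise ValueError("access times must be positive")
    # Never report more than the electrical path width.
    return min(bus_bits, bus_bits * t16 / t32)


def histogram_color(width_bits):
    """Map an effective width to the color bands listed in the manual."""
    bits = round(width_bits)
    if bits <= 20:
        return "RED"
    if bits <= 27:
        return "YELLOW"
    if bits <= 30:
        return "CYAN"
    return "GREEN"


if __name__ == "__main__":
    # Widths taken from the sample forum recording in this manual.
    for label, bits in (("CPU -> MEMORY", 32), ("MEMORY -> CPU", 22)):
        print(label, bits, histogram_color(bits))
```

Under this sketch, the sample system's 22 bit MEMORY -> CPU width falls in the YELLOW band, while its 32 bit CPU -> MEMORY width is GREEN.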
The two new forum functions have been assigned the following function keys:

    F3   Use this key to record performance information in a file format suitable for import into a CompuServe email or forum message.
    F4   Use this key to define a comment line that describes the system being evaluated. This comment line is automatically included in all forum recording files.

FORUM RECORDING PROCEDURES

CORSPEED is designed to record performance information in a format suitable for CompuServe forum messages. If you don't have a CompuServe account from which you can access the Sofwin Forum area, call CompuServe toll-free at (800) 524-3388 to enroll. Ask for Representative # 593 to sign up!

FILE NAMES

Forum recording files are written to the default directory from which CORSPEED was run. Files are named in accordance with the following system: CSP$mdd#.TXT, where:

    m  = month ( from 0 -> C )
    dd = day of month ( from 01 -> 31 )
    #  = daily index code ( from 0 -> z )

RECORDING

To record PC performance in a forum compatible format, press F3. This will generate a sequentially numbered record suitable for use with CIS interfacing software such as CompuServe's popular WinCim. To capture the data, open the recording file, block the entire table, and copy it to a forum or email message.

COMMENT MANAGEMENT

It is particularly helpful, when several recordings are made or stored together, to be able to identify which recording goes with which equipment. To make certain that the operator is given an opportunity to enter comments about each system, CORSPEED checks whether any operator comments have been defined whenever the forum recording key ( F3 ) is pressed. If comments already exist they are automatically inserted in the recording file.
If no comments have been made by the operator, CORSPEED automatically opens the comments input box and waits for the operator to make a note about the system being recorded. For information on using the comments line editor see USING THE SOFWIN COMMENTS EDITOR section.

SAMPLE FORUM RECORDING

 |-----------------------------------------------------------|
 |                                                           |
 |                     THE SOFWIN FORUM                      |
 |                CORSPEED PERFORMANCE RECORD                |
 |         Recorded Sunday May 29, 1994 @ 10:21:48          |
 |                                                           |
 |-----------------------------------------------------------|
 |                                                           |
 |  CPU FAMILY  |  CPU MODEL   | CLOCK SPEED  |   CPU MODE   |
 |    Intel+    |    486DX     |    33.01     |   VIRTUAL    |
 |                                                           |
 |L1 CACHE SIZE |L2 CACHE SIZE | MEMORY SIZE  |  ROM REMAP   |
 |      8       |      64      |     16.0     |      OK      |
 |                                                           |
 |  O/S SYSTEM  |  TASK COUNT  |EXTENSION BUS | CSP VERSION  |
 |  MSDOS 6.2   |      1       |     ISA      |     6.1      |
 |                                                           |
 | - - - - - - DOS - - - - - - | - - - - Windows 3.x - - - - |
 |SPEED < MHz > |   CLOCK %    |SPEED < MHz > |   CLOCK %    |
 |    31.64     |    95.85     |    31.37     |    95.02     |
 |    28.90     |    87.56     |    24.80     |    75.12     |
 |    20.69     |    62.68     |    16.75     |    50.74     |
 |                                                           |
 | - - - - -  OS/2   - - - - - | - - - - -  Win32  - - - - - |
 |SPEED < MHz > |   CLOCK %    |SPEED < MHz > |   CLOCK %    |
 |    27.67     |    83.83     |    19.60     |    59.36     |
 |    20.55     |    62.27     |    14.04     |    42.53     |
 |    15.39     |    46.63     |    14.04     |    42.53     |
 |                                                           |
 |  LOCAL BUS   | CPU REGISTER |CPU -> MEMORY | MEMORY -> CPU|
 |      32      |      32      |      32      |      22      |
 |                                                           |
 |-----------------------------------------------------------|
 |This is an AMI 486DX-33 system with 64 k of L2 cache       |
 |-----------------------------------------------------------|

HOW TO COPY CORSPEED RECORDING TO A FORUM MESSAGE

USING COMPUSERVE'S WINCIM:

Select FILE then OPEN from the WinCim control panel to invoke the text editor. Select the drive and directory where CORSPEED resides. All files with the .TXT extension will appear in the file selector box.
CORSPEED forum recordings can be easily identified from their naming format: CSP$xxxx.txt

Select the desired forum recording file to load it into the text editor. Pull down the EDIT menu and click on the SELECT ALL option to mark the forum recording text. Click on COPY, then close the Notepad editor and return to WinCim.

Create a forum message in the normal way and, when you reach the point where you want to insert the CORSPEED forum recording text, select EDIT and click on PASTE. This will insert the CORSPEED data into your forum message.

USING A FORUM NAVIGATOR

Given the proliferation of CIS Auto Navigator programs, it is not possible to define a specific procedure for importing the CORSPEED data recording into a forum message. In general, block move and file read operations typical of most editors can be used to transfer the text.

It should be noted, however, that the space which occupies column 1 of the forum recording file is there for a purpose and should be included when placing this text in a forum message. Without the space at the beginning of each line of the recording, unless the "Unformatted" method is selected, CIS will attempt to reformat the message, causing it to be unreadable.

SECTION IV -- OPERATING PRINCIPLES

MEMORY CONFIGURATION

The memory cache levels used in 486 based PCs are numbered L1 and L2, according to their proximity to the processor. The on-chip cache is called L1, while a secondary cache [ logically residing between the processor and main memory ] is called L2. CORSPEED measures and reports on the organization of each cache level. Cache size is displayed in kilobytes.

Cache size, speed, and organization are instrumental in determining program operating speeds. In general, memory caching systems provide the biggest benefit when program code and data are frequently accessed from the cache.
That's because the CPU must wait whenever new data is loaded from main memory -- thereby yielding some of its speed advantage. In general, the more the software's memory address span exceeds cache size, the less effective the cache. For complex, multi-tasking operating systems, or very large memory spans, cache miss penalties can effectively cancel some or all of the benefits of cache memory.

SYSTEM INFORMATION

Accurate performance measurement requires that there be nothing else going on that might interfere with the measurement routines in CORSPEED. CORSPEED derives its information by directly manipulating all of the core-engine hardware systems. To do its job, CORSPEED must be operated at the equivalent of Intel's PL-0. By operating at the O/S kernel level, and actively interleaving with and preempting DOS interrupts, the measurement primitives can accurately measure hardware operations at the electrical event level.

FOR BEST RESULTS USE THIS TOOL ONLY ON DOS BASED PCs.

CORSPEED is certified to operate with standard MSDOS, versions 3.1 and up. Measurement activity will be adversely impacted in any operating environment that preempts PL-0, loads CORSPEED at an unpredictable address, or interferes with interrupt management. This means that CORSPEED will not provide satisfactory results under other operating systems, including Windows [ any version ] or OS/2.

HIGH PERFORMANCE MEMORY SYSTEMS

CORSPEED can analyze core engine performance for traditional DRAM based PC systems and can also simulate the new high performance memory systems such as EDRAM. To simulate core engine operating speeds for high performance memory systems, use the command line /e option.

High Performance memory systems make it possible for main memory systems to perform as if they were one huge L2 cache.
The performance benefits of extremely fast read-write capabilities are only part of the story, since high performance memory ( HPM ) also eliminates processor stoppages due to L2 cache management. High performance memory gains its advantage by operating at nearly the same speed as the processor -- completely eliminating the overhead associated with L2 cache loading and unloading. High Performance memory systems will be increasingly important as CPU data bandwidth demands continue to grow.

NOTE: NOT ALL PC SYSTEM DESIGNS ARE EQUAL -- INCLUDING HIGH PERFORMANCE DESIGNS. CORSPEED's simulation is no guarantee that any PC system fully implements all of the performance benefits of EDRAM or other memory systems.

HOW TO ESTIMATE HOW FAST MULTIPLE TASKS WILL RUN

One of the most useful aspects of CORSPEED is its ability to accurately project the effective running speed for one or several tasks or applications running concurrently. To see how multiple tasks will impact your computer's performance, use the /t=xx option, where xx is the total number of concurrently running tasks or applications. CORSPEED defaults to 1 task, but it can project the impact of up to 16 applications running under OS/2 or Win32. While single foreground tasks are more typical of DOS and Win3 operations, CORSPEED projects the impact additional tasks would have in existing 16 bit operating system environments as well.

Many will be surprised to learn that when two tasks are operating, the combined operating speed of both is less than that of a single task. The reason is that multiple tasks do more than share the processor time between them. Cache churning, time slice administration, and out of sequence interrupt processing consume increasing amounts of raw computing power as more tasks are activated. Increased memory resources, especially high performance memory, are particularly important in multi-tasking situations.
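
The default and the limit described above ( 1 task when no option is given, at most 16 tasks ) can be made concrete with a short Python sketch. It only illustrates those two rules for an argument written in the /t=xx form used in this section. The function name is hypothetical, and the exact syntax and error handling of CORSPEED's own command line are not confirmed here.

```python
def parse_task_count(args, default=1, maximum=16):
    """Return the task count requested by a /t=xx argument.

    ASSUMPTION: arguments arrive as a list of strings such as
    ["/e", "/t=4"]; this is a hypothetical helper, not CORSPEED code.
    """
    for arg in args:
        if arg.lower().startswith("/t="):
            value = int(arg[3:])
            if not 1 <= value <= maximum:
                raise ValueError(
                    "task count must be between 1 and %d" % maximum)
            return value
    # No /t option given: fall back to the single-task default.
    return default


if __name__ == "__main__":
    print(parse_task_count([]))              # no option given
    print(parse_task_count(["/e", "/t=4"]))  # four concurrent tasks
```

Rejecting values outside 1 to 16 is a design choice made for this sketch; the manual states only the range that can be projected, not what happens outside it.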
CORE ENGINE PERFORMANCE IS NOT THE WHOLE STORY

Computer system performance is the result of many complex variables. As important as processor type, clocking rate, cache facilities, and memory size may be, the two other principal computing theaters, video/graphics and mass storage [ disk ], are also critical to how fast programs will execute. The reason is that, for the moment at least, the video channel is typically only 20% as fast as typical core engines, while the disk channel is very often less than 5% as fast.

Since video/graphics and disk access are increasingly critical to applications programs -- due to the popularity of graphics and DLLs -- very fast core engine systems can be dramatically slowed by poor performance outside of the core engine [ processor, cache, and memory ]. For example, as the memory resource demands of large applications or multiple tasks increase, so do the probability and frequency of virtual memory swapping.

Sofwin Laboratories' Professional Tools, such as PCPOWER, have built-in Expert Systems that can analyze the performance of every hardware system -- making it possible to project how the target computer will perform under different loads and operating system environments.

THE IMPORTANCE OF SUPERSCALAR TECHNOLOGY

1994 marks the beginning of the SuperScalar revolution. Beginning with the Intel Pentium, the SuperScalar age is upon us. No longer will CPUs be limited to executing a single instruction at a time. In the future, we'll have a choice among a wide range of SuperScalar processors from Digital Equipment, IBM-Apple-Motorola, MIPS, Cyrix and others.

The day of the simple, one-thing-at-a-time CPU is quickly fading. SuperScalar technology, at first dual pipelined -- but soon perhaps offering many pipes -- is radically different, requiring new ways of looking at and evaluating computing system performance.
CORSPEED is totally SuperScalar in its design and operation -- capable of instigating, managing and measuring many different operations at once. This version, for example, supports not only all existing 486 processor designs but also a wide range of processors from Cyrix, AMD, IBM, and other x86 compatible CPU families. Sofwin's Professional tools are also fully SuperScalar -- so we'll be able to add new SuperScalar processor families such as the ALPHA, MIPS and PowerPC processors in the months to come.

PROCESSOR FAMILIES

Until recently, Intel was the only processor family found on PC systems. But now there are several others, including Cyrix, PowerPC, AMD, Digital, IBM, MIPS, etc. Sofwin's SST technology recognizes both processor type and family.

For some processors, such as clones of the Intel 486 made by AMD or IBM, there is no way to accurately determine who made the CPU, since they are all made from the same design. We call this processor family the INTEL+ family, since the processor may have been made by Intel or one of its licensees.

Parts which can be positively identified, including Pentium and Cyrix CPUs, are reported as belonging to the Intel or Cyrix families, with their respective model designations. Where available, Sofwin tools also detect hidden model, stepping and feature information.

Since processor model identification and family are essential to accurate performance measurement, CORSPEED will generate an error message and return to the DOS command line whenever an unknown processor model or family is detected. REGISTERED users of CORSPEED will be notified when new versions of CORSPEED are released which support additional CPU models.
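
The family reporting rules above can be summarized as a small decision sketch in Python. It restates only what this section says: positively identified Intel or Cyrix parts keep their own family name, indistinguishable 486 parts are reported as INTEL+, and anything else is an error. The function, the exception, and the way detection results are passed in are all hypothetical. How CORSPEED actually identifies a processor is not described in this manual.

```python
class UnknownProcessorError(Exception):
    """Raised when neither the model nor the family can be classified."""


def family_label(model, identified_vendor=None):
    """Return the family name to report for a detected processor.

    model is a model string such as "486DX". identified_vendor is the
    vendor name when the part can be positively identified ("Intel" or
    "Cyrix" in this manual), otherwise None. Both are ASSUMED inputs.
    """
    if identified_vendor in ("Intel", "Cyrix"):
        return identified_vendor
    if model.upper().startswith("486"):
        # Intel, AMD and IBM 486 parts share one design, so the maker
        # cannot be determined; the manual calls this family INTEL+.
        return "INTEL+"
    raise UnknownProcessorError("unknown processor model: %s" % model)


if __name__ == "__main__":
    print(family_label("486DX"))
    print(family_label("Pentium", identified_vendor="Intel"))
```

In this sketch the raised exception stands in for the error message and return to the DOS command line described above.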
SECTION V -- ABOUT SOFWIN LABORATORIES

Sofwin Laboratories is an independent computer performance measurement and consulting laboratory with facilities in the San Francisco Bay Area and an Engineering Development facility in Columbus, Ohio. Sofwin's primary mission is helping its consulting clients and technology licensees get the most performance for their money. The laboratories also work with PC system designers and builders in planning and designing high performance computing systems and peripherals.

The Sofwin measurement tools are designed and developed specifically for use in our own laboratories as well as by professionals responsible for the selection and maintenance of PC systems in business and governmental agencies world-wide.

Sofwin Laboratories also sponsors a computer measurement Forum on CompuServe as well as Sofwin Reports -- the Journal of Computing Performance and Productivity. For more information on Sofwin Laboratories services, call our Columbus Engineering Center at (614) 866-9966.

WANT TO KNOW MORE? SOFWIN REPORTS

If you're interested in knowing more about PC system performance, you'll find Sofwin Reports, the official publication of Sofwin Laboratories, an excellent source of information on PC measurement systems, fleet operation, and professional performance reviews. Every fact-filled issue of this publication is geared to understanding the issues and the choices confronting everyone who has to make product choices.

Sofwin Reports subscriptions are only $99 per year in the United States, or $129 elsewhere -- including Air Mail delivery. To subscribe, send a check or money order, payable to Sofwin Publishing Company, to 613 Old Farm Road, Columbus, Ohio 43213. U.S. companies and governmental agencies may fax their purchase order to (614) 866-9960 for standard 30 day billing.
FREE PROFESSIONAL MEASUREMENT TOOL CATALOG

If you would like to have a copy of Sofwin Laboratories' current measurement tool catalog either mailed or faxed to your office, call our Columbus Engineering facility at (800) 339-2579.

SECTION VI -- SHAREWARE REGISTRATION

While CORSPEED is being offered as shareware, it is neither free nor a toy. This is the same fully functional, professional grade measurement tool included in our full professional tool kit [ ProPakSix ].

You may acquire a fully licensed copy of this software directly from Sofwin Laboratories for only $39. Registrants receive a serial numbered copy of CORSPEED on a 3.5 inch diskette and three free sample issues of Sofwin Reports.

Send your check in the amount of $39 to CORSPEED Registration, Sofwin Laboratories, 613 Old Farm Road, Columbus, Ohio 43213, or call (800) 339-2579 to use your credit card.

    Sofwin Laboratories
    613 Old Farm Road
    Columbus, Ohio 43213

    Telephone:  (614) 866-9966
    BBS/FAX:    (614) 866-9960
    CompuServe: 74431,1071