************************************************************************
*                                                                      *
*   Qnet Version 1.0 - A Neural Modeling System for DOS and Windows    *
*                            TRIAL VERSION                             *
*                                                                      *
************************************************************************

Vesta Services, Inc.
Winnetka, IL

NOTICE

Qnet Trial Version may be distributed under the following conditions:
1) It is distributed intact with all files.
2) No fees or other charges are made for its distribution.

DO NOT USE THIS SOFTWARE UNTIL YOU HAVE READ AND ACCEPTED THE LICENSE AGREEMENT (SEE README.1ST). BY USING THE SOFTWARE (OR AUTHORIZING ANY OTHER PERSON TO DO SO), YOU ACCEPT THE LICENSE AGREEMENT. IF YOU DO NOT ACCEPT THE LICENSE AGREEMENT, DELETE THIS QNET TRIAL SOFTWARE FROM YOUR SYSTEM.

Disclaimer

VESTA MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE.

Trademarks

Qnet and NetGraph are trademarks of Vesta Services, Inc. All other trademarks used herein are owned by their respective companies.

Vesta Services, Inc.
1001 Green Bay Rd., Box 196
Winnetka, IL 60093

Copyright (c) 1993, Vesta Services, Inc. All rights reserved.

Table of Contents
(NOTE: PAGE NUMBERS ARE FOR PRINTED MANUAL)

1.0  Introduction                                              1
2.0  System Requirements                                       3
3.0  Installation                                              4
3.1  DOS Version - System Configuration                        4
3.2  Windows Version - Setup/Configure                         5
4.1  Qnet Organization                                         6
4.2  Qnet Keystroke/Mouse Functions                            7
5.0  Neural Networks - A Brief Description                     8
6.0  Network Design and Construction                          10
6.1  The Input and Output Layers                              10
6.2  The Hidden Processing Structure                          10
6.3  Network Connections                                      11
7.0  Training Data                                            12
7.1  Input File Format                                        12
7.2  Data Preparation                                         13
7.3  Data Normalization                                       14
7.4  Training Patterns - How Many?                            15
8.0  Network Training with Qnet                               16
8.1  Learning Modes                                           16
8.2  Learning Rates and Learn Rate Control                    17
8.3  Starting New Networks                                    17
8.4  Monitoring Training Progress                             18
8.5  Unattended Training                                      20
8.6  Training Divergence                                      20
8.7  Auto-Save                                                21
8.8  The Training Algorithm, Back-Propagation vs. FAST-Prop   22
8.9  Summary                                                  22
9.0  Network Recall                                           23
10.0 Execution Speed                                          24
11.0 Qnet Reference                                           25
11.1 Main Menu                                                25
11.2 Training Setup Window                                    25
11.3 Connection Editor                                        30
11.4 Training Window                                          31
11.5 Recall Setup Window                                      36
11.6 Recall Window                                            37
12.0 Source Code for Network Recall                           40
Appendix A - Example Neural Network Applications              42
     Artificial Intelligence: Optical Character Recognition   42
     Scientific: Digital Filter                               44
     Scientific: Data Analysis                                45
     General: Random Number Memorization                      47
     Investing: S&P 500 Forecaster                            47
     Investing: Futures - Fair Value                          49
Appendix B - Hardware/Software Compatibility                  51
Appendix C - Back-Propagation Technical Overview              54
Appendix D - Qnet Limits and Specifications                   57
Appendix E - Tech Support                                     58

1.0 Introduction

Welcome to the world of neural networks. There is currently enormous interest in neural systems and what can be accomplished by employing them to solve problems. Qnet has been designed to provide both the expert and the uninitiated user with a powerful tool for creating and applying neural networks to everyday problem solving. A neural network is best defined as a set of simple, highly interconnected processing elements that are capable of learning information presented to them. The foundation of neural network theory is based on studies of the biological activities of the brain.
A neural network's ability to learn and process information classifies it as a form of artificial intelligence (AI). The most exciting feature of this new technology is that it can be effectively applied to a vast array of problems, many of which have been thought to be too complex or lacking in sophisticated theoretical models. Neural networks are responsible for making significant advances in the traditional AI fields of speech and visual recognition. Investment managers are creating investment models to better manage money and improve profits. Scientists and engineers use them to model and predict complex phenomena. Marketing professionals are employing neural networks to accurately target products to potential customers. Geologists can increase their probability of finding oil. Lenders use neural networks to determine the credit risk of loan applicants. Complete neurocomputing hardware systems are being built for use in everything from automobiles to manufacturing systems. The variety of problems that can be solved effectively by neural networks is virtually endless. Qnet is a neural modeling system that has been designed to meet your most demanding needs. Qnet offers virtually unlimited flexibility in designing large, complex neural networks. Qnet includes both a DOS and Windows version. The DOS version executes in the 32-bit protected mode of 386, 486 or Pentium(TM) class CPU's. The virtual memory feature of the DOS version means network sizes are limited only by the computer system's memory and disk capacity. The DOS version also takes advantage of significant performance gains offered by running in the CPU's protected mode -- about 30% faster than standard DOS programs. Qnet neural networks learn using highly optimized training algorithms that include a standard back-propagation algorithm and a "FAST-Prop" algorithm. The Windows version of Qnet is fully compatible so that it can be used interchangeably with the DOS version (certain size limitations apply). It's your choice! Create neural models with the faster, more powerful 32-bit DOS version or use Qnet for Windows to train models in the background while you're using the computer for other tasks. Both DOS and Windows versions incorporate an advanced graphical-user-interface (GUI) that gives you the same easy to use point-and-click operation. Qnet features include: A virtual memory, 32-bit protected mode version for DOS. Create huge networks that are limited only by hardware capacities, not by built-in software limitations. A fully compatible version for MS Windows(TM). Run Qnet for Windows to train networks in the background. Blazing speed. Speeds over 300,000 connections per second on 486DX2/66 based PC's are possible with the DOS version. Pentium(TM) versions will exceed 1,000,000 connections per second. Accuracy and stability are retained by Qnet with full 32-bit floating point representation of all training information (unlike integer based neural modeling software). An advanced, easy to use graphical user interface. Simple point-and-click menu operations make learning and using Qnet effortless. On-line help. Help is available for every input field and menu item and is only an F1 keystroke away. Multiple training algorithms. Highly optimized Back-Propagation and FAST-Prop methods can be employed interchangeably during network training. Learn Rate Control to automate network training. Qnet takes the drudgery out of adjusting learn rates during training. 
The Learn Rate Control feature virtually guarantees hands-off, stable network training without any user interaction. Fast, easy network design. Use Qnet's connection editor to customize network configurations. Advanced network designs can be created to control logic flow by specifying network connections. Complete interactive analysis of the training process using NetGraph(TM) and its powerful AutoZOOM feature. NetGraph instantly creates graphs of all key network and training information. AutoZOOM can be used to interrogate plotted information to any level of detail required. Any graph can be saved in the PCX file format. Automated test set inclusion for overtraining and model integrity analysis. This powerful feature takes the guess work out of determining when a network is properly trained and suitable for use. Auto-Save to automatically store the network model during training. The Auto-Save feature protects you from overtraining and network divergence by allowing you to retrieve stored network information from any prescribed point of the training run. Easy data interfacing. Training data is easily imported to Qnet via the universally compatible ASCII file format with support for space, comma and tab delimited formats (these formats are supported by virtually all spreadsheets, word processors, text editors and database programs). A network recall mode. Recall mode can be used for analyzing trained networks with new sets of data. The same powerful network analysis features used for training are available in Qnet's recall mode. Source code for integrating trained networks into your own applications. Programmers will appreciate the ANSI C source code that allows you to easily interface any application with royalty-free neural models developed with Qnet. Example problems. The example problems included with Qnet will get you started learning and using Qnet immediately. All these features combine to make Qnet the most powerful and easy to use neural network model generator available. Regardless of your familiarity (or lack thereof) with neural networks, it is strongly recommended that you read through this manual before using Qnet. Familiarity with Qnet's features and options will greatly improve your productivity with this software. 2.0 System Requirements Qnet runs on compatible PC's with a minimum configuration of: 386, 486 or Pentium(TM) CPU. A math coprocessor (or a 486DX(TM) or Pentium(TM) CPU). VGA compatible graphics. A Minimum 2 Mbytes of available system memory. A hard disk with a minimum of 1.8 Mbytes of free space for installation. A mouse. MS DOS 3.x or higher. Windows 3.x (for Windows version only). Qnet requires a 386 class CPU or higher to run the 32-bit DOS version and the Windows version in 386 Enhanced mode. Math coprocessors are also required for Qnet. Training neural networks is a computationally intensive activity. Central processors alone are simply not powerful enough to handle the rigors of controlling an advanced GUI and performing the required training computations. The good news is that floating point math coprocessors are now common in the PC marketplace and have become increasingly affordable. All 486DX and Pentium(TM) class computers have built-in coprocessors. Any 386 class computer is easily upgradeable by adding an inexpensive 387 coprocessor. Intel offers OverDrive(TM) upgrades for 486SX(TM) computers that will add a math coprocessor and speed doubling technology. We can assist you with any upgrade required for your system. 
Qnet's advanced GUI requires VGA compatible video hardware. Qnet's DOS version will auto-sense virtually all graphics adapters and places them in the appropriate VGA compatible video mode. Refer to chapter 3 and appendix B for issues regarding video displays. The system memory required for Qnet is a direct function of the size of the neural models you expect to create. While Qnet has a virtual memory feature that allows you to run model sizes larger than the system memory capacity, training performance will begin to degrade dramatically if network sizes become significantly larger than system memory. For small to average sized problems, a minimum of 2 Mbytes of available RAM is recommended. Be aware that certain device drivers, TSR's and disk caching utilities may reduce the amount of system memory available to Qnet. A hard disk with approximately 1.8 Mbytes of free disk space is required for Qnet installation. In addition, you should have sufficient free space available during run-time to accommodate the saving of networks and output files and to support virtual memory features. A mouse is required for certain functions with Qnet's graphing utility, NetGraph, and to productively navigate through Qnet's point-and-click menu system. Refer to appendix B for a more detailed discussion of hardware and software compatibility issues. 3.0 Installation Before installing Qnet, make sure that there is a minimum of 1.8 MBytes available on the destination drive (Qnet will not install on drives with less than this amount of free space). Begin the installation process by placing the installation disk in the a: or b: drive. From the DOS prompt, type a:install or b:install. A window will be displayed where you can specify the installation location. The default location is C:\QNET. Select "OK" to proceed with the selected install path. The next window requires you to enter your name and the program's serial number. It is important that this information be entered correctly. It is required for technical support and allows the information to be retrieved from the program's ABOUT window. The serial number can be found attached to the back cover of this manual. The install program will then construct the following files and subdirectories: QNET.EXE DOS Qnet executable. WQNET.EXE Windows Qnet executable. QNETWIN.DAT File containing window and menu information for the DOS version. WQNETWIN.DAT File containing window and menu information for the Windows version. COMMDLG.DLL Common dialog Dynamic Link Library for the Windows version. QNETFGM.EXE DOS utility to determine video mode compatibility. NETWORKS Subdirectory containing sample neural models. SOURCE Subdirectory containing C source for using Qnet developed networks. (NOT IN DEMO) The install program will alter the system's AUTOEXEC.BAT file and CONFIG.SYS files if desired. It alters the AUTOEXEC.BAT file by appending the following line: PATH %PATH%;C:\QNET (NOT DONE IN DEMO) If your installation directory differs from the default, your selected location will be used. The Qnet installation directory should be in the DOS PATH so that execution can occur from any drive/directory location. The CONFIG.SYS file is altered so that the files= statement will be a minimum of 16. If an existing value of 16 or larger is found, the CONFIG.SYS is not changed. If changes are made to either of the system files by the install program, the originals will be stored in AUTOEXEC.QNT and CONFIG.QNT. Changes will not become active until the system is restarted. 
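For reference, and assuming the default C:\QNET installation directory, the two changes described above amount to lines of the following form (the FILES value may be higher if your existing CONFIG.SYS already specifies more than 16):

   In AUTOEXEC.BAT:   PATH %PATH%;C:\QNET
   In CONFIG.SYS:     FILES=16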
If you do not want the install program to alter these important files, simply select "NO" at the dialog prompts and you can make these changes manually. 3.1 DOS Version - System Configuration Qnet uses 32-bit virtual memory technology and accesses the system's extended memory (XMS) as needed. DOS expanded memory management (EMM) software is NOT required for true PC compatibles. Qnet will use EMM software products for memory allocation if they are active on your system. Products such as QEMM(TM), 386 MAX(TM) or MS DOS(TM) 5.0's EMM386 will make extended memory available when needed by Qnet. If DOS 5's EMM386 is in use, it must be explicitly configured to make extended memory available - refer to your DOS manual. The DOS mem command can be used to determine the total amount of XMS memory available with your system setup. DO NOT run Qnet's DOS version using a Windows' DOS session. Virtual memory conflicts can occur causing unpredictable results. For the vast majority of VGA compatible displays, you will not experience any video display problems. Qnet attempts to switch all VGA compatible display adapters into standard VGA 640x480 resolution mode. This is the recommended mode for Qnet. If Qnet does not automatically detect and set your particular display adapter correctly, you may attempt to force the appropriate video mode. If problems occur, use the included QNETFGM.EXE program to determine the available video modes for your system. Simply type QNETFGM at the DOS prompt (in the Qnet installation directory if necessary) to obtain a complete list of video modes supported by your adapter. The program will also suggest the best video mode to use with Qnet. Video modes may be forced by setting the environment variable QNET (i.e., set QNET=FG_VGA12). Use the AUTOEXEC.BAT file to permanently set this environment variable if required. Appendix B discusses both memory and video issues in greater detail. 3.2 Windows Version - Setup/Configure Qnet's disk installation process does not automatically install Qnet into Windows' Program Manager. To set up Qnet for Windows, start Windows and enter the Program Manager. If you wish to add Qnet to one of your current application groups, go to STEP 2 after making that application group active (click on the title bar or activate the group icon). To place Qnet in its own application group, start with STEP 1. STEP 1: Select FILE/NEW... from the menu bar. Select Program Group from the pop-up window. Type QNET in the Description field. Select OK. STEP 2: Select FILE/NEW... from the menu bar. Select Program Item from the pop-up window. Type the following (note: replace C:\QNET with your installation directory if different): DESCRIPTION: QNET COMMAND LINE: C:\QNET\WQNET.EXE WORKING DIRECTORY: C:\QNET Select OK. The Qnet icon will be placed in the selected application group. Position the Qnet icon by dragging it to the desired position. Resize and position the application group window if necessary. To save the positions and arrangements of the windows and icons, hold down the SHIFT key and select FILE/EXIT from the Program manager. This saves the current arrangement for future Windows sessions without exiting. To add the option of starting Qnet from the File Manager by selecting Qnet network definition files (*.net), open the File Manager and double click the WIN.INI file located in the Windows directory. Add the following line to the [Extensions] section of the file: net=c:\qnet\wqnet.exe ^.net Use your installation directory if different. 
By adding this line, Qnet may be started from the File Manager with the selected network definition file automatically loaded. If you do not use this method of starting a Windows application, you may bypass this step.

4.0 Qnet Execution

Qnet's DOS version is executed from the DOS prompt by typing:

   [drive:\path...\]qnet [file[.net]]

If the Qnet installation location is not in the DOS PATH, you will need to specify the full path to start Qnet. You may also optionally include an existing Qnet network definition file as an argument (.net extension optional). This will become the active network at startup. Qnet for Windows is started by double-clicking the Qnet icon in the Program Manager, by double-clicking any valid network definition file (see section 3.2) from the File Manager, or from the DOS prompt by typing:

   win [drive:\path...\]wqnet [file[.net]]

4.1 Qnet Organization

Qnet uses a simple flow structure that makes it both intuitive and easy to use. Specific windows are used for each logical step. Qnet's organization clarifies the current task by minimizing the maze of menu and input options to only those necessary for the current activity. The organizational structure of Qnet is outlined below.

Qnet's main window performs the function of opening new or existing network definition files and selecting training or recall run modes. A network definition file must be opened before selecting the run mode. Selecting the run mode activates the training or recall setup window. These windows are used to set or modify any of the inputs and parameters for the current run. Select "Ok" to initiate the run after all setup activity is completed. Selecting "Cancel" will return you to the main window (with no setup activity saved). The training window is used to monitor the progress of network training. You may interactively track, plot, browse or modify key parameters during training. You may also save, exit, or abort the training process at any time. In recall mode, the recall window is activated and all inputs are processed through the network to obtain the model's output responses. Recall options include plotting, browsing and/or saving network inputs and outputs.

4.2 Qnet Keystroke/Mouse Functions

The graphical-user-interface used in Qnet is highly intuitive and easy to use. The standard Windows conventions for menu bars, input fields and window sizing buttons are maintained in Qnet's DOS version. The user can easily navigate through Qnet's windows and menus with simple point-and-click operations. Qnet has a few general keystrokes that are useful throughout the program:

   Help on any input field or menu item (F1).
   Close Help, Plot and Browse windows.
   Move through input fields and buttons.
   Move through grouped input fields and buttons.

The Qnet plotting utility, NetGraph, uses these additional keys and mouse functions:

   Help with NetGraph.
   Toggle AutoZOOM on and off.
   Reset a zoomed plot.
   Create a PCX image of the plot (DOS only).
   Send the plot to the Clipboard (Windows only).
   DOUBLE CLICK LEFT MOUSE BUTTON - Move the plot legend to the pointed-at location.
   DRAG LEFT MOUSE BUTTON - Define the zoom area in AutoZOOM mode.

The Qnet network information browser uses these additional keystrokes:

   Scroll one page up.
   Scroll one page down.
   Scroll to top.
   Scroll to bottom.
   Scroll up one line.
   Scroll down one line.

5.0 Neural Networks - A Brief Description

A human brain continually receives input signals from many sources and processes them to create the appropriate output response.
Our brains have billions of neurons that interconnect to create elaborate "neural networks". These networks execute the millions of necessary functions needed to sustain normal life. For some years now, researchers have been developing models, both in hardware and software, that mimic a brain's cerebral activity in an effort to produce an ultimate form of artificial intelligence. Quite a few theoretical models (termed paradigms), dating as far back as the 1950's, have been developed. Most have had limited real-world application potential, and thus, neural networks have remained in relative obscurity for decades. The back-propagation paradigm, however, is largely responsible for changing this trend. It is an extremely effective learning tool that can be applied to a wide variety of problems. Back-propagation related paradigms require supervised training. This means they must be taught using a set of training data where known solutions are supplied.

Back-propagation type neural networks process information in interconnecting processing elements (often termed neurons, units or nodes -- we will use nodes in this manual). These nodes are organized into groups termed layers. There are three distinct types of layers in a back-propagation neural network: the input layer, the hidden layer(s) and the output layer. Connections exist between the nodes of adjacent layers to relay the output signals from one layer to the next. All information enters a network through the nodes in the input layer. The input layer nodes are unique in that their sole purpose is to distribute the input information to the next processing layer (i.e., the first hidden layer). The hidden and output layer nodes process all incoming signals by applying factors to them (termed weights). Each layer also has an additional element called a bias node. Bias nodes simply output a bias signal to the nodes of the current layer. Qnet handles these bias nodes automatically. They do not need to be included or specified by the user. All inputs to a node are weighted, combined and then processed through a transfer function that controls the strength of the signal relayed through the node's output connections. The transfer function serves to normalize a node's output signal strength between 0 and 1. Processing continues down through each layer to obtain the network's response at the output layer. When a network is used in recall mode, processing ends at the output layer.

During training, the network's response at the output layer is compared to a supplied set of known answers (training targets). The errors are back-propagated through the network in an attempt to improve the network's response. The nodal weight factors are adjusted by amounts determined by the training algorithm. The iterative process of adjusting the weights and reprocessing inputs through the neural network constitutes the learning process. One training iteration is complete when all supplied training cases have been processed through the network. The training algorithms adjust the weights in an attempt to drive the network's response error to a minimum. Two factors are used to control the training algorithm's adjustment of the weights. They are the "learning rate coefficient", eta (or η), and the "momentum factor", alpha (or α). If the learning rate is too fast (i.e., eta is too large), network training can become unstable. If eta is too small, the network will learn at a very slow pace.
The momentum factor has a smaller influence on learning speeds, but it can influence training stability. Qnet uses a sophisticated control scheme that adjusts the learning rate coefficient to keep network training proceeding at a near-optimal pace. Chapter 8 discusses the training process in greater detail.

Network designs can be created with multiple hidden layers (up to 8), and they can be fully connected or connections can be customized. The term "fully connected" means that each node in the hidden and output layers has one input connection from each node of the preceding layer. All connections in a back-propagation neural network are feedforward. That is, they must connect to a node in the next layer. Figure 5.1 depicts a standard fully connected neural network configuration. Appendix C contains a detailed overview of Qnet's implementation of back-propagation neural network theory.

6.0 Network Design and Construction

When designing a network, the modeler must specify the following information:

   The number of input nodes.
   The number of hidden layers (1 to 8).
   The number of nodes in each of the hidden layers.
   The number of output nodes.
   The connection design of the network.

6.1 The Input and Output Layers

The input layer of a neural network has the sole purpose of distributing input data values to the first hidden layer. The number of nodes in the input layer will be equal to the number of input data values in the model. For example, assume a lender wishes to create a neural network that will accept or reject automobile loan applications. Inputs could include things such as the loan applicant's age, marital status, the number of dependents, education status, total family income, the total monthly debt payments (house, cars, credit cards, etc.) and the monthly payment required for the new loan. If this is the extent of input information for the model, then this network would be designed with 7 input nodes. The output of this model is simply whether the person is qualified or not (1=yes, 0=no). Therefore, the output layer would consist of one node. Each case of 7 inputs and 1 output is referred to as one training pattern. If there is previous loan information for 5000 people, the model could be trained using 5000 patterns.

6.2 The Hidden Processing Structure

From the above example, it is clear that determining the number of input and output nodes is trivial once the data model has been formulated. Choosing the number of hidden layers and the number of hidden nodes in each layer is not so trivial. The construction of the hidden processing structure of the network is arbitrary. However, the importance of selecting an adequate hidden structure should not be underestimated. Many factors play a part in determining what the optimal configuration should be. These factors include the quantity of training patterns, the number of input and output nodes and the relationships between the input and output data. Normally, there will be many network designs that are similar in size and structure that will yield excellent results. It may often be tempting to construct a network with many hidden layers and processing units -- falling into "the bigger the brain the better the model" trap. This philosophy can easily result in a poorly performing model. When a network's hidden processing structure is too large and complex for the model being developed, the network may tend to memorize input and output sets rather than learn relationships between them.
This is especially true for networks being trained with a small number of training patterns. Such a network will train well, but test poorly when presented with inputs outside the training set. The concept of memorization learning versus cognizant learning will be explained in detail in chapter 8. Generally, it is best to start with simple network designs that use relatively few hidden layers and processing nodes. If the degree of learning is not sufficient, or certain trends and relationships cannot be grasped, the network complexity can be increased in an attempt to improve learning. A plausible starting point for the loan application model would be to use 2 hidden layers with 3 to 4 nodes per layer. If this design does not train sufficiently, the size and complexity of the hidden structure can be increased. For this problem, memorization would not be likely due to the relatively large number of training patterns (5000). It has been demonstrated theoretically that for a given network design with multiple hidden layers, there will always exist a design with a single hidden layer that will learn at an equivalent level. However, in practice it is usually better to employ multiple hidden layers in solving complex problems. To adequately model a complex problem, a single hidden layer design may require a substantial increase in the number of hidden nodes compared to a 3, 4 or 5 hidden layer construction. In simple terms, a single hidden layer design with 10 nodes may not be able to learn as much as a network with two hidden layers containing 5 nodes each. Multi-hidden layer networks tend to grasp complex concepts more easily than networks with one layer. One reason for this is that the multi-hidden layer construction creates an increased cross-factoring of information and relationships. Thus, a network's learning ability is controlled by both the total number of hidden layers and the total number of hidden nodes. Qnet allows up to 8 hidden layers (experience has shown that the vast majority of problems will work fine with 4 or less hidden layers). The number of nodes per layer in the DOS version is limited only by the memory available and practical limits in processing speed. The Windows version is theoretically limited to 32000 nodes per layer, but expect the practical limits to be much lower due to speed and memory considerations. 6.3 Network Connections The final network design consideration concerns how to control the network's connections. Qnet implements a connection editor that allows connections to be removed from the fully connected default configuration. This allows logic flow to be introduced to the network. Input information can be channeled and processed in a localized area of the network. "Pass-thru" nodes can be constructed that receive only one input connection from the preceding layer and pass that information down to the next layer. This has the effect of creating connections that skip a layer. While the connection editor gives the modeler almost unlimited flexibility in designing a network, the fact is that the vast majority of designs work best fully connected. Qnet's connection editor is best suited for highly advanced models that require groups of input data to be processed through separate network pathways. 7.0 Training Data Before a network can be created and trained by Qnet, data for the model must be organized and formatted for compatibility with Qnet. The steps required to create training data for Qnet involve: Gathering the training cases. 
Determining what, if any, data preprocessing should take place. Formatting the final training set for Qnet.

With back-propagation neural networks, the more training data that is gathered for the training process, the better the model will likely be. With more training cases available, the modeler is able to consider increasingly complex network designs. Gathering a large number of training cases will also make it easier to employ rigorous test sets for overtraining analysis and model integrity checks. The use of test sets in model training is discussed in chapter 8. Once the model information is gathered or generated, data preparation and formatting are required. These tasks can be easily accomplished using any of today's popular spreadsheet or database programs. The following sections discuss the data formatting and preparation steps in detail.

7.1 Input File Format

Preparing data for use with Qnet is an easy process. Training data files use the universally compatible ASCII (text) row/column (columnar) input format. Each data column represents data for one input node or a target for one output node. Each row in the file represents one input training case or pattern. Using the loan application example from the previous chapter, the data columns would include the age, marital status, number of dependents, years of education, total family income, total monthly debt payments, and the monthly payment required for the loan. There is also one output node, the qualification status. The target data for the output node(s) can be located in the same file as the inputs or in a separate file. If the same file were used for both, then the loan application training file would consist of 8 data columns. If the historical database contained 5000 loans, the training file would have 5000 rows (i.e., lines or records).

Data column delimiters can be any combination of spaces, commas or tabs. The only requirement is that both the input node data and output node targets be in contiguous data columns. For the above example, the data columns containing the input node information can be 1 through 7, 2-8, 10-16, etc. You simply tell Qnet which data column to start reading from. The same rules apply for the target data used by the output node(s). Blank and commented lines are ignored. Comments can be inserted in the files by starting the line with a "#" character. The following is a template of what an input file may look like:

   # THIS IS A COMMENT
   ......   ......   <=== PATTERN 1
   ......   ......   <=== PATTERN 2
   ......   ......   <=== PATTERN 3
   ......   ......   <=== PATTERN 4
   .
   .
   .
   .

It is not required that the input node data columns precede the output node data columns if the two sets of information are contained in the same file.

The use of a spreadsheet as a training data preprocessor to Qnet is highly recommended. A spreadsheet will allow you to group columns, move, add or eliminate rows and perform virtually any type of data preparation required. If the training data can be imported into or has been generated with a spreadsheet application, it is very easy to format and save the data in an ASCII (text) format. For example, Microsoft Excel(TM) allows any spreadsheet to be saved in a DOS TEXT/ASCII format with comma (CSV format) or tab data column delimiters -- either format is compatible with Qnet. Lotus 123(TM) and Quatro Pro(TM) allow data to be "printed" to ".PRN" text files that are space delimited. (Quatro Pro for Windows Version 1.0 requires that the slash key macro be active.
Select PROPERTY/ APPLICATION/macro/slash key/Quatro Pro-DOS. The slash key will then invoke a 123 like menu system that allows printing to space delimited text files.) Likewise, all popular database applications can create data files in an ASCII text format. If the training data is coming from your own private application, simply follow the above rules when writing to formatted text files. For very large models with hundreds or thousands of input and output nodes, each line in the file can become quite long. Qnet's DOS version has no limit on the length of each line or record that will be scanned for data. Qnet for Windows permits 32000 characters (or bytes) per line. Only extremely large models would exceed this limit. Most spreadsheet programs limit output to 256 data columns. When generating a Qnet file that would contain data for 1000 input nodes, several files would have to be combined to create the entire input set. If you require utilities for working with large data sets, contact Vesta for assistance. Qnet's training data file selections are made in the training setup window, or in the recall setup window when using recall mode. Qnet allows you to view the training (or recall) data by pressing the F2 key after making the file selection. Viewing the file will verify that the file format is compatible with Qnet and that all data will be correctly imported for training. Section 11.2 discusses all training setup options relating to the training data files. 7.2 Data Preparation Proper data preparation can make the difference between successful and unsuccessful neural models. Some models will benefit greatly from simple transformations of the input and target data. For this reason, it is important to understand how different training data representations will influence the neural model being created. Neural network training data falls into two classes: continuous valued and binary (0/1). For many inputs the data can be processed and represented in either of these formats. Let's assume we wish to create a model that will project the monthly sales of widgets and is going to include the month of the year as one of the inputs. We can either represent months as a continuous value from 1 to 12 through a single input node or as 12 separate nodes using binary inputs. For the binary case, all nodes would be set to 0 except for the month we wish to project sales for. As a second example we wish to predict the direction of the stock market. Should we predict the following week's market direction as the value of the S&P 500 index, a percentage change from the current week's level or as a simple binary value indicating up or down? Clearly, decisions must be made. Making the right choice can make or break the model being designed. When deciding between continuous values or a binary representation, one must consider the impact upon what is being modeled. For the widget example, assigning continuous values of 1 through 12 to represent the month implies a predetermined ranking for each month. For many models, we would have no reason to believe that August should be better than April or that January should be less than November. The sale of widgets may have distinct monthly patterns that have nothing to do with the month's chronological order. Creating 12 binary input nodes avoids the implied ranking problem. The neural network that uses one input node may produce acceptable results, but a considerable amount of extra training will be required to decode the implied ranking. 
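To make the 12-node alternative concrete, the following short ANSI C fragment shows one way the binary month columns could be generated when preparing a training file outside of Qnet. The function name and the space-delimited output are purely illustrative, not part of Qnet itself; any tool that produces the columnar ASCII format described in section 7.1 will do.

   #include <stdio.h>

   /* Illustrative only: write the month (1-12) as 12 binary input
      columns, with a 1 in the column for the given month and 0 in
      all others.  The space-delimited output matches one of the
      training file formats Qnet reads. */
   void write_month_inputs(FILE *out, int month)
   {
       int m;
       for (m = 1; m <= 12; m++)
           fprintf(out, "%d ", (m == month) ? 1 : 0);
   }

   int main(void)
   {
       /* A March pattern would begin: 0 0 1 0 0 0 0 0 0 0 0 0 */
       write_month_inputs(stdout, 3);
       printf("\n");
       return 0;
   }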
The optical character recognition problem included with Qnet provides another example of binary data representation. The model's output is a determination of what number (0 through 9) has been presented to the network through a bitmap picture. We could construct a model with one output node corresponding to the actual number recognized, or we could set up 10 output nodes with each node representing one digit. Using 10 output nodes is the proper way to formulate this model. This is because the process of recognizing a character is independent of the character's numerical value. Forcing the network to assign a numerical ranking to the output will unnecessarily complicate the main task of recognizing the character. The ten output node design will simply output a 1 to the appropriate node when a number has been recognized. In practice, when new and somewhat different images (or fonts) are presented to the network for recognition, we may not get an exact 0 or 1 reading from the output node. The optical character recognition program utilizing the neural network may require that the output of a node be greater than some threshold (say .5) before it will consider a character recognized. If multiple nodes indicate some degree of recognition, the one with the greatest output strength would likely be selected.

For the stock market forecasting example, continuous values would provide more information than a binary representation indicating simply whether the market is up or down. Knowing the magnitude of the up or down will improve model learning by providing additional information. Also, the size of the up or down prediction would likely correlate with the probability of the model predicting the right market direction.

It is also important to consider how a continuous value should be represented. A major pitfall the neural modeler must avoid is the use of unbounded inputs or targets. If one were to choose the S&P 500 index as the target value, substantial problems would result. The future value of the index has no upper bound. Once the index moves outside the training range, the model will perform poorly. Also, if the model is to predict weekly changes, many small movements of around 2 or 3 index points will be present in the data. If the training range of the index is between 250 and 450, how well will the network be able to resolve a weekly change from 401 to 403? These problems can be eliminated by using a percentage change format. Excluding highly volatile swings, one can be confident that the index will fluctuate within a range of around ±5% during most weeks. This format provides a reasonable upper and lower bound for the target values and it provides a scale that the network can easily resolve. If isolated, volatile swings produce a few weekly changes significantly outside the typical range, consider limiting those changes to some threshold value. This will prevent isolated cases from unduly impacting network predictions.

Another data preparation problem can occur when there is an extremely large amount of input node data to model. This problem is common with back-propagation neural networks used for visual recognition. Take a case where a neural model is to be created to monitor the quality of a weld on a production part going down an assembly line. A camera will provide the neural model with a picture of the weld, and the part will be accepted or rejected based on this picture. The input to the neural model will be a digitized picture from the camera.
The output of the network will be to simply accept or reject the part (binary). If the digitized picture has a resolution of 1000x1000, the total number of input nodes is 1 million (i.e., one node per pixel). While Qnet can theoretically handle a problem this large, computer speed and memory limitations will likely prevent a network of this size from being trained and put into practical use. The solution is to compress the video information in some manner to reduce the total amount of information that must be modeled. A simple way to reduce the image size is to tile the image. This involves averaging neighboring pixels to reduce the overall quantity of inputs. A second method is to use Fast Fourier Transforms to compress the image into a series of waveforms. The waveform coefficients are used as network inputs instead of the actual pixel data. This technique has been used quite successfully with visual recognition problems. 7.3 Data Normalization Back-propagation neural networks require that all input node data and output node targets be normalized between 0 and 1 for training. Qnet can automatically perform the data normalization for you. If this option is selected during training setup, all data for the nodes in the input layer and/or training targets for the output layer will be normalized between the limits of .15 and .85. If new training patterns are added to the training set in subsequent sessions, the data will be re-normalized as necessary. (Note: This may make the network weights slightly out of sync with the training data. A small amount of training will be required to recover.) Even if the training data is already between the limits 0 and 1, normalization may still be desirable. For example, if all target data is between .01 and .02, it would be better to normalize the data over a wider range so that the network can better resolve and predict the targets. When Qnet's automatic normalization is used, the entire normalization process becomes invisible to the user. All network inputs and outputs will be returned to their original scale when plotted, printed or saved to a file. For recall mode, input node data will be normalized to the same limits used for network training. Network outputs will be automatically scaled to the proper range if automatic normalization was selected for the training targets. If new input data is being presented to the network, there is always a chance that the new normalized data will fall outside the 0 to 1 limits. This may not be a problem, however, it should be noted that when inputs to a network are significantly different from the data ranges that were used during training, the model's accuracy must be questioned. Again using the loan application example, let's assume that in our 5000 cases the number of dependents ranged from 1 to 10. If a loan applicant comes in with 18 dependents (this is only an example), could the network accurately predict whether the person should qualify for the loan? The model will produce an answer, but the result may be of questionable accuracy. 7.4 Training Patterns - How Many? The number of cases that are used for training is extremely important. For back-propagation neural networks, the more training patterns that are used, the better the resulting model will be. The only limit with Qnet's DOS version is the speed and memory of the computer system and the ability of the modeler to collect training cases. Qnet for Windows is limited to 32000 training cases. 
A large number of training cases allows the network to generalize and learn relationships easier. As stated previously, the size and complexity of the network structure can be increased, since training set memorization becomes less likely. Training sets with a small number of patterns run the risk of being memorized, even by small network configurations. The memorization problem is discussed further in the following chapter. Qnet also allows the use of test sets during training. Test sets are patterns set aside from the training set to test for network overtraining (see chapter 8) and to check the integrity of the model. If a model cannot respond intelligently to patterns outside the training set, then the model will be of little value (unless training set memorization is the goal). Qnet allows the user to indicate the number of patterns to allocate for the test set and then to monitor the network's response error to these test cases interactively during the training process. These patterns must be in the same file(s) as the training patterns and they must be located at the beginning or end of the file(s). If the patterns in the training data file(s) are in some type of systematic order, it may be beneficial to change the pattern sequence so that diverse case types are contained in both the training and test sets. Normally, the training set will contain many more patterns than the test set. Ratios of training patterns to test patterns are commonly in the range of 5 or 10 to 1 (and higher). 8.0 Network Training with Qnet Once the training data has been organized for the model and certain network design features have been chosen, network training can begin. At this point Qnet should be started, a new network definition file selected and a training run initiated. The training setup window is used to specify the network configuration, the training data file information and the initial training parameters. Once this information is specified, training can begin. Depending on model size and complexity, the training process may take minutes, hours or even days for very large problems. Qnet's fast execution speeds and assorted analysis tools are designed to simplify the training process. The graphing and analysis tools make it easy to determine how well a network has learned and when a network has been trained to its optimal level of performance. The training parameters may be interactively changed during the training process. This chapter covers some of the important issues and concepts involved in the setting of the training parameters and the analysis of a neural model during the training process. 8.1 Learning Modes An important concept to understand about the training process is that there are two distinct types of learning that can take place. One type is "cognizant learning" where the network develops an understanding of how the inputs can best be generalized to formulate an output prediction. The other type is "memorization" where the network can, in effect, recall a set of outputs after "seeing" the inputs. An example of cognizant learning is where a person studies how to add together a small set of numbers (i.e., 2+2, 3+5, etc.) and through understanding some basic concepts, that person can determine the results of any two numbers presented to him (even if they were not previously studied). An example of memorization learning is where a person learns the US state capitals. Learning the capitals of 45 states does little to help a person predict what the other 5 might be. 
Memorization learning is only useful for the learned set. It offers no help in determining solutions outside the learned set. The sample problem "RANDOM" included with Qnet shows an example of memorization learning. While memorization does have some benefits for certain models, cognizant learning is desired for the vast majority of real-world neural models. Differentiating between the learning modes will allow you to optimize model training and better determine the effectiveness of a model prior to practical use.

Qnet's training algorithms attempt to drive the network's response error (for the training set) to a minimum value. The error value monitored during Qnet training is the root-mean-square (RMS) error between the network's output response and the training targets (equivalent to the standard deviation). When the training set's error is descending during the training process, one or both of the types of learning discussed above are taking place. Unfortunately, there is no way to determine which type of learning is taking place by monitoring the training set error by itself. To determine the type of learning, a test set (or overtraining set) must be employed. Qnet allows the test set to be monitored interactively during training. This set of data is not used to train the network; however, the error in the network response is monitored to determine how the network responds to patterns outside the training set. If both the training and test set errors are declining, cognizant learning predominates since the network is learning to generalize the relationships between the inputs and outputs. If the test set error increases while the training set error declines, then memorization is the predominant learning mode. When a test set's error has reached a minimum level and begins to increase indefinitely thereafter, overtraining is occurring. Overtraining a network after this minimum has been reached can actually hurt the predictive capabilities of the model being developed. Section 8.4 discusses these concepts further.

It should be noted that the method of determining the learning modes and overtraining status by monitoring the training and test set errors assumes that the test set is a perfect random subset of the training set. This may not always be true. For problems where the test set is some organized subset of the training set cases, the minimum test set error may simply indicate the point at which the network has best modeled that subset of test set cases. To function as a true overtraining indicator, it is important that the test set cases be a truly random sample of the training cases.

8.2 Learning Rates and Learn Rate Control

The back-propagation training paradigm uses two controllable factors that affect the algorithm's rate of learning. These factors must be adjusted properly during the training process. The two factors are the learning rate coefficient, eta, and the momentum factor, alpha. The valid range for both eta and alpha is between 0 and 1. Higher values adjust node weights in greater increments, increasing the rate at which the network attempts to converge, while lower values decrease the rate of learning. Just as there are limits to how fast a brain can learn ideas and concepts, so too there are limits to the rate at which a network can learn. If a network is forced to learn at a rate that is too fast, instabilities develop that can lead to a complete divergence of the training process.
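For readers who want to see the two factors in context, the fragment below is a minimal, textbook-style sketch of the transfer function described in chapter 5 (commonly the logistic sigmoid) and of the standard weight update that uses eta and alpha. It is a generic illustration of back-propagation, not Qnet's internal code; Appendix C describes Qnet's actual implementation.

   #include <math.h>

   /* A typical back-propagation transfer function (the logistic
      sigmoid): it squashes a node's combined, weighted input into
      the 0 to 1 range, as described in chapter 5. */
   double transfer(double sum)
   {
       return 1.0 / (1.0 + exp(-sum));
   }

   /* Standard weight update for one connection:
         delta_w(t) = eta * delta * x + alpha * delta_w(t-1)
      where delta is the back-propagated error term of the receiving
      node and x is the output of the sending node.  A larger eta
      makes each adjustment bigger (faster, but potentially unstable,
      learning); alpha re-applies a fraction of the previous change
      to smooth out oscillations. */
   void update_weight(double *w, double *prev_dw,
                      double delta, double x, double eta, double alpha)
   {
       double dw = eta * delta * x + alpha * (*prev_dw);
       *w += dw;
       *prev_dw = dw;
   }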
The training coefficient, eta, can be controlled manually during training or Qnet can control it automatically using its Learn Rate Control (LRC) feature. LRC will drive eta higher or lower in a systematic fashion depending on the current learning activity. If the network appears to be learning at a relatively slow rate, eta is driven up quickly. Conversely, if the network is learning at a fast pace, Qnet will raise eta only slightly, hold it constant, or even lower it to avoid instabilities. If at any time the network shows signs of instability (seen as oscillations in the training error), eta is lowered quickly to damp the instabilities. Damping instabilities is critical to preventing complete training divergence. The LRC feature can be turned on and off interactively during the training process, and it can be activated at setup time by specifying the iteration number that LRC will start (see chapter 11.2). Occasional interaction with the LRC system can help to improve the learning process. For example, let's assume that a network is training with LRC active. After several hundred iterations NetGraph is used to view the eta history. The graph shows that whenever eta exceeds a value of 0.15, instabilities occur (seen as oscillations in the RMS error) and eta is dropped substantially to avoid divergence. During each recovery process, learning slows due to lower learning rates. By setting appropriate minimum and maximum values for eta, LRC will keep eta below the established upper limit and above the specified minimum. For the above example, eta could be limited to a maximum of 0.12 to prevent the instabilities from occurring. This will keep the network learning at an optimal pace by preventing the slow downs required to recover from instabilities. It is common for these limits to change gradually over many iterations (usually the upper limit decreases during the training process, but not always). Repeat this procedure when instabilities develop on a regular basis. LRC concerns itself only with control of eta. Usually, little or no interaction is required with the momentum factor, alpha. The momentum factor damps high frequency weight changes and helps with overall algorithm stability, while still promoting fast learning. For the majority of networks, alpha can be set in the 0.8 to 0.9 range and left there. However, there is no definitive rule regarding alpha. Some networks may train better with alpha values set at a lower level. Some networks train perfectly with no alpha term used at all (set to 0). Most neural modelers prefer to use higher momentum values, since this usually has a positive effect on training. If training problems occur with a given alpha value, it may be helpful to experiment with different values. Alpha can be changed interactively at any time during the training process with Qnet. 8.3 Starting New Networks A new network is initialized by setting all processing node weights to random values. Qnet does this automatically for new networks. This option can also be selected for any network during setup or training to reset the network to an untrained state. When initiating the training process for a new network, several decisions must be made regarding certain training parameters. First, a seed value for the learning rate must be chosen during training setup. An eta value in the 0.01 to 0.1 range usually works well for new networks. 
If the initial guess turns out to be too high and the network diverges (see section 8.6), simply reset the network by selecting "Options/Randomize Weights" from the training window's menu bar. Select "Options/Set ETA" to try a lower learning rate value. A second consideration for new networks concerns the iteration number at which LRC should be activated. When training begins with a new network, the training error can oscillate wildly. This is normal behavior for new networks. If Qnet's LRC option is active during these initial oscillations, eta will be lowered in an attempt to eliminate them. This can slow training by driving eta to an artificially low value. To prevent this from occurring, set the "Begin Learn Rate Control" item in the training setup window to at least 50 or 100 iterations. The LRC option can also be turned on and off interactively during the training process. If a long training session is planned, the number of iterations should be set to a very large number. The number of iterations can be changed during the training session or training can be interactively terminated at any time. Other Qnet training parameters can normally be kept at their default values. The parameters include: the FAST-Prop Coefficient, the minimum eta and maximum eta settings for LRC, alpha, the screen update rate, and the Auto-Save rate. These parameters are easily adjusted during the training process. 8.4 Monitoring Training Progress The training process is monitored through the functions available in the training window. Visual readouts exist for the network configuration, the current training parameters, the training and test set RMS errors, the number of training iterations performed and the time remaining for the run. Monitoring the error histories is the quickest way to determine the training progress with Qnet. Along with the visual readout of these errors, complete error histories for the run can be obtained with NetGraph. This provides the modeler with the most information about the progress and the relative state of a network's convergence. NetGraph's AutoZOOM feature can be particularly helpful when viewing error histories. AutoZoom allows the modeler to zoom in on features of interest that might be obscured by the scaling used for the initial plot. By monitoring the training set's RMS error, the modeler can determine the pace of network learning, the frequency of instabilities and the general state of convergence (or divergence). Qnet's training algorithms attempt to drive the training set error to a minimum value. As a network reaches this converged or steady state, the error value will approach some minimum value. By monitoring the test set's RMS error, the modeler can determine the overtraining status of the network and how well the network responds to cases not contained in the training set. For some networks, the test set's error will simply decline to some minimum value. Another possibility is that it will decline, reach a minimum, and then increase indefinitely thereafter, even though the training set's error continues to decrease. For this case, the model should only be trained to the point of the test set's minimum error. When the test set error begins to increase it can be assumed that memorization is predominating and overtraining has begun. Unfortunately, determining this point is not that simple. False or local minimums may occur in the test set error. These local minimums indicate that some mix of the learning methods is taking place. 
Some networks may exhibit long periods of training where the test set error increases before declining again. Figure 8.1 depicts these possible scenarios. (Note: To minimize the effect of test set error computations on training speed, Qnet computes the error value during screen update iterations only. Setting large screen update intervals will limit your ability to monitor the test set error.) If a test set's error begins to increase, training should be continued to determine whether the minimum is local or global in nature. The Auto-Save feature of Qnet allows you to return to a point at or near the minimum if it is global in nature. Auto-Save will store the network at selected intervals during training. This interval can be specified during setup or training. To retrieve an Auto-Saved network, select "File/Save Auto-Save File..." from the training menu. You will be asked to specify a network file name and an iteration value (from a list of iterations at which the network was saved). Select the iteration that is nearest, but still prior to the start of overtraining. If additional training is required from this iteration, exit the current training session and open the newly saved network file for training. Section 8.7 discusses Qnet's Auto-Save feature in greater detail. When either the training set error or test set error begins to increase while using the FAST-Prop training algorithm, return to standard back-propagation by setting the FAST-Prop coefficient to zero. While the FAST-Prop method of training can accelerate the learning process, this training method can at times get "stuck". The training and/or test set errors may start to increase or fluctuate. Standard back-propagation does not experience this problem. See section 8.8 for further information. If no test set is employed for the training process, no overtraining analysis can be performed. Training for such models should be terminated when no significant decrease in the training error occurs with additional training iterations. Any model developed in such a manner should undergo extensive testing to determine the quality of the network's output responses to inputs outside the training set. Other interesting information can be derived from the training and test set error history plots. It is common to find long "plateaus" in the error level where no significant learning takes place. This indicates that the network is trying to "figure out" certain input/output relationships. Plateaus are often followed by steep descents in the training error, yielding accelerated periods of learning. It is important that "plateau" conditions are not mistaken for a converged network. Another common feature in error history plots is minor oscillations representing training instabilities. Training instabilities are quickly damped by the LRC feature if active. If these oscillations start to occur frequently, consider augmenting LRC by setting a maximum eta that LRC should not exceed. In addition to training and test set error history plots, Qnet's NetGraph tool can produce other plot formats to provide even greater detail about training progress and the network model. Graph formats include: 1) The learning rate, eta, versus the training iteration. This plot can provide the detailed history of the learning rate when LRC is in use. It is best used to determine what eta limits should be applied to Qnet's LRC option. 2) Training and test set targets compared to the network responses for each set.
The information is plotted versus the pattern sequence number (a separate plot is produced for each output node). For this plot, up to four separate curves will be shown: the training set targets, the training set network responses, the test set targets and the test set network responses. The test and training sets can be distinguished by different colored curves. Use this plot format to quickly compare the network's output predictions with the training targets. 3) The normalized network responses versus the normalized target values in a scatter format (all output nodes on the same plot). This plot can be used to obtain a quick overview of how well all network predictions agree with the training targets. For perfect agreement, all plotted points will fall on the line Y=X (meaning the training targets = the network predictions). The training and test set points are plotted with different symbols so that the results of the two sets are distinguishable. 4) The network response errors versus the training pattern sequence number (separate plot for each output node). This plot is similar to that described in item 2; however, it shows the difference (or error) between the network predictions and the training targets. Separate curves are plotted for the training and test sets. 5) The input node data versus the training pattern sequence number (separate plot for each input node). This plot format can be used to scan the input node sets for possible data anomalies. It is recommended that these plots be reviewed at some point during training to scan the inputs for bad data. The graph formats available in Qnet are powerful tools for analyzing the training process and determining the quality of the model. NetGraph's AutoZoom feature can be used with any of these plots to better analyze the plotted data. Section 11.4 provides a discussion of NetGraph's functionality. Qnet's browser can be used to check the average and maximum network response errors for each output node, view the node weights and browse through the network responses and their targets. Information in the browser can be sent to the system's printer or saved to a file.

8.5 Unattended Training
Large, complex models often require extended periods of unattended training. Overnight training must often be considered so that the PC will not be tied up during prime use hours. Several steps should be taken when planning unattended training. These include: 1) Make sure the "Time Remaining" field is at least as long as the period you wish to train for. If not, increase the number of iterations (Options/Iterations...). 2) Keep LRC on to guard against network divergence. 3) Set the FAST-Prop coefficient to 0 (use standard back-propagation). 4) Set the Auto-Save rate to a reasonable level. The Auto-Save rate should be set so that if the run results in overtraining, you can return the network to its optimal training point in a reasonable length of time. For example, if it takes 10 minutes to perform 100 training iterations, setting the Auto-Save Rate to 100 will guarantee that you can return to any training point within 10 minutes. Another important consideration when using Auto-Save during long unattended sessions is disk space. Adequate disk space must be available to store the network model at the requested rate. Section 8.7 provides a detailed discussion on this topic. To safeguard your system's video display against screen burn-in during long periods of unattended training, we recommend that you use a screen saver utility.
Windows has this feature built-in and many DOS programs are available for this purpose. When using a screen saver utility under Windows, avoid the animated types. Animated screen savers will drastically reduce the CPU time available to Qnet. In a pinch, your monitor's brightness and contrast controls or the on/off switch will do the job just as well as any screen saver utility. 8.6 Training Divergence During the training process, the network's learning pace may at times become too fast. When this happens, learning instabilities develop. These instabilities show up as small oscillations in the training error. If the learning rate, eta, is not lowered in response to the instabilities, network divergence can result. Qnet's Learn Rate Control helps prevent divergence by automatically lowering eta. If a network does diverge, the training RMS error will normally reach a large constant value (usually around 0.5). It is also possible to have a divergence at one output node and have continued convergence for others. Such cases will exhibit a training error that continues to decrease. If one suspects that a divergence has occurred, use NetGraph to plot the network responses for each output node. If an output node's response is constant and completely outside the range of the target data then that node has diverged. This indicates that the normalized response of the network is all 1's or 0's. Figure 8.2 shows a network output response that has diverged. A diverged network must be either reset to a randomized state (select "Options/Randomize Weights" from the training menu) or returned to an iteration prior to divergence (if Auto-Save was active). Use the Auto-Save recovery procedure for cases where a great deal of training would be lost by simply resetting the network. Normally, setting eta to a lower value or limiting LRC with a lower maximum eta will prevent the divergence from reoccurring. If divergence problems continue, try lower values for alpha. Using Qnet's LRC feature is the best way to minimize the risk of divergence. 8.7 Auto-Save The Auto-Save feature of Qnet allows you to easily recover from overtraining situations and training divergence. The user specifies a rate (or interval) at which the network should be stored. Qnet will store the network state in a temporary file at the selected interval during the training process. For example, a rate of 100 will cause the network state to be stored every 100th training iteration. The network state is not saved to your current network definition file. This must be done manually by selecting the "File/Save Network State" option from the training menu. A network's state can be retrieved from the temporary Auto-Save file and stored to a permanent network definition file by selecting "File/Save Auto-Save file..". When selected, you will be prompted for a file name and the iteration number to use for storing the network information. The current run is unaffected by this process. If you wish to start training the network retrieved from Auto-Save, exit the current training session and start Qnet using this new network definition file. The temporary Auto-Save file is destroyed when the current training run is terminated. You must retrieve any desired network states before exiting. Also, if you save a network state to the same file name as the one being used for the current run, do not save the current run upon exiting or the Auto-Saved network will be lost. When selecting the Auto-Save rate, several considerations must be made. 
Storing the network too often can slow execution speeds and increase disk space demands. Limiting Auto-Save to 10 or 15 minutes between stores will have the following benefits: 1) You can recover from a training problem in a reasonable period of time. For example, if overtraining started to occur at iteration 4750 and you had an Auto-Save rate of 500 (assume 500 iterations per 10 minutes), you could reach the optimal training point in about 5 minutes by retrieving the network state at iteration 4500 and performing an additional 250 training iterations on that network file. 2) Longer time intervals between Auto-Saves will reduce the amount of disk space required. Small and medium size networks will require less than 100 KBytes of disk space for an overnight run if the saves are at least 10 minutes apart. If very long unattended sessions are planned (i.e., weekends or longer), caution must be used in selecting the Auto-Save rate. A simple calculation determines the space required: (bytes of disk space required) = (byte size of network's .NET file) * (# training iterations per minute) * (minutes of unattended training planned) / (Auto-Save rate). The "iterations per minute" number can be computed by dividing the "Max Iterations" by the "Time Remaining" field (converted to minutes) displayed at the start of any training run. 3) Using longer intervals between stores will reduce the impact of Auto-Save on training speed. Disk writes are slow and can significantly retard execution times if they are performed too often. 8.8 The Training Algorithm, Back-Propagation vs. FAST-Prop The FAST-Prop coefficient controls the algorithm used by Qnet for training. If this coefficient is set to 0 (the default), Qnet will employ its back-propagation algorithm to train the network. If the coefficient is set to a value above 0.0 (to a maximum of 3.0), the FAST-Prop algorithm is used. The closer the coefficient is set to 0.0, the closer FAST-Prop approximates standard back-propagation. While the FAST-Prop training method can often accelerate the learning process, a drawback with this method is that there is a risk that this algorithm will not converge to a minimum error, especially when higher coefficient values are used. For this reason, it is recommended that the training algorithm be switched to the standard back-propagation method at some point during the training process. Likewise, the FAST-Prop algorithm is NOT recommended for long periods of unattended training. Whenever FAST-Prop is being used, the training and test set RMS errors should be monitored closely for signs that the network is no longer converging. If either of these error values begin to increase, it is recommended that the FAST-Prop coefficient be set to 0. 8.9 Summary If you are new to neural networks, training a network for the first time may seem a bit intimidating. After a little practice you'll find that Qnet gives you all the features needed to make the training process fast and easy. The above discussions provide you with the information you need to get started. Qnet's example problems contain both trained and untrained networks. It is strongly recommended that you attempt to train some or all of the untrained sample networks to familiarize yourself with Qnet's tools and the training process. 9.0 Network Recall Qnet's recall mode is used to process new inputs through a trained network. Simply supply the data file information in the recall setup window to process the new information. 
If output targets are provided in recall mode, the predictive qualities of a model can be checked. After all input patterns are processed, the network's output may be analyzed using NetGraph and Qnet's browser. Network outputs can also be saved to an ASCII file with tab delimited data columns. Qnet also includes ANSI C source code for incorporating trained networks into your applications. Qnet networks can be used in your own private applications or they can be used royalty-free in applications for sale. Qnet's source code allows programmers to easily enhance their software with neural network models. Chapter 12 discusses this process in greater detail.

10.0 Execution Speed
Execution speed during training is one of the most important aspects of neural network software. Qnet has been written with fast execution speed as one of its primary goals. Execution speed during the training process is largely determined by your problem size and processor speed. There are several steps that can be taken to significantly improve your execution speed and limit convergence times. These include: 1) When the PC can be dedicated to Qnet training, use the DOS version. The DOS version normally operates at 2 to 3 times the speed of the Windows version. This is because the DOS version runs in the processor's protected mode and because the overhead of the Windows operating environment takes a substantial amount of processing time away from Qnet. 2) Pay attention to the rate of screen updating during training runs. During training, key network parameters are updated to the screen. How often this happens is controlled by the screen update rate. A value of 1 updates the screen each training iteration, a value of 2 updates the screen every second iteration and so on. Each screen update takes CPU cycles away from the solver. While this may seem insignificant, it can slow execution by 50% or more in extreme cases. To ensure that screen updating is not significantly retarding execution speed, limit screen updates to once every 2 or 3 seconds. 3) Use Auto-Save rates that will yield several minutes between stores (10 to 15 minutes recommended). Disk writes are extremely slow and can significantly retard execution times. By allowing several minutes between network stores, you will limit the effect of this feature on Qnet's performance. 4) Augment Qnet's LRC (Learn Rate Control) system to optimize convergence speed. Whenever instabilities become frequent, limit the maximum learning rate that LRC can use. Limiting eta in this way will improve convergence time by eliminating the overhead required to safely damp and recover from instabilities. Repeat this process whenever instabilities begin to occur regularly.

11.0 Qnet Reference
This chapter is a reference guide to all menu options and input fields in Qnet. This information is also available as on-line help by using the F1 key on the menu item or input field.

11.1 Main Menu
The main menu is the launching point for training and recall runs. You must select a new or existing Qnet network definition file before initiating a run. The menus perform the following functions: File Open/New... Select a new or existing Qnet network definition file from the file selection dialog box. Exit Exit Qnet. Run Run options include: Train Mode Select this item to enter network training mode. Use this option to train both new and existing networks. Recall Mode Select this item to enter network recall mode. Use this option to pass new sets of inputs through existing networks.
11.2 Training Setup Window The training setup window allows you to configure the network size and architecture, specify and view the input files containing the training data, and set the run-time parameters that control network training. If the run is for an existing network, all input fields are initialized to the training parameters from the previous training run. The training setup inputs are discussed below. Problem Name Enter an identifying name for the network. Number of Network Layers Enter the number of layers for the network. Include the input layer, the arbitrary number of hidden layers and the output layer. The minimum value is 3 and the maximum value is 10 (i.e., 1-8 hidden layers). TIP: Networks with more hidden layers and hidden processing nodes require longer training times. For the vast majority of neural models, a design using between 3 and 6 total layers (1 to 4 hidden layers) will be sufficient to solve the problem. Number of Input Nodes Enter the number of input nodes for the network. The number of nodes must correspond to the number of inputs in the network model. For example, if 10 inputs are used to model 3 outputs, the number of input nodes would be 10. Number of Output Nodes Enter the number of output nodes for the network. The number of nodes must correspond to the number of outputs in the network model. For example, if 10 inputs are used to model 3 outputs, the number of output nodes would be 3. Number of Hidden Nodes per Layer Enter the number of hidden nodes for each hidden layer of the network. The number of entries should be [NUMBER OF LAYERS - 2]. The order is from the first hidden layer after the input layer to the last hidden layer before the output layer. Network designs tend to be better when the number of hidden nodes are matched to the size of the problem being modeled. For example, a small problem with 5 input nodes and 100 training patterns would likely do best with 2 to 3 hidden nodes in one hidden layer. A problem with 100 inputs and 5000 training patterns will normally do better with multiple hidden layers and a higher number of processing nodes. Specifying too many hidden nodes can result in poor models that tend to memorize the training set rather than learning relationships. Specifying too few hidden nodes will result in models that can't learn the training data adequately. Some experimentation with the network construction may be required to determine the configuration that offers the best learning characteristics. Number of Iterations Enter the maximum number of iterations to perform in this training session. Training can be manually terminated before the specified number of iterations has been reached. TIP: There is no way to predetermine the number of iterations that will be required to converge a network -- it could take a few hundred or it could take several thousand. Normally, it is best to set the number of iterations to a large value and manually terminate training at the appropriate time. Begin Learn Rate Control Enter the iteration number to begin Learn Rate Control (LRC). Qnet has a special algorithm to control the learning rate, eta. This algorithm will seek the appropriate range for eta during the training run. LRC guards against divergence during training, while attempting to drive eta as high as possible. During the initial training iterations (50-100) for a new network, LRC should normally be turned off. The node weights of new networks are adjusted rapidly during initial iterations and the training error can oscillate wildly. 
Using LRC during this period is perfectly safe, however, the LRC algorithm may drive eta unnecessarily low in an attempt to eliminate these normal oscillations. FAST-Prop Coefficient The FAST-Prop coefficient controls the algorithm used by Qnet for training. If this coefficient is set to 0 (the default), Qnet will employ its back-propagation algorithm to train the network. If the coefficient is set to a value above 0.0 (to a maximum of 3.0), the FAST-Prop algorithm is used. The closer the coefficient is set to 0.0, the closer FAST-Prop approximates standard back-propagation. While the FAST-Prop training method can often accelerate the learning process, a drawback with this method is that there is a risk that this algorithm will not converge to a minimum error, especially when higher coefficient values are used. For this reason, it is recommended that the training algorithm be switched to the standard back-propagation method at some point during the training process. Likewise, the FAST-Prop algorithm is NOT recommended for long periods of unattended training. Whenever FAST-Prop is being used, the training and test set RMS errors should be monitored closely for signs that the network is no longer converging. If either of these error values begin to increase, it is recommended that the FAST-Prop coefficient be set to 0. Learning Rate (ETA) Eta controls the rate at which Qnet's training algorithms attempt to learn. It determines how fast the node weights are adjusted during training. Eta's valid range is between 0.0 and 1.0. While higher eta's result in faster learning, they can also lead to training instabilities and divergence. When initiating training on a new network, the user must provide a starting eta value. It is better to start conservatively by using a low number. Using a value in the 0.01 to 0.1 range will normally start the training process safely. If the initial guess is too high and the network diverges, reset the network by selecting the "Randomize Weights" from the "Options" menu in the training window. Select "Set ETA" from the same menu and try a lower eta. Qnet's Learn Rate Control (LRC) will help keep eta in its optimal range during training. ETA Minimum Set the minimum learning rate. Qnet's Learn Rate Control will not lower eta below this limit. The default value of 0.001 generally works well. This value may be adjusted during the training process if required. ETA Maximum Set the maximum learning rate. Qnet's Learn Rate Control will not raise eta above this limit. Setting this value in conjunction with LRC will help to avoid instabilities and can result in a significant improvement in convergence times. The maximum value for eta should be adjusted lower whenever training instabilities develop (while LRC is enabled). The upper stable limit of eta can be determined during training by using NetGraph to plot the eta history. A default value of 1 is set for new networks. Momentum Factor (Alpha) Alpha is the learning rate momentum factor used by Qnet's training algorithms. This factor promotes fast, stable learning. The valid range for alpha is 0. to 1. Most network configurations will learn and converge best by setting this value between 0.8 to 0.9 and leaving it there. While this is a good guideline, a different value may work better for some models. Screen Update Rate This value sets the iteration interval at which screen updates occur during the training process. 
Updating the display every iteration with network training and convergence information provides the best visual monitoring of the training process. Unfortunately, this can negatively affect training times due to the relatively slow process of writing the information to the screen. The optimal update rate to use for monitoring the training activity depends on network size, the number of training patterns and the speed of the computer. If screen updates occur more than once every few seconds, execution speeds are being retarded. Intervals of around 10 generally work well for smaller networks (25 nodes, 100 patterns). Very large models can update the screen every iteration without retarding execution. The test set error (if it exists) is only computed at the screen update interval. To monitor the test data error at a reasonable interval, the report rate should be set to 10 or less. Auto-Save Rate Set the rate at which Qnet's Auto-Save will store the network during training. This value sets the number of iterations between stores. A rate of 100 will store the network every 100th iteration. The network is stored in a temporary file during training. If necessary, these network snap shots can be saved to permanent network definition files. This may be necessary to eliminate overtraining conditions or network divergence problems. It is recommended that the rate be set to a value that will yield network saves once every 10 to 15 minutes. The default rate is 500 iterations per save. Adjust this value during training if necessary. Stored network snap shots can be saved as permanent network definition files using the "File/Save Auto-Save File..." option in the training menu. Auto-Save can be disabled by setting the rate to 0. Initialize/Randomize Weights Node weight values represent the learned information stored in a neural network. New network weight values are initialized randomly prior to training. The "Initialize/Randomize Weights" option is automatically selected for new networks. It can be manually selected to reset a previously trained network to an initial untrained state. Input Node Data File Select the data file from the file selection box. The file must contain the training data for the input nodes. The file should be an ASCII (text) file in columnar format. This is the type of format that most spreadsheets and database applications produce if row/column cell values are written out in text or ASCII mode. Data columns in the file are mapped to the input nodes. Each row represents a separate set of training data (i.e., one training pattern). Spaces, commas or tabs may be used to separate the data columns. Blank rows and rows starting with a "#" are ignored. After a file is selected from the file selection dialog (or whenever the button is the current window object), pressing F2 will cause the file to be read and displayed by Qnet. Qnet's browser is invoked and the file's contents can be reviewed. It is recommended that this be done for new training data files to ensure that they have been correctly formatted for Qnet. Normalize Input Data The training data used for the input nodes must be normalized between 0.0 and 1.0. If the training data is not pre-normalized to this range, select this box and Qnet will perform normalization for all input nodes. The data for each input node will be normalized separately based on the minimum and maximum values for that node. Start Column of Input Node Data Input data must be in a columnar format. Each input node will use one column of data. 
Data columns may be delimited by commas, spaces or tabs. The value specified in this field is the data column where input node 1 starts. For example, if the network has 50 input nodes and the file contains the input node information in data columns 2 through 51, enter 2 in this field. The input node data columns must be contiguous (i.e., the data cannot be in non-contiguous columns 2, 5, 9, ..., etc.). Target/Output Node Data File Select the data file from the file selection box. The file must contain the target training data for the output nodes. The file should be an ASCII (text) file in columnar format. This is the type of format that most spreadsheets and database applications produce if row/column cell values are written out in text or ASCII mode. Data columns in the file are mapped to the output nodes. Each row represents a separate set of training data (i.e., one training pattern). Spaces, commas or tabs may be used to separate the columns. Blank rows and rows starting with a "#" are ignored. After a file is selected from the file selection dialog (or whenever the button is the current window object), pressing F2 will cause the file to be read and displayed by Qnet. Qnet's browser is invoked and the file's contents can be reviewed. It is recommended that this be done for new training data files to ensure that they have been correctly formatted for Qnet. Normalize Target Data The training data used as targets for the output nodes must be normalized between 0.0 and 1.0. If the training data is not pre-normalized to this range, select this box and Qnet will perform normalization for all output targets. The data for each output node will be normalized separately based on the minimum and maximum values for that node. Start Column of Target Data Target data must be in a columnar format. Each output node will use one column of data. Data columns may be delimited by commas, spaces or tabs. The value specified in this field is the data column where output node 1 starts. For example, if the network has 5 output nodes and the file contains the target information in data columns 11 through 15, enter 11 in this field. The output node data columns must be contiguous (i.e., the data cannot be in non-contiguous columns 11, 15, 19, ..., etc.). Test Set Start Sequence A portion of the training files can be set aside and used for both overtraining analysis and for checking the quality of the neural model. To use part of the data for these purposes, give the starting and ending sequence (the row value) for the range of patterns to use. These patterns will be removed from the training set and the error will be tracked separately. The sequence of patterns (rows) removed from training MUST be at the beginning or end of the training set. If the starting sequence is at the beginning of the training, set the start sequence number to 1. If the sequence of test patterns is at the end of the training set, any number less than the total number of training patterns is valid. If no test data is to be used, set both the starting and ending sequence values to 0. Test Set End Sequence A portion of the training files can be set aside and used for both overtraining analysis and for checking the quality of the neural model. To use part of the data for these purposes give the starting and ending sequence (the row value) for the range of patterns to use. These patterns will be removed from the training set and the error will be tracked separately. 
The sequence of patterns (rows) removed from training MUST be at the beginning or end of the training set. If the sequence of test patterns is at the end of the training set, use an ending sequence value greater than or equal to the total number of patterns (rows). If no test data is to be used, set both the starting and ending sequence values to 0. Connections... Button Selecting this button will invoke the connection editor for customizing network connections. By default, Qnet networks are fully connected. Fully connected networks will work best for the vast majority of neural models. Train Setup OK Button Selecting OK will begin network training based on the parameters set in this window. IMPORTANT: Selecting OK does not automatically save the network and training setup. To save the setup, select "FILE/SAVE NETWORK STATE" from the training menu. Train Setup CANCEL Button This button will cancel all setup activity and return Qnet to the main window. No entries and changes will be saved and no training will occur. 11.3 Connection Editor The connection editor is used to remove connections from Qnet's default fully connected configuration. Removed connections can also be added back to the network. Connection Editor OK Selecting OK will keep all connection information edited during this session. Connection Editor Cancel Selecting CANCEL will cancel all connection information edited during this session. Remove Connection - From: Layer Enter the layer number of the connection starting point. For example, if the connection from the 2nd node of the input layer to the 3rd node of the first hidden layer is to be removed, enter 1 for the "From Layer". The layer numbers are 1 for the input layer, 2 for the 1st hidden layer, etc. Remove Connection - From: Node Enter the node number of the connection starting point. For example, if the connection from the 2nd node of the input layer to the 3rd node of the first hidden layer is to be removed, enter 2 for the "From Node". Remove Connection - To: Node Enter the node number of the connection ending point. The "To Layer" number is automatically set to the layer following the "From Layer". For example, if the connection from the 2nd node of the input layer to the 3rd node of the first hidden layer is to be removed, enter 3 for the "To Node". Remove Connection Press the "Remove Connection" button after entering the connector information. The connection is removed from the network configuration. (Removed connections will appear in the "Add Connection" list box.) Add Connection List Box Select the connection from the list box and press the "Add Connection" button. Add Connection Button Add the selected connection back to the network model. 11.4 Training Window The training window is used to view, analyze and interact with the current training run. Use the File option to save training information or exit the current run. Interact with the network training parameters by using the OPTIONS menu. Check the training progress with Qnet's NetGraph and browse tools. Selecting any of these options will temporarily suspend network training as indicated on the status bar. Select "RESUME TRAINING" to restart network training. The information displayed in the training window provides details on the network model, the current training parameters and the training results. 
The Network Info group displays the network's name, the number of network layers, the number of input nodes, the number of output nodes, the total number of hidden nodes, the number of network connections, the number of training and test patterns, and the network size in bytes. The Input Controls group displays the maximum number of iterations for the run, the LRC start iteration, the FAST-Prop coefficient, the eta minimum and maximum settings, the momentum factor and the screen update and Auto-Save rates. The Training Results group contains the current iteration, the training and test set RMS errors, the current eta value, the connections per second benchmark that shows your network computational speed, and the percent complete and time remaining, assuming the network trains to the maximum number of iterations. All fields in this window are "display only". The menu bar is used to perform specific tasks and to change specific training parameters. File File options include: Save Network State Save the current network setup and training results in the existing (default) or in a new ".net" file. THIS OPTION MUST BE SELECTED TO SAVE BOTH SETUP AND TRAINING RESULTS. IF A FILE IS NOT SAVED, LOSS OF ALL SETUP AND TRAINING ACTIVITY WILL RESULT! Save Net Output/Targets Save the network outputs and targets in row/column format. The output file format is ASCII (text) with tab column separators. The format is: ... ... etc... Save Error History The training data's error history is kept in a temporary file during the training run. Use this option to save the error history in a text file. The file contains the training root-mean-square (RMS) training error (or standard deviation of the network response error) for each iteration. Save Auto-Save File Create a permanent network definition file from the Auto-Save snap shots stored during training. Use this option to recover from network overtraining or training divergence conditions. You will be prompted for the file name and the Auto-Save iteration number that should be used to create the new network definition file. To continue training with this new file, restart Qnet and open the file from the main menu. Restart Qnet Return to the Qnet main menu. If the current network state is not saved you will be prompted to confirm your exit. Exit Qnet Exit Qnet. If the current network state is not saved you will be prompted to confirm your exit. Options These items control the learning and training characteristics of the network. Options include: Learning Rate Control On/Off Qnet has a special Learning Rate Control (LRC) algorithm that adjusts eta to ensure the network is converging towards its minimum. Under most circumstances this option should be turned on. Selecting this item will toggle LRC on or off depending on its current state. To adjust or control the network's learning rate manually, turn LRC off. The "Learn Control Start" value displayed in the "Input Controls" section can be used to determine if LRC is active. It is active if the current training iteration is greater than the "Learn Control Start" iteration value. Set Learn Rate Eta Eta controls the rate at which Qnet's training algorithms attempt to learn. It determines how fast the node weights are adjusted during training. Eta's valid range is between 0.0 and 1.0. While higher eta's result in faster learning, they can also lead to training instabilities and divergence. Use Qnet's Learn Rate Control (LRC) to automatically adjust learn rates to account for instabilities. 
Select this item to adjust LRC's current eta value or to set eta manually during the run. Set Momentum Factor Alpha Alpha is the momentum factor used by Qnet's training algorithms. This factor promotes fast, stable learning. The valid range for alpha is 0. to 1. Most network configurations will learn and converge best by setting this value between 0.8 to 0.9 and leaving it there. While this is a good guideline, a different value may work better for some models. Set Minimum Learn Rate Value Select this item to set the minimum learning rate. Qnet's Learn Rate Control will not lower eta below this limit. Set Maximum Learn Rate Value Select this item to set the maximum learning rate. Qnet's Learn Rate Control will not raise eta above this limit. Setting this value in conjunction with LRC will help to avoid instabilities and can result in a significant improvement in convergence times. The maximum value for eta should be adjusted lower whenever training instabilities develop (while LRC is enabled). The upper stable limit of eta can be determined during training by using NetGraph to plot the eta history. Set FAST-Prop Coefficient The FAST-Prop coefficient controls the algorithm used by Qnet for training. If this coefficient is set to 0 (the default), Qnet will employ its back-propagation algorithm to train the network. If the coefficient is set to a value above 0.0 (to a maximum of 3.0), the FAST-Prop algorithm is used. The closer the coefficient is set to 0.0, the closer FAST-Prop approximates standard back-propagation. While the FAST-Prop training method can often accelerate the learning process, a drawback with this method is that there is a risk that this algorithm will not converge to a minimum error, especially when higher coefficient values are used. For this reason, it is recommended that the training algorithm be switched to the standard back-propagation method at some point during the training process. Likewise, the FAST-Prop algorithm is NOT recommended for long periods of unattended training. Whenever FAST-Prop is being used, the training and test set RMS errors should be monitored closely for signs that the network is no longer converging. If either of these error values begin to increase, it is recommended that the FAST-Prop coefficient be set to 0. Randomize Weights Select this item to reset network weight values to a randomly initialized state. All previous training will be lost. This option may be selected to reset networks that have diverged. If necessary, use a lower learning rate and temporarily turn Learn Rate Control off before resuming training. Iterations Change the maximum number of iterations to use in this run. This can only be changed before reaching the previously set maximum iteration count. Auto-Save Rate Set the rate at which Qnet's Auto-Save will store the network during training. This value sets the number of iterations between stores. A rate of 100 will store the network every 100th iteration. The network is stored in a temporary file during training. If necessary, these network snap shots can be saved to permanent network definition files. This may be necessary to eliminate overtraining conditions or network divergence problems. It is recommended that the rate be set to a value that will yield network saves once every 10 to 15 minutes. The default rate is 500 iterations per save. Adjust this value during training if necessary. Stored snap shots can be saved as permanent network definition files from the "File/Save Auto-Save File..." option. 
Auto-Save can be disabled by setting the rate to 0. Screen Update Rate This value sets the iteration interval at which screen updates occur during the training process. Updating the display every iteration with network training and convergence information provides the best visual monitoring of the training process. Unfortunately, this can negatively impact training times due to the relatively slow process of writing the information to the screen. The optimal update rate to use for monitoring the training activity depends on network size, the number of training patterns and the speed of the computer. If screen updates occur more than once every few seconds, execution speeds are being retarded. Intervals of around 10 generally work well for smaller networks (25 nodes, 100 patterns). Very large models can update the screen every iteration without retarding execution. The test set error (if it exists) is only computed at the screen update interval. To monitor the test data error at a reasonable interval, the report rate should be set to 10 or less. NetGraph Use Qnet's graphing tool, NetGraph, to view the current run's training error history, the test set error history and the learning rate (eta) history, to compare training targets to network predictions using several formats, and to view the input node data. NetGraph's AutoZOOM feature can be used to instantly zoom into any portion of the plotted data. Any graph can be saved as a PCX file for use in drawing programs, word processors, etc. The keystrokes used in NetGraph are: Help in NetGraph. Toggle AutoZOOM on and off. Reset a zoomed plot. Create a PCX image of the graph (DOS). ALT+PRINT SCREEN Send the current graph to the Clipboard (Windows). DOUBLE CLICK LEFT MOUSE BUTTON Move legend if present. DRAG/LEFT MOUSE BUTTON Define zoom area for AutoZOOM. ALT+F4 Close Plot. AutoZOOM works by pressing and then dragging the mouse with the left mouse button held down to mark the area to zoom in on. The zoom box should be marked from the upper left to the lower right corner. Once the zoom box is set, the plot is redrawn to view the area selected. Reset a zoomed plot to the originally drawn scale by pressing the key. Plot Error History Plot the run's training error history. The error is the root-mean-square (RMS) error between training set targets and network outputs. TIP: Use NetGraph's AutoZOOM to better view any portion of the error history. Oscillations in this parameter can sometimes make it difficult to view the error history with the original scale. Plot ETA History Plot the Learning Rate history for this run. If LRC is on, it is often helpful to examine the eta history. Use the plotted information to determine the maximum stable learn rate the network can tolerate. Limiting eta to a maximum stable value will promote faster training. Plot Targets vs. Net Outputs - Scatter Plot the network outputs versus the target values in a scatter plot format. This graph format quickly compares how well all outputs and targets agree for all training patterns. The values plotted are the normalized (between 0 and 1) targets and outputs. The closer all plotted points fall on the line Y=X, the better the agreement between network output and training targets. Plot Targets/Net Outputs - Serial Plot the target values and network output data versus the input pattern sequence number. Both training and test data are plotted on the graph. Separate graphs are generated for each output node. NetGraph will cycle through all nodes or allow you to select a specific node to plot.
Plot Targets Minus Net Outputs Error - Serial Plot the difference between the target and network output data versus the input pattern sequence number. Separate graphs are generated for each output node. NetGraph will cycle through all nodes or allow you to select a specific node to plot. Plot Input Nodes Plot the network inputs versus the input pattern sequence number. A separate graph is generated for each input node. NetGraph will cycle through all nodes or allow you to select a specific node to plot. Plot Test Data Error History Plot the test data's error history. The error is the root-mean-square (RMS) error between network outputs and targets in the test set. Use this option to examine how well the network model predicts cases outside the training set and determine the overtraining status. If the test data's error is declining, constructive learning is occurring. TIP: Use NetGraph's AutoZOOM to better view portions of the test data's error history. Oscillations in this parameter can sometimes make it difficult to view the error history with the original scale. Combination Error Plot Plot both the training set and test set error histories. Plotting the two RMS error histories together can be useful in determining the overtraining status of a network. Browse The browse menu allows you to view network outputs and target data, error per node information and network weights. The information in the browser may be sent to a printer (any printer defined to LPT1, LPT2, or LPT3) or saved to a file by selecting the FILE option. Scrolls one page up. Scrolls one page down. Scroll to top. Scroll to bottom. Scroll one line up. Scroll one line down. Browse Network Outputs and Targets Select this item to browse network outputs and training targets. Browse Node Error Data Select this item to view error information for each node. Average and maximum errors at each output node are displayed. Browse Weights View the node weights and the current correction factors being used to adjust them. At the nodes where connections have been removed, the weights are set to zero. Resume Training If network training activity is halted by selecting one of the menu items, select this menu item to resume training. Training is halted so that you can perform any necessary analysis while the network is frozen. The help bar indicates the training status. 11.5 Recall Setup Window Use the recall setup window to specify the files containing the inputs and targets (if available) to be processed through a trained network. Input Node Data File Select the data file from the file selection box. The file must contain the recall data for the input nodes. The file should be an ASCII (text) file in columnar format. This is the type of format that most spreadsheets and database applications produce if row/column cell values are written out in text or ASCII mode. Data columns in the file are mapped to the input nodes. Each row represents a separate set of training data (i.e., one recall pattern). Spaces, commas or tabs may be used to separate the data columns. Blank rows and rows starting with a "#" are ignored. NOTE: Data is automatically normalized to the ranges used for the training run if auto-normalization was selected for training. After a file is selected from the file selection dialog (or whenever the button is the current window object), pressing F2 will cause the file to be read and displayed by Qnet. Qnet's browser is invoked and the file's contents can be reviewed. 
It is recommended that this be done for new recall data files to ensure that they have been correctly formatted for Qnet. Input Node Data Start Column Input data must be in a columnar format. Each input node will use one column of data. Data columns may be delimited by commas, spaces or tabs. The value specified in this field is the data column where input node 1 starts. For example, if the network has 50 input nodes and the file contains the input node information in data columns 2 through 51, enter 2 in this field. The input node data columns must be contiguous (i.e., the data cannot be in non-contiguous columns 2, 5, 9, ..., etc.). Target/Output Node Data File (Optional) Select the data file from the file selection box. The file must contain the target recall data for the output nodes. The file should be an ASCII (text) file in columnar format. This is the type of format that most spreadsheets and database applications produce if row/column cell values are written out in text or ASCII mode. Data columns in the file are mapped to the output nodes. Each row represents a separate set of recall data (i.e., one recall pattern). Spaces, commas or tabs may be used to separate the columns. Blank rows and rows starting with a "#" are ignored. After a file is selected from the file selection dialog (or whenever the button is the current window object), pressing F2 will cause the file to be read and displayed by Qnet. Qnet's browser is invoked and the file's contents can be reviewed. It is recommended that this be done for new recall data files to ensure that they have been correctly formatted for Qnet. Target/Output Node Data Start Column (Optional) Target data must be in a columnar format. Each output node will use one column of data. Data columns may be delimited by commas, spaces or tabs. The value specified in this field is the data column where output node 1 starts. For example, if the network has 5 output nodes and the file contains the target information in data columns 11 through 15, enter 11 in this field. The output node data columns must be contiguous (i.e., the data cannot be in non-contiguous columns 11, 15, 19, ..., etc.). Recall Setup OK Button Selecting OK will cause the selected input patterns to be processed through the trained network. Recall Setup CANCEL Button Selecting CANCEL will return you to the main menu. No entries will be saved. 11.6 Recall Window Qnet's recall window is used to analyze network responses to the set of inputs. Network inputs and outputs can be analyzed with NetGraph and Qnet's browse tool. Outputs (and targets) can be saved to a file. Network construction and system utilization information are displayed in the recall window. File File options include: Save Net Output/Targets Save the network outputs and targets (if available) in ASCII (text) row/column format with tabs used for data column delimiters. The format is: ... ... etc... Exit Qnet Exit Qnet. Restart Qnet Return to Qnet's main menu. From there you can select a new Qnet network definition file or start a new training or recall run. NetGraph Use Qnet's graphing tool, NetGraph, to view various comparisons of target and network output data or view the input node data. NetGraph's AutoZoom feature can be used to instantly zoom into any plotted data. Any graph can be saved as a PCX file for use in drawing programs, word processors, etc. The keystrokes used in the NetGraph are as follows: Help in NetGraph. Toggle AutoZOOM on and off. Reset a zoomed plot. Create a PCX image of the graph (DOS). 
ALT+PRINT SCREEN Send the current graph to the Clipboard (Windows). DOUBLE CLICK LEFT MOUSE BUTTON Move legend if present. DRAG LEFT MOUSE BUTTON Define zoom area for AutoZOOM. ALT+F4 Close Plot. AutoZOOM works by pressing and then dragging the mouse with the left mouse button held down to mark the area to zoom in on. The zoom box should be marked from the upper left to the lower right corner. Once the zoom box is set, the plot is redrawn to view the area selected. Reset a zoomed plot to the originally drawn scale by pressing the key. Plot Targets vs. Net Outputs - Scatter (Targets Required) Plot the network outputs versus the target values in a scatter plot format. This graph format quickly compares how well all outputs and targets agree for all training patterns. The values plotted are the normalized (between 0 and 1) targets and outputs. The closer all plotted points fall on the line Y=X, the better the agreement between network output and training targets. Plot Targets/Net Outputs - Serial Plot the network outputs and the target values (if available) versus the input pattern sequence number. A separate plot is generated for each output node. NetGraph will cycle through all nodes or allow you to select a specific node to plot. Plot Input Nodes Plot the network inputs versus the input pattern sequence number. A separate plot is generated for each input node. NetGraph will cycle through all nodes or allow you to select a specific node to plot. Browse View network output and target data or error per node information (if targets are available). The information in the browser may be sent to a printer (any printer defined to LPT1, LPT2, or LPT3) or saved to a file by selecting the FILE option. Scrolls one page up. Scrolls one page down. Scroll to top. Scroll to bottom. Scroll one line up. Scroll one line down. Browse Node Error Data View the error summary between network outputs and targets (targets required for this option). Browse Network Outputs and Targets This item allows you to view network outputs and targets (if available).

12.0 Source Code for Network Recall (NOT INCLUDED W/DEMO)
The source file NETSOLVE.C can be used to incorporate trained networks into your own C or C++ applications. Only three functions need to be called to perform network recall: 1. NETDEF *init_net( char *NetFileName ); 2. float *net_output( NETDEF *net, float *inputs ); 3. void free_net( NETDEF *net ); The routine init_net is used to initialize and allocate the neural network defined in the Qnet network definition file passed through the argument list. The routine returns a pointer to the C structure that defines the network. The user does not need to allocate or access members of this structure; simply define a pointer to NETDEF, and init_net allocates and initializes the needed information. The routine net_output computes the network output. Arguments passed are the pointer to NETDEF and a pointer to a float array that contains one sequence of input values (one pattern). The order of values in the input array must be the same order that was used to train the network. The net_output routine returns a pointer (allocated by net_output) to an array of floats that contain the network outputs. At this point, you may write, save, or otherwise use the outputs. If you need to call net_output multiple times, simply loop through the desired number of calls, changing the "inputs" array before each call. Remember to free the outputs array returned from net_output when finished with the values.
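The following minimal sketch illustrates such a loop over several recall patterns. Only init_net, net_output and free_net are Qnet routines; get_next_pattern is a hypothetical user-supplied routine, shown here only to indicate where your own data handling would go.

    /* Illustration only: recall for several patterns in one run.           */
    #include <stdlib.h>
    #include "netsolve.h"

    #define NINPUTS 10             /* number of input nodes for the network */

    /* Hypothetical user routine: fills inputs[0..n-1] with the next
       pattern and returns 0 when no patterns remain.                       */
    extern int get_next_pattern(float *inputs, int n);

    void recall_all(char *net_file)
    {
        float inputs[NINPUTS], *outputs;
        NETDEF *network = init_net(net_file);

        while (get_next_pattern(inputs, NINPUTS)) {
            outputs = net_output(network, inputs); /* one pattern per call   */
            /* ... write or use the outputs here ...                        */
            free(outputs);           /* free each returned output array      */
        }
        free_net(network);           /* release the network when finished    */
    }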
The free_net routine frees all memory allocated for the network and should be called when finished with the network. A template showing how a trained network is incorporated into a sample C program is shown below.

    #include <stdio.h>
    #include <stdlib.h>
    #include "netsolve.h"

    #define NINPUTS 10      /* number of input nodes for network */

    int main(void)
    {
        float inputs[NINPUTS], *outputs;
        NETDEF *network;

        /* INITIALIZE NETWORK */
        network = init_net("C:\\QNET\\NETWORKS\\TRAINED.NET");

        /* If multiple input cases for recall, add loop here! */

        /* Read/set input values into the float inputs[] array. The number
           of inputs must be equal to the number of input nodes and they
           must be in the same order that was used for network training.   */

        /* COMPUTE OUTPUTS */
        outputs = net_output( network, inputs );

        /* Write or use outputs. The number of output values is equal to
           the number of output nodes. The order is the same as the target
           data used for training.                                          */

        /* FREE OUTPUTS */
        free(outputs);

        /* End loop for multiple inputs here! */

        /* FREE NETWORK STORAGE */
        free_net(network);

        return 0;
    }

Appendix A - Example Neural Network Applications
Hands-on experience is required to gain expertise in creating and training neural network models. To get you started, some sample neural network models are included in the NETWORKS subdirectory of the installation path. Each sample contains both an untrained network and a trained network. It is strongly suggested that new Qnet users run through some or all of these examples to gain familiarity with Qnet and the concepts important to training neural networks. Even if you don't fully understand the example problem, going through the training process will be beneficial to your understanding of Qnet's options and features.

Artificial Intelligence: Optical Character Recognition
This example is designed to illustrate the effectiveness of using Qnet to develop optical recognition applications. The optical character recognition (OCR) field is growing rapidly due to the proliferation of computer FAX modems and optical scanners. OCR software allows the user to turn scanned or faxed graphic images into usable text. Advanced OCR applications have recently started to employ neural networks to perform the character recognition task. Neural OCR models can greatly improve the translation accuracy over less sophisticated methods. We will set up a small example to show how a neural network can be designed to recognize characters. The numbers 0 through 9 will make up the character set for this model. A full featured OCR program can normally recognize 100 or more characters and symbols. The characters in this example will consist of 8x8 bitmap images. This means that a total of 64 bits will be used to draw the image of each number (8 bits across by 8 bits down). Several different images (or font types) will be used for each number so that we can teach our neural network a variety of possible number types. For example, there is more than one way to draw the number 4 and we want the neural model to successfully handle the different possibilities. The training set, therefore, will consist of multiple bitmap images of the numbers 0-9. The test set for this problem will consist of number images slightly different from the numbers in the training set. This will enable us to determine how well the network has learned to generalize the differences between each number.
If the network can only recognize character images that exactly match the ones in the training set, the model would not be useful in an OCR application that must process many different and imperfect character images. Some of the bitmaps for numbers 1 through 5 are shown on the following page. The test cases used for numbers 1 through 5 are also shown.

For each number, 64 inputs are generated for our neural model. Every bit that is turned on in a character's bitmap pattern has a value of 1 and each bit that is off has a value of 0. An input array of 64 1's and 0's will make up each character used in the training (and test) set. As a result, the network design for this model must consist of 64 nodes in the input layer. The output layer has been designed with 10 output nodes -- one node for each of the characters we wish to recognize. When a number is recognized from a set of inputs, the network will output a 1 at the appropriate output node. For this model, the first output node will be 1 when the number 1 is recognized, the second output node will be 1 when the number 2 is recognized, and so on (see section 7.3 for a discussion of binary mode output). The hidden structure has 3 layers containing 10 nodes each. Adequate learning was not obtained when smaller hidden structures were used. Data normalization is not necessary for this model since all input node data and training targets use a binary representation.

The training set is contained in the file OCR.DAT. The file contains 68 total patterns (58 used for training and 10 used for the test set). Data columns 2 through 11 contain the training targets and columns 12 through 75 contain the input node bitmap data. The Qnet network files OCR.NET (the untrained network) and OCR_.NET (the trained network) are available to run. Use NetGraph to visually analyze the quality of agreement between the model's output response and the training targets.

The training results indicate that the Qnet neural model easily learned to correctly classify the bitmap images of the training set. Each of the 10 test cases was also correctly recognized by the network. The test cases recognized with a high degree of accuracy and confidence were numbers: 1 (.9970), 2 (.9987), 3 (.9744), 4 (.9982), 6 (.9983), 9 (.9985) and 0 (.9971). The test cases for 5 (.6926), 7 (.5645) and 8 (.7444) did not show the same high degree of confidence. The values enclosed in parentheses indicate the output strength for that number's predicting node. Ideally, a value of 1 (or near 1) will be output by the correct node when a number has been recognized. The test cases that produced output values significantly below 1 indicate that the recognition was not as strong. For all test cases, though, the node with the strongest output response did properly identify the correct number.

An interesting feature to note while training this model is that there is a long period where the test set error increases. This would appear to indicate that some memorization of the training set is taking place and that constructive learning is at a minimum during this time. Another possibility is that this behavior is simply an aberration due to the small size of the test set (10 patterns). In either case, significant constructive learning eventually takes hold and the test set error begins to descend. No global minimum was found in the test set error, indicating that overtraining is not a problem with this model.
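For readers who want to apply a trained OCR network outside of Qnet, the sketch below (our own illustration, not part of the Qnet package) shows one way to build the 64-element input array from an 8x8 bitmap and pick the winning digit using the netsolve.h recall routines from section 12.0. The node-to-digit mapping follows the description above: the first node fires for the digit 1, and the last node fires for the digit 0.

#include <stdlib.h>
#include "netsolve.h"

// Returns the digit (0-9) whose output node responds most strongly.
int classify_digit(NETDEF *net, unsigned char bitmap[8][8])
{
    float inputs[64], *outputs;
    int row, col, node, best = 0, digit;

    for (row = 0; row < 8; row++)              // bit on -> 1.0, bit off -> 0.0
        for (col = 0; col < 8; col++)
            inputs[row * 8 + col] = bitmap[row][col] ? 1.0f : 0.0f;

    outputs = net_output(net, inputs);         // 10 outputs, one per digit
    for (node = 1; node < 10; node++)          // find the strongest response
        if (outputs[node] > outputs[best])
            best = node;
    free(outputs);

    digit = (best + 1) % 10;                   // node 0 -> digit 1, ..., node 9 -> digit 0
    return digit;
}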
To build a character recognition model for a full featured OCR application, the neural net model must be significantly larger than the sample shown here. There would likely be 100 or more output nodes to properly classify most of the common characters. The 8x8 bitmap pattern used to represent a character in this model is too small. A better representation would be 16x16 or more. This would require a minimum network size of 256 input nodes and 100 output nodes. If 10 to 20 different font types were used in training, along with imperfections in these fonts (e.g., slightly rotated or non-centered), we would likely have a training set of 5000 to 10000 patterns. To adequately learn all this information, a large, complex hidden structure would be necessary. This example shows the need for a powerful neural model generator like Qnet. Real-world applications will often require a significant amount of memory, high speed processing and neural modeling software designed to handle these demands.

Scientific: Digital Filter

This example illustrates how a back-propagation neural network can be used for noise removal from a time series signal. A simple back-propagation filter will be created to remove undesired noise from an input signal. For this example we'll use a modulated high frequency signal contaminated by low frequency noise that we wish to eliminate. Traditional linear signal processing would employ a high-pass filter of the following form:

S(k+n/2) = F[ x(k), x(k+1), ..., x(k+n-1) ]

where S is the resulting filtered signal, k is the kth point in the time series and n is the number of samples processed to compute each filtered point. F is the filter. We'll model the back-propagation high-pass filter with the same inputs that are used in linear signal processing theory. We will use 11 input nodes (i.e., n = 11) to filter the contaminated signal and one output node to represent the filtered signal (analogous to S(k+5)). The input training data will consist of a contaminated signal and a "clean" signal. The signals have been formatted for Qnet input in the file FILT.DAT. A partial view of the contaminated signal is shown below. Use NetGraph to view the "clean" signal used for the target data.

There are 1490 patterns contained in the input file. The training set consists of the first 1099 patterns and the overtraining test set uses the last 391 patterns. Five layers are used in a fully connected network with 11 input nodes; 20, 15 and 10 hidden nodes; and 1 output node. The network files FILT.NET and FILT_.NET are available for examination. The FILT_.NET file contains a trained network and FILT.NET contains the untrained network definition.

First, run the converged network and see the excellent job the network does filtering the larger amplitude low frequency noise from the desired high frequency signal. The results are best seen by plotting the network outputs and targets serially. Note that even though the last 391 input patterns were not used for training, they show virtually the same degree of agreement with the targets as do the patterns used for training. It should also be noted that this test set is not a random sampling of the training set. It contains only a subset of the high frequencies contained in the training set. This increases the probability that false minima will be seen in the test set error during training. We concluded training of this network prior to finding a global minimum in the test set error. It is possible that one does not exist for this example.
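To use such a filter network outside of Qnet, the contaminated signal must be presented to the network one 11-sample window at a time. The sketch below is our own illustration of that windowing (not part of the Qnet package); it uses the netsolve.h recall routines from section 12.0, and section 7.3 should be consulted regarding any normalization of the recall inputs.

#include <stdlib.h>
#include "netsolve.h"

#define WINDOW 11                    // input nodes, matching the filter network above

// Slides an 11-sample window over the noisy signal and stores the network's
// output as the filtered value near the center of each window.
void filter_signal(NETDEF *net, float *noisy, float *filtered, int npts)
{
    float inputs[WINDOW], *outputs;
    int k, j;

    for (k = 0; k + WINDOW <= npts; k++) {
        for (j = 0; j < WINDOW; j++)         // window x(k) .. x(k+10)
            inputs[j] = noisy[k + j];
        outputs = net_output(net, inputs);
        filtered[k + WINDOW / 2] = outputs[0];   // single output node
        free(outputs);
    }
}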
Try training the untrained network. Plot, adjust and control the training parameters as needed to improve training and convergence. This network takes considerable time to train. Overnight, unattended training will be necessary to reach the level of convergence shown in the trained example. Use the trained network and the FILT.DAT file in recall mode to familiarize yourself with recall mode options. The starting columns in the FILT.DAT file are 2 for the input nodes and 13 for the output node.

Back-propagation filters have been shown to be an excellent alternative to traditional digital filters. Currently, much theoretical and experimental investigation is taking place to determine the benefits of nonlinear signal processing offered by back-propagation neural networks.

Scientific: Data Analysis

This example is designed to show how the scientist, researcher or engineer can use neural networks to analyze and model research data. A scientist gathers experimental data for some process or phenomenon in an attempt to predict and understand its behavior. It is also common that the theoretical model available to predict some phenomenon is either very inaccurate, extremely complex or just poorly understood. For this problem, a model is needed to predict the aerodynamic drag on a sphere for a given range of flow conditions. Let's say the aerodynamic drag of a sphere is needed for air speeds ranging from Mach .3 to Mach 4.5 and for Reynolds numbers from 200000 to 600000. From fluid dynamics theory we know that these are the two critical factors that influence aerodynamic drag. The researcher runs a series of wind tunnel tests to gather the data.

For a sphere, the experimental drag data contains peaks, valleys and discontinuities (due to the onset of turbulence at the critical Reynolds numbers and Mach effects). The traditional method of using the experimental data to later predict sphere drag involves the use of slow, complex bivariate table-lookup techniques combined with various interpolation schemes. If the experimental data is gathered at non-uniform Mach and Reynolds numbers, these traditional techniques become quite difficult to implement.

Neural networks resolve all these concerns quickly and easily. When the network is properly formulated, it can easily take on even the roughest interpolation problems. Nonlinear behavior, data discontinuities, and non-uniform data samples can all be handled by incorporating a back-propagation neural network. A trained network gives the researcher a prediction tool that is computationally faster, more accurate and requires less data storage than traditional table lookup/interpolation methods.

The example is contained in SPHERE.NET (the untrained network) and SPHERE_.NET (the trained network). The network model has 6 fully connected layers. The input layer has 2 nodes, one for the Reynolds number (divided by 10^5) and the other for the Mach number. The 4 hidden layers contain 6 nodes, 5 nodes, 4 nodes, and 3 nodes respectively. The output layer has 1 node that represents the aerodynamic drag on the sphere. A total of six layers were used in this network in an attempt to better model the nonlinear behavior of the experimental data. We have 102 experimental cases to train the network. Instead of incorporating a test set, the resulting network will be interrogated by passing a large number of test cases through the network to visually analyze the results. The results obtained from the network are excellent.
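The interrogation mentioned above -- passing many flow conditions through the trained network -- can also be done from your own code. The sketch below is our own illustration (not part of the Qnet package); it uses the netsolve.h recall routines from section 12.0, the SPHERE_.NET file from this example, and an arbitrary grid spacing. Section 7.3 should be consulted regarding any normalization of the recall inputs.

#include <stdio.h>
#include <stdlib.h>
#include "netsolve.h"

int main(void)
{
    NETDEF *net = init_net("SPHERE_.NET");     // the trained sphere-drag network
    float inputs[2], *outputs, re, mach;
    int i, j;

    for (i = 0; i <= 8; i++) {                 // Reynolds number 200000 .. 600000
        re = 200000.0f + 50000.0f * i;
        for (j = 0; j <= 14; j++) {            // Mach 0.3 .. 4.5
            mach = 0.3f + 0.3f * j;
            inputs[0] = re / 100000.0f;        // Reynolds number divided by 10^5
            inputs[1] = mach;                  // Mach number
            outputs = net_output(net, inputs);
            printf("%8.0f  %4.2f  %f\n", re, mach, outputs[0]);   // predicted drag
            free(outputs);
        }
    }
    free_net(net);
    return 0;
}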
The sphere drag data is accurately predicted for the training cases. The average error of the fit is less than 1%. The figure shown below is the result of passing 378 Mach/Reynolds number conditions through the network in recall mode. The resulting plotted surface models the drag trends extremely well, including the discontinuity that represents the onset of turbulence. Analyze the trained network by viewing the target/network output comparisons with NetGraph. You may also run the data through recall mode by using the training data in SPHERE.DAT as the input data. Columns 1 and 2 contain the Reynolds number and Mach number input node data. The target sphere drag data is in the 3rd column.

Converge the untrained network to gain better insight into the convergence process. You will notice that it takes this network a very long time to get organized. Beginning the training process with the FAST-Prop method will significantly increase the rate of learning during the initial training period. The standard back-propagation method takes several thousand iterations before the training error begins to decrease. This is common for networks with multiple hidden layers. A back-propagation neural network is the perfect tool for modeling complicated research data. Creating a neural model will provide the researcher with a compact, fast prediction tool for utilizing virtually any test data needed for research and design activities.

General: Random Number Memorization

To show the use and possible misuse of neural networks, we'll present a problem where we introduce 1000 random numbers to a network in an attempt to predict 2 random numbers. Obviously, one set of truly random numbers cannot be used to predict another set of random numbers. A neural network, however, can memorize the training set and map the input random numbers to the random outputs for the patterns presented to it (assuming the number of training patterns is small relative to the network size). We will also monitor a test set during training to determine if any constructive learning takes place with patterns that are not used for training. (If so, we can only assume that our random number generator is not generating truly random numbers.)

This example uses a set of 10 input training patterns. Each pattern contains 1000 random values for inputs and 2 random values for outputs (all normalized between 0 and 1). The network is fully connected with three layers containing 1000 input nodes, 2 hidden nodes and 2 output nodes. Two input patterns have been reserved for the test set. The training data is contained in the file RANDOM.DAT. Two network input files, RANDOM.NET and RANDOM_.NET, are available for examination. The file RANDOM_.NET contains a network where training has taken place and RANDOM.NET contains the untrained network definition.

The trained network easily mapped the input random numbers to the output random numbers. The network memorized the training set for the 10 training cases presented to it. When the trained network is presented with the 1000 random input numbers from a training pattern, it reproduces that pattern's 2 random output values. This memorization, however, produces no constructive learning. The mapping relationship developed is only valid for the training set. The test set never displayed any decrease in the RMS error. For problems where all possible cases are contained in the training set, memorization is fine and a productive neural network can be produced. Problems that require some type of predictive capability do not benefit from training set memorization.
The only way to determine whether the network is constructively learning or simply memorizing the training data is to incorporate an overtraining/test set of data to monitor the training results. One factor that determines whether a network will memorize information is the relative size of the network versus the number of training patterns. For a case like this, with only 10 training patterns, memorization of the training set becomes likely. If we had used 500 input training patterns, it is doubtful that any memorization would have taken place (unless the network size were increased substantially).

Investing: S&P 500 Forecaster

This example investigates the viability of using neural networks to develop a financial market forecasting tool. Academic arguments have long been waged as to whether the stock market is a "random walk" with virtually unpredictable trends or whether these trends can be forecast and followed in the pursuit of profit. We will propose a very simplistic neural network model that uses historical market prices to predict future prices. If the market is a true "random walk" we can expect to see results similar to the "RANDOM" example. For that example, the neural model was able to memorize the training set, but it could not respond accurately to inputs that were not part of the training set. If certain historical trends can be used to predict future prices, the neural model should be able to improve its response to test set cases, unlike the "RANDOM" model.

This simple market forecasting model will be formulated by looking at weekly open-high-low-close data. It will attempt to forecast the next week's market close from the previous 5 weeks' open-high-low-close information. Downloaded S&P 500 Index data has been processed from its raw index form to a percentage change format (a few minutes of manipulating the data in a spreadsheet did the trick). The percentages are relative to the current week's closing value. The current and previous weeks' open-high-low-close data are used as inputs and the following week's closing value is used as the target. All values are expressed as a percentage gain or loss relative to the current week's close. (Section 7.2 discusses the reason for using the percentage change format.)

The training patterns consist of 243 weeks of data, from January 1988 through August 1992. The first 216 weeks are presented to the network for training and the last 27 weeks are reserved for the test set. The network contains 4 fully connected layers. The input layer consists of 19 nodes for the five weeks of open-high-low-close information (the current week's close is not used since this value is always a 0% change). The two hidden layers contain three nodes each. The output layer has one node for predicting next week's close. The size of the network is kept small for this example to minimize the risk of memorization. The network definition files are SP500.NET (the untrained network) and SP500_.NET (the trained network). The training data is in SP500.DAT.

Through the Auto-Save feature, the trained network was restored to the point where overtraining started. In the original run we took the training well past this point to validate that this was a true global minimum for the test set error. After training the network, one can conclude that some constructive learning does indeed take place. This supports the notion that the stock market is not just a "random walk" and that historical trends can be beneficial in determining future prices.
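As an aside to the data preparation described above, the sketch below is our own illustration of how five weeks of raw open-high-low-close values can be reduced to the 19 inputs and one target used by this model. The scaling to whole percentage points and the ordering of the 19 inputs are assumptions made for illustration only; the actual column layout of SP500.DAT is not reproduced here.

#define WEEKS 5

typedef struct { float open, high, low, close; } WeekOHLC;

// weeks[0] is the oldest of the five weeks, weeks[WEEKS-1] is the current week;
// next_close is the following week's close (the training target).  All values
// are expressed as a percentage change relative to the current week's close.
void make_pattern(WeekOHLC weeks[WEEKS], float next_close,
                  float inputs[19], float *target)
{
    float base = weeks[WEEKS - 1].close;       // current week's close
    int w, n = 0;

    for (w = 0; w < WEEKS; w++) {
        inputs[n++] = 100.0f * (weeks[w].open - base) / base;
        inputs[n++] = 100.0f * (weeks[w].high - base) / base;
        inputs[n++] = 100.0f * (weeks[w].low  - base) / base;
        if (w < WEEKS - 1)                     // current week's close is always 0%
            inputs[n++] = 100.0f * (weeks[w].close - base) / base;
    }
    *target = 100.0f * (next_close - base) / base;   // next week's close, in percent
}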
During training, both training and test set RMS errors decline. The RMS error for the test set declines for several thousand iterations. At the point where overtraining starts, the assumption can be made that training set memorization is the predominant learning mode.

The graphs on the previous page depict the results of our simple model. The first plot shows the results for all 243 training and test patterns. The second plot shows the results for just the test patterns. It is encouraging to see that both training and test set plots show trends that are very similar. The X axis is the model's predicted change for next week (up or down - only the magnitude is represented). An X axis value of 0.20% would include all predicted changes greater than or equal to this number. Two curves are plotted: 1) the percentage of cases in which the model is correct (direction-wise only) and 2) the percentage of weeks in which the model predicts a change at least that large.

If we were to trade only when the model predicts a 0.5% move or larger, the test patterns graph shows that we would be trading about 30% of the weeks and we could expect to be correct about 75% of the time. The graph for all patterns reveals that we would be trading about 45% of the weeks and that we would still be correct about 75% of the time. By comparison, a buy-and-hold strategy would be correct only on up weeks, which are 58% of the weeks for the period that our model covers. The model would seem to beat the buy-and-hold strategy by a significant margin. This is encouraging given the fact that the model is very simplistic in nature.

We are NOT recommending the use of this network for market trading. It is offered for demonstration purposes only! If you wish to develop investing and trading models with Qnet, we strongly urge you to use far more training information than is used in this example. Large test sets must be used to accurately validate any model developed. Our test set in this model uses only 27 weeks (or patterns). The small size may make the results shown here a bit unreliable. However, this model does demonstrate the usefulness of neural networks as a decision making tool for investing and trading.

Investing: Futures - Fair Value

This example attempts to create a neural model to predict the theoretical fair value of a futures contract. For this example, we'll use the S&P 500 stock futures contract. Investment theory tells us that any futures contract has a set "fair value" based on the risk-free interest rate and the time to contract expiration. If the futures contract deviates from its fair value, arbitrage opportunities (coined "program trading") will exist and a risk-free return that is higher than the current market rates can be guaranteed. This is accomplished by buying and selling equal amounts of "stock" in the futures and cash markets to create a riskless neutral position. If this is done when the futures contract is at its fair value, the risk-free rate of return will be realized. Otherwise, higher rates of return can be achieved. This action serves as a method of keeping both the futures and cash (stock) markets in a closely coupled relationship.

Instead of studying investment theory and writing a computer program to compute the fair value, let's see how we can use Qnet to develop a model. Two inputs will be used to create this model: 1) the current short-term risk-free rate of return (the return on three-month Treasury Bills is used here) and 2) the days to expiration of the contract.
The target data for the output node will consist of the ratio:

target data = (S&P 500 futures price) / (S&P 500 cash price)

One and a half years of interest rates, S&P 500 futures data and S&P 500 index data (daily closes) will comprise the training set. A little manipulation of downloaded data in a spreadsheet was all that was necessary to create the input training data file. To create a more robust model, one should include more than 1 1/2 years of data. It is important to cover a wide range of historical interest rates. It would also be wise to incorporate a test set for model verification and overtraining analysis (not done in this example).

The network contains three fully connected layers. There are 2 input nodes, one for the short-term interest rate and one for the days to contract expiration. The hidden layer contains 5 nodes and the output layer has one node for the fair value ratio. The networks are contained in SPFUT.NET (the untrained network) and SPFUT_.NET (the trained network). The training data is in SPFUT.DAT.

Run the trained network in both training and recall modes. Try recall mode by using the training data as the input and target node data (the input node data starts in column 1 of SPFUT.DAT and the output node target data starts in column 3 of SPFUT.DAT). Use the plotting and browse options available in both training and recall modes to examine the results. Two things are evident from plotting the network outputs and targets. First, Qnet does an excellent job of modeling the fair value ratio for the S&P 500 futures contract. Second, arbitrage opportunities do exist between the cash and futures markets. Up to a half point of interest and more can be achieved in a single day (we are only viewing daily closes with this model). It's not surprising that many large trading institutions take advantage of these divergences to lock in large risk-free returns.

Appendix B - Hardware/Software Compatibility

CPU Compatibility - DOS Version

Qnet requires a 386 CPU or higher in conjunction with a 387/487 math coprocessor (486DX CPU's have a coprocessor built-in). Some of the first 386 chips have an internal error that causes problems with protected mode applications. Although reports of this problem have been exceedingly rare, symptoms include unexplained and unpredictable crashes. If this type of problem is encountered with an early 386 machine and cannot be duplicated on a newer PC, it can most likely be resolved by installing a newer 386 chip.

Video Hardware Compatibility - DOS Version

Qnet requires VGA compatible graphics hardware to run properly. Qnet automatically senses the video adapter and attempts to set the required video mode. For all VGA class displays and higher, Qnet attempts to use the standard VGA 640x480 resolution with the maximum number of colors that the video driver supports for the installed hardware. The 640x480 resolution modes offer the best screen readability and window sizing characteristics. If your display is not compatible with this standard VGA resolution or Qnet's auto-sensing is not functioning correctly for your adapter, you may add a parameter to your DOS environment that forces a video mode that is compatible with your hardware.
The easiest way to set this environment variable is to add the following line to your AUTOEXEC.BAT file:

set QNET=ADAPTERMODE

where ADAPTERMODE is one of the following:

ADAPTERMODE        Description
FG_VGA11           IBM VGA mode 0x11, 640x480, monochrome
FG_VGA12           IBM VGA mode 0x12, 640x480, 16 colors
FG_ATI62           ATI mode 0x62, 640x480, 256 colors
FG_ATI63           ATI mode 0x63, 800x600, 256 colors
FG_DFIHIRES        Diamond Flower Instruments, 800x600, 16 colors
FG_EVGAHIRES       Everex EVGA, 800x600, 16 colors
FG_PARADISEHIRES   Paradise VGA, 800x600, 16 colors
FG_TRIDENTHIRES    Trident VGA, 800x600, 16 colors
FG_TSENGHIRES      Tseng Labs ET4000 SuperVGA chip set, 800x600, 16 colors
FG_VEGAVGAHIRES    Video 7 VEGA VGA board, 800x600, 16 colors
FG_TIGA_8_640      TIGA 2.0, TI TMS340-based adapters, 640x480, 256 colors
FG_TIGA_8_1024     TIGA 2.0, TI TMS340-based adapters, 1024x768, 256 colors
FG_TIGA_8_1280     TIGA 2.0, TI TMS340-based adapters, 1280x1024, 256 colors
FG_VESA1           VESA mode 0x101, 640x480, 256 colors
FG_VESA2           VESA mode 0x102, 800x600, 16 colors
FG_VESA3           VESA mode 0x103, 800x600, 256 colors
FG_VESA5           VESA mode 0x105, 1024x768, 256 colors
FG_VESA6A          VESA mode 0x6A, 800x600, 16 colors
FG_VESA7           VESA mode 0x107, 1280x1024, 256 colors

If you don't know which FG_XXXXX settings are valid for your hardware, run the utility QNETFGM.EXE included with Qnet for the list of settings that are compatible.

IMPORTANT: It is strongly recommended that the 640x480 resolution be selected whenever possible. While higher resolutions may work fine for your system configuration, two problems may result: 1) Displayed text becomes small and less readable at higher resolutions. 2) For very large models that require extensive virtual memory swapping, Qnet's mouse driver may cause system crashes under certain conditions. This is not a problem in 640x480 resolution mode. Resolutions lower than 640x480 will likely result in Qnet windows that cannot be mapped properly to the display area. Any modes reported by QNETFGM.EXE that are below VGA resolution are not recommended for use with Qnet. Monochrome (2 color) VGA displays are compatible with Qnet; however, certain colored bitmaps used on various buttons will not be visible.

Memory Management and EMM Software Compatibility - DOS Version

Qnet allocates memory from the system's available extended memory (XMS). To determine the amount of extended memory your system has available, use the DOS mem command. It will display this number as "available XMS memory". Once extended memory has been exhausted, virtual memory is allocated from the system's hard disk. You may control the location of the virtual memory (swap) file. For example, if you have several hard disks on the system, you will want the virtual memory file to be allocated on a drive with both fast access time and adequate space available. To control swap file placement, add one of the following lines to your AUTOEXEC.BAT:

set TEMP=drive:\path
or
set TMP=drive:\path

The first environment string found (TEMP or TMP) will be used for the swap file location. The default swap file location is the execution directory.

Qnet is also compatible with all modern memory managers such as QEMM(TM), 386 Max(TM) and MS DOS's(TM) EMM386.EXE. DOS 5's EMM386.EXE must be configured to make extended memory available. If these products are in use, Qnet allocates extended memory through that source. If there is no memory manager, Qnet enables address line 20 (the A20 line) to directly access the system's extended memory.
The A20 control algorithm in Qnet is compatible with all true AT compatibles, PS2(TM) computers and most other systems' A20 control schemes. If Qnet detects a problem enabling the A20 line, the following error message will appear:

Cannot enable the A20 line, XMS memory manager required.

If this message appears, it will be necessary to install some type of memory manager such as the MS DOS 5.0 EMM386.EXE or one of the products mentioned above.

Do not run Qnet's DOS version under a Windows DOS session. The virtual environment in a Windows DOS session can be unstable when running a 32-bit virtual memory application. If you wish to run Windows while training networks, use Qnet for Windows.

TSR programs, disk caching utilities and other drivers loaded into high memory will reduce the amount of extended memory available to Qnet. Popular disk caching programs allow you to use some or all of the extended memory as a disk buffer. Because Qnet is a virtual memory application, a large disk cache is counterproductive if it forces Qnet to use a swap file. To run Qnet at its maximum efficiency, you should configure your system so that adequate extended memory is available to Qnet.

Qnet Compatibility and Execution Summary

The above discussions list several compatibility issues. The following list is a complete summary of all potential Qnet problems:

1) First generation 386 CPU's may have internal errors that prevent the reliable execution of protected mode applications.
2) Video hardware detection problems may exist with some nonstandard hardware. Use QNETFGM.EXE to determine valid video modes and force the appropriate mode as explained above.
3) If video resolutions higher than 640x480 are used while executing extremely large models that require extensive memory swapping, system crashes may result.
4) Do not run Qnet's DOS version under a Windows DOS session. Use the Windows version.
5) If the following is displayed when attempting to run Qnet: "Cannot enable the A20 line, XMS memory manager required." an expanded memory manager (EMM) must be used on your system. See above for an explanation.
6) If NetGraph is used to plot information that is extremely small in scale, delays can occur in producing the plot. NetGraph uses an iterative process to automatically set up the graph's scale. Resolving extremely small scales can require a large number of iterations.
7) On-line help for menu items is not available under the Windows version.

Appendix C - Back-Propagation Technical Overview

This overview is intended to provide both general information on back-propagation theory and some specific details of Qnet's modeling techniques. While an understanding of specific theoretical details is not required to use Qnet, it can provide the initiated user with a more complete overview of Qnet's internal operation.

Qnet back-propagation neural networks are multi-layered and feedforward (connections must connect to the next layer) in design. Networks can be fully connected or connections can be removed individually. Removed connections are modeled in Qnet by explicitly setting the connection's receiving weight to 0. This removes the effect of that individual connection on the network's response. New networks have randomly initialized weight values. Each time an initialization is performed, a completely unique network state is created. This leads to the possibility that identical training runs with newly initialized networks may exhibit different learning characteristics.
However, the converged states of two such training runs will be nearly identical for the vast majority of cases.

Back-propagation training is accomplished using the following logic sequence: (NOT AVAILABLE FOR DEMO)

Appendix D - Qnet Limits and Specifications

Qnet is designed to handle extremely large and complex network designs. Under most circumstances, processing speed and available virtual memory will be the limiting factors for network training sizes. The following table provides a quick summary of Qnet's limitations:

Feature                                          32-bit DOS Version   Windows Version
Maximum network layers                           10                   10
Maximum nodes per layer                          Unlimited            32,000
Maximum training patterns                        Unlimited            32,000
Maximum input data file record length (bytes)    Unlimited            32,000

Practical limits will vary based on your computer's installed system memory and your CPU's performance capabilities.

Appendix E - Tech Support

If you have questions concerning Qnet and its usage or you wish to submit a problem report, contact Vesta technical support at (708) 446-1655 during the normal support hours between 9AM and 4:30PM (CST) Monday through Friday.

Written correspondence:
Vesta Services, Inc. - QNET
1001 Green Bay Rd, Suite 196
Winnetka, IL 60093