DELIMIT Version 1.1
Copyright 1993 Jefferson P. Carey

Delimit is a program that scans columnar reports (stored as ASCII text files), extracts the relevant data, and writes that data to an ASCII file in comma-delimited format.

A full explanation will follow, but I believe a simple example is the best way to show you the capabilities of Delimit. In a nutshell, Delimit can take a report like this:

-----------------------------------------------------------------------------
                               XYZ Corporation                        Page: 1
                           Sales Commission Report
                        For The Month Beginning 01/01/1993

Salesperson ID   Item Number   Qty Sold   Cost Each   Item Sales   Commission
--------------   -----------   --------   ---------   ----------   ----------
8342             981239872            5       74.95       374.75        56.21
                 987243873           23       14.95       343.85        51.58
                 989123783            3      274.85       824.55       123.68
                                                      ----------   ----------
                                            Totals:      1543.15       231.47

Salesperson ID   Item Number   Qty Sold   Cost Each   Item Sales   Commission
--------------   -----------   --------   ---------   ----------   ----------
8573             981239872            4       74.95       299.80        44.97
                 987243873           27       14.95       403.65        60.55
                 989123783            6      274.85      1649.10       247.37
                                                      ----------   ----------
                                            Totals:      2352.55       352.89


                               XYZ Corporation                        Page: 2
                           Sales Commission Report
                        For The Month Beginning 02/01/1993

Salesperson ID   Item Number   Qty Sold   Cost Each   Item Sales   Commission
--------------   -----------   --------   ---------   ----------   ----------
8342             981239872            6       74.95       449.70        67.46
                 987243873           25       14.95       373.75        56.06
                 989123783            4      274.85      1099.40       164.91
                                                      ----------   ----------
                                            Totals:      1922.85       288.43

Salesperson ID   Item Number   Qty Sold   Cost Each   Item Sales   Commission
--------------   -----------   --------   ---------   ----------   ----------
8573             981239872            3       74.95       224.85        33.73
                 987243873           22       14.95       328.90        49.34
                 989123783            5      274.85      1374.25       206.14
                                                      ----------   ----------
                                            Totals:      1928.00       289.21

                                      Grand Totals:      7746.55      1162.00
-----------------------------------------------------------------------------

and, with VERY LITTLE EFFORT on the part of the user, create a file containing the data from the report in comma-delimited format, like this:

"01/01/1993",8342,981239872,5,74.95,374.75,56.21
"01/01/1993",8342,987243873,23,14.95,343.85,51.58
"01/01/1993",8342,989123783,3,274.85,824.55,123.68
"01/01/1993",8573,981239872,4,74.95,299.80,44.97
"01/01/1993",8573,987243873,27,14.95,403.65,60.55
"01/01/1993",8573,989123783,6,274.85,1649.10,247.37
"02/01/1993",8342,981239872,6,74.95,449.70,67.46
"02/01/1993",8342,987243873,25,14.95,373.75,56.06
"02/01/1993",8342,989123783,4,274.85,1099.40,164.91
"02/01/1993",8573,981239872,3,74.95,224.85,33.73
"02/01/1993",8573,987243873,22,14.95,328.90,49.34
"02/01/1993",8573,989123783,5,274.85,1374.25,206.14

What's the point?

In many organizations, out-of-date, unfriendly, and inflexible computer systems still prevail. Many of these systems can produce a variety of highly informative reports (such as the one shown above), but cannot easily be customized to produce other kinds of output. Essentially, the data exists in the system, but you can only see it presented in the ways the system designers intended (unchangeable reports). In many cases the data in such reports would be of even greater value if it could be extracted and analyzed with software (databases, spreadsheets, etc.) on a PC.

Herein lies the value of Delimit. Any modern PC data analysis software that's worth a dime can import data from a comma-delimited ASCII file (exactly the kind of file Delimit creates). Once the data is available to the PC software, the possibilities for analysis (and even new reports) are endless.

At this point, if you still don't understand the purpose of Delimit, this program probably isn't going to be of use to you. Do me a favor and pass it on to a friend who might be interested. You could be doing your friend a favor as well. On the other hand, if you are faced with the situation I've just described, read on. The rest of this document explains how to use Delimit and includes examples of all of its features.
Just one thing before we get started. This program is being released as shareware... with a twist. Individuals using it for personal purposes, and nonprofit organizations using it in their nonprofit ventures, are free to use Delimit without paying the registration fee, if they choose. Anyone else (for-profit businesses) using Delimit beyond a reasonable trial period (use your own judgement here) must pay a registration fee of $24.95 to continue to use the program. Registered users will receive a disk containing the latest version of Delimit, an upgrade notice when a newer version of the program is available, and a discount off the cost of registering the newer version. Please note that individuals and nonprofit organizations electing to use Delimit without paying the registration fee are entitled only to free use of the program, not to these additional benefits of registered users.

Yes, you could easily "cheat" and use Delimit without paying the registration fee. But my sincere hope is that those who use it will appreciate its real value (time saved, information gained, money saved, etc.) and will realize that their $24.95 is a worthwhile investment. Delimit required a great deal of personal time and effort to develop. I appreciate all of you who support my work through your registration. Thank you!

To register your copy of Delimit, print the file REGISTER.TXT, fill it out, enclose payment, and mail it to the address at the bottom of the form.

Using Delimit

To use Delimit, you need to create a configuration file for the report you want to process. Don't worry, this configuration file is quite simple to make. Below is the configuration file that was used to process the sample report shown earlier. I'll explain each part of it next.
-----------------------------------------------------------------------------
[Settings]
InputFile=sample.txt
OutputFile=output.txt
DiscardFile=discard.txt
FilterDefault=Exclude
BlankFieldFill=True
IncludeOperator=And
ExcludeOperator=And
Trimming=True
[Include]
18,11,Numeric
[Exclude]
[Fields]
1,14,Numeric
18,11,Numeric
32,8,Numeric
43,9,Numeric
55,10,Numeric
68,10,Numeric
[Occasionals]
25,23,"For The Month Beginning",49,10,Alpha
-----------------------------------------------------------------------------

[Settings] section

In this section you set the values of several parameters that affect the operation of Delimit. Each parameter, and its possible values, is explained below.

InputFile

This parameter specifies the name of the report that you want to process. You may specify a drive and path if the file is not in the current directory. This parameter is required.

OutputFile

This parameter specifies the name of the file where you want Delimit to send its output (the comma-delimited data). You may specify a drive and path if the file will not be in the current directory. This parameter is required.

DiscardFile

Later in the configuration file you will be able to specify which lines Delimit should "throw out" when processing the report. In our example report we are interested only in lines with item sales figures, and all other lines should be discarded. If you include this parameter in your configuration file, the discarded lines will be written to the specified file. This is useful for checking that the proper lines were discarded while you are working out a correct configuration file: after running Delimit, look at the contents of the discard file and make sure no good lines were discarded. This parameter is optional.

FilterDefault

This parameter tells Delimit whether to keep or discard lines in the report by default. The valid values for this parameter are "Include" and "Exclude".
In some cases it will be easiest to specify which lines contain data. Delimit should then exclude lines by default (FilterDefault=Exclude): it discards every line unless the line meets the criteria you have specified for keeping it. In other cases it will be easiest to specify which lines to throw out, so Delimit should include lines by default (FilterDefault=Include). For the example report, we are going to specify which lines to keep, so FilterDefault=Exclude.

BlankFieldFill

This parameter determines whether Delimit will fill a blank field with the most recent nonblank value of that field. The valid values for this parameter are "True" and "False". In the sample report, the salesperson ID is shown only on the first line for each salesperson; on the following lines this field is blank. Setting BlankFieldFill=True carries each salesperson ID down to the following lines of the comma-delimited file until a new salesperson ID is found. With BlankFieldFill=False, the first few lines of the comma-delimited file would have looked like this:

"01/01/1993",8342,981239872,5,74.95,374.75,56.21
"01/01/1993",,987243873,23,14.95,343.85,51.58
"01/01/1993",,989123783,3,274.85,824.55,123.68
"01/01/1993",8573,981239872,4,74.95,299.80,44.97
"01/01/1993",,987243873,27,14.95,403.65,60.55

IncludeOperator

Later in this documentation I will explain how to specify conditions that lines must meet in order to be included in processing and output to the comma-delimited file. At times it might be necessary to specify more than one condition that a line must meet to be included. This parameter specifies whether those conditions are combined with AND or with OR: you can require that a line meet condition x AND condition y, or that it meet condition x OR condition y. Valid values for this parameter are "And" and "Or".
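To make the BlankFieldFill carry-down rule concrete, here is a minimal Python sketch of what BlankFieldFill=True does. It assumes that the fields of each included line have already been extracted as strings and that "blank" means empty or all spaces. The function name `blank_field_fill` is hypothetical and not part of Delimit.

```python
from typing import Dict, List


def blank_field_fill(rows: List[List[str]]) -> List[List[str]]:
    """Sketch of BlankFieldFill=True: replace each blank field with the most
    recent nonblank value seen in the same field position (hypothetical helper)."""
    last_seen: Dict[int, str] = {}  # field position -> most recent nonblank value
    filled: List[List[str]] = []
    for row in rows:
        new_row: List[str] = []
        for position, value in enumerate(row):
            if value.strip():
                # A nonblank value is written as-is and remembered for later rows.
                last_seen[position] = value
                new_row.append(value)
            else:
                # A blank value is replaced by the remembered one; if nothing has
                # been seen yet in this position, the field stays blank.
                new_row.append(last_seen.get(position, value))
        filled.append(new_row)
    return filled


# Salesperson ID and item number from the first page of the sample report.
rows = [
    ["8342", "981239872"],
    ["    ", "987243873"],
    ["    ", "989123783"],
    ["8573", "981239872"],
    ["    ", "987243873"],
]
for row in blank_field_fill(rows):
    print(",".join(row))
```

With these rows, the blank IDs on the second and third rows become 8342 and the blank ID on the last row becomes 8573, matching the carried-down values in the sample output.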
ExcludeOperator

Later in this documentation I will explain how to specify conditions that lines must meet in order to be excluded from processing and from the comma-delimited file. When a line must meet more than one condition to be excluded, this parameter specifies whether those conditions are combined with AND or with OR, just as IncludeOperator does for the include conditions. Valid values for this parameter are "And" and "Or".

Trimming

This parameter determines whether Delimit trims spaces from the beginning and end of the fields it writes to the comma-delimited file. Valid values for this parameter are "True" and "False".

[Include] section

In this section, you specify the conditions that each line in the report must meet in order to be included in the comma-delimited file. Each line in this section has the format "column number, number of characters, condition". The column number and number of characters select the characters that must meet the condition. The condition can be a string in double quotes, a set of characters in braces, or one of the words "Alpha", "Numeric", "Blank", or "NonBlank". In the example configuration file, the line "18,11,Numeric" specifies that the 11 characters starting in column 18 must be a number for the line to be included. Here are some more examples:

The character in column 11 must be 'A', 'B', or 'C':
11,1,{ABC}

The 3 characters starting in column 11 must be "Abc":
11,3,"Abc"

The first 5 characters on the line must be blank:
1,5,Blank

At least one of the first 5 characters must not be blank:
1,5,NonBlank

None of the 10 characters starting in column 35 can be a digit:
35,10,Alpha

The 10 characters starting in column 35 must be a number:
35,10,Numeric

Note: " 123 " is a number, while "123 456" is not a valid number.
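The condition types listed above, together with the And/Or operators, can be modelled with a small Python sketch. The function names are hypothetical, and three details are assumptions rather than documented behaviour: "Numeric" is taken to mean an optionally signed decimal number with optional surrounding blanks (the text only says that " 123 " qualifies and "123 456" does not), "Alpha" is read literally as "contains no digits", and an empty section is treated as never matching.

```python
import re
from typing import List

# Assumed definition of a "number": optional blanks, optional sign, digits with an
# optional decimal part, optional blanks. The documentation only guarantees that
# " 123 " qualifies and "123 456" does not.
NUMBER = re.compile(r"\s*[-+]?(\d+(\.\d*)?|\.\d+)\s*")


def columns(line: str, column: int, width: int) -> str:
    """Return `width` characters starting at 1-based `column`; short lines are padded with blanks."""
    start = column - 1
    return line[start : start + width].ljust(width)


def meets_condition(line: str, column: int, width: int, condition: str) -> bool:
    """Check one [Include]/[Exclude] entry such as 18,11,Numeric against a report line."""
    text = columns(line, column, width)
    if len(condition) >= 2 and condition[0] == condition[-1] == '"':
        return text == condition[1:-1]  # "Abc": exact string match
    if len(condition) >= 2 and condition[0] == "{" and condition[-1] == "}":
        return all(ch in condition[1:-1] for ch in text)  # {ABC}: every character from the set
    kind = condition.lower()
    if kind == "blank":
        return text.strip() == ""
    if kind == "nonblank":
        return text.strip() != ""
    if kind == "numeric":
        return NUMBER.fullmatch(text) is not None
    if kind == "alpha":
        return not any(ch.isdigit() for ch in text)  # read literally: no digits at all
    raise ValueError("unknown condition: " + condition)


def combine(results: List[bool], operator: str) -> bool:
    """Combine several entries' results the way IncludeOperator/ExcludeOperator would."""
    if not results:
        return False  # assumption: an empty section never matches
    return all(results) if operator.lower() == "and" else any(results)


print(meets_condition(" 123 ", 1, 5, "Numeric"))    # True
print(meets_condition("123 456", 1, 7, "Numeric"))  # False
```

Note that `columns` pads short lines with blanks, so a condition that looks past the end of a line sees spaces rather than raising an error; that padding is also an assumption about how Delimit treats short lines.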
If you put more than one line in this section, the conditions on each line are combined with the logical operator AND or OR, as determined by the value of the IncludeOperator parameter. For example:

The first character must be an 'A' AND the next 10 characters must be a number:

[Settings]
IncludeOperator=And
[Include]
1,1,{A}
2,10,Numeric

The first character must be an 'A' OR the next 10 characters must be a number:

[Settings]
IncludeOperator=Or
[Include]
1,1,{A}
2,10,Numeric

[Exclude] section

The Exclude section is identical to the Include section, except that it specifies the lines that should be excluded rather than included. Complex conditions can be specified using a combination of the FilterDefault, IncludeOperator, and ExcludeOperator settings and the [Include] and [Exclude] sections. Some examples follow.

Include all lines that have a number in the first 10 characters AND have a '/' in columns 50 and 53 (a good way to search for dates of the form MM/DD/YY), but do not have the word "Deleted" beginning in column 17:

[Settings]
FilterDefault=Exclude
IncludeOperator=And
[Include]
1,10,Numeric
50,1,{/}
53,1,{/}
[Exclude]
17,7,"Deleted"

Exclude any lines in which the first 5 characters are blank OR contain the word "Total", unless there is a number in the 6 characters beginning in column 30:

[Settings]
FilterDefault=Include
ExcludeOperator=Or
[Exclude]
1,5,Blank
1,5,"Total"
[Include]
30,6,Numeric

[Fields] section

In the Fields section, you specify which columns of the included lines contain the data you want sent to the output file. Each line in the Fields section has the form "column number, number of characters, field type". The column number and number of characters specify which characters to extract from the report line. The field type is one of the words "Alpha" or "Numeric". Alpha fields are enclosed in quotes in the output file; Numeric fields are not.
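The filtering examples and the [Fields] rules just described can be combined into a per-line sketch in Python. `keep_line` takes the already-combined [Include] and [Exclude] results. Its precedence is inferred from the two filtering examples and is not documented behaviour: with FilterDefault=Exclude an exclude match overrides an include match, and with FilterDefault=Include an include match overrides an exclude match. `extract_fields` applies the [Fields] entries and the Trimming setting. Both names are hypothetical.

```python
from typing import List, Tuple

# One [Fields] entry: (1-based column, number of characters, "Alpha" or "Numeric").
FieldSpec = Tuple[int, int, str]


def keep_line(filter_default: str, include_match: bool, exclude_match: bool) -> bool:
    """Decide whether a report line is kept (precedence inferred, not documented)."""
    if filter_default.lower() == "exclude":
        # Discard by default; keep on an include match unless an exclude match overrides it.
        return include_match and not exclude_match
    # Keep by default; discard on an exclude match unless an include match overrides it.
    return include_match or not exclude_match


def extract_fields(line: str, fields: List[FieldSpec], trimming: bool = True) -> str:
    """Build one comma-delimited output line from a kept report line."""
    values: List[str] = []
    for column, width, field_type in fields:
        text = line[column - 1 : column - 1 + width].ljust(width)
        if trimming:
            text = text.strip()  # Trimming=True removes leading and trailing blanks
        if field_type.lower() == "alpha":
            text = '"' + text + '"'  # Alpha fields are quoted
        values.append(text)  # Numeric fields are written unquoted
    return ",".join(values)


# The [Fields] entries from the sample configuration file.
sample_fields: List[FieldSpec] = [
    (1, 14, "Numeric"), (18, 11, "Numeric"), (32, 8, "Numeric"),
    (43, 9, "Numeric"), (55, 10, "Numeric"), (68, 10, "Numeric"),
]
# First data line of the sample report, rebuilt column by column.
line = ("8342".ljust(14) + "   " + "981239872".ljust(11) + "   " + "5".rjust(8) + "   "
        + "74.95".rjust(9) + "   " + "374.75".rjust(10) + "   " + "56.21".rjust(10))
if keep_line("Exclude", include_match=True, exclude_match=False):
    print(extract_fields(line, sample_fields))  # 8342,981239872,5,74.95,374.75,56.21
```

The leading date in the sample output comes from the [Occasionals] section, which this sketch leaves out.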
For example, in the output line below, the first field was specified as Alpha and the second as Numeric:

"John Doe",25

You may specify any number of fields. They are written to the output file in the order in which they appear in the configuration file.

[Occasionals] section

An occasional combines an Include specification and a Field specification. In many reports, some data is listed only occasionally, at the beginning of a section or at the top of a page. In the sample report shown earlier, the reporting period appears only at the top of each page, on the line containing the text "For The Month Beginning". To include this data at the beginning of each line in the output file, an occasional was specified in the configuration file.

Each line in the Occasionals section has the format "column number, number of characters, condition, field column, field characters, field type". The first three parameters specify the condition that occasional lines meet, and the last three specify the position on the line and the type of the data to be written to the output file. For example, to say that any line containing the text "For The Month Beginning" starting in column 25 carries a 10-character Alpha field starting in column 49, use:

25,23,"For The Month Beginning",49,10,Alpha

The fields specified in the Occasionals section are not written out when the occasional lines are encountered. Instead, they are written at the beginning of every line that is included from the report. Take a look at the sample output file to see how this works. You may have more than one occasional; each one is written at the beginning of every included line.

Comments

To include comments in a configuration file, use the semicolon (;). Any text on a line after a semicolon is ignored. Comments help you remember the purpose of each line and make later changes to the configuration file easier.
For example:

[Fields]
1,11,Alpha  ;Social security number   <--- This is a comment
; So is this

Running Delimit

Once you have created a configuration file, type DELIMIT followed by the name of the configuration file, then press the Enter key.

A sample report and configuration file have been included with Delimit. The sample report is named SAMPLE.TXT, and the configuration file for processing it is named SAMPLE.CFG. To run Delimit on this report, type

DELIMIT SAMPLE.CFG

and press the Enter key. The results will be written to the files OUTPUT.TXT and DISCARD.TXT.

Contacting the Author

I can be reached on CompuServe. My ID is 70413,1360. You may also contact me in writing at:

Jeff Carey
3735 Eastmont Avenue
Bloomington, IN 47403

I'd greatly appreciate any suggestions for improvement, constructive criticism, or even compliments!