BIGSORT V4.2 : A fast in-memory sort for files of any size.
----------------------------------------------------
                   User Supported Program
          Continued use requires a donation of $20
                (C)1988-93 Turgut Kalfaoglu

BIGSORT uses the fastest known sorting algorithm to sort files that can
be as large as your swapping area (not RAM) allows. A wide range of
options, along with multiple key fields, enables you to pinpoint the
desired sorting method.

BIGSORT is especially well suited for use in batch files and for being
called from other programs. It returns specific error codes, and never
prompts for verification or additional information.

BIGSORT always uses the defined (primary) collating sequence for your
country, so it can place your national characters in the correct order.

Unless the /Verbose option is specified, BIGSORT never writes its
messages to the standard output, to prevent them from getting written
to an output file. Its messages are always directed to "standard
error." Under OS/2, it is possible to redirect stderr to a file, if
desired.

This program is shareware: a registration allows you to stay up-to-date
on enhancements to the product, and enables you to purchase the source
code.

Usage:
        BIGSORT [options] < inputfile > outputfile

If you omit the '< inputfile' part, BIGSORT will wait for input from
the keyboard. If that is what you wish, enter the data, ending each
line with RETURN, then enter CTRL-Z to finish the entry.

If you omit the '> outputfile' part, BIGSORT will send its output to
the screen.

For some online help, type
        BIGSORT HELP

The normal usage of BIGSORT is either thru OS/2 "pipes" or thru
redirection. Pipes enable a program's output to be sent as input to a
second program. This is specified by using the "|" symbol between the
two programs. Redirection is similar: it allows the output of a command
to be sent to a file, instead of getting displayed on the screen.
The ">" symbol indicates that the output should be sent to the file,
not to the screen. Note that the > symbol causes the previous contents
of the file to be lost. The >> symbol can be used to append to the
previous contents.

Options
-------
Use options to change the default behavior of BIGSORT, which is:

  * Start sorting on the first position of each line,
  * Do a case-sensitive alphanumeric sort,
  * Reserve room for 100,000 lines of input file. (Lines, not bytes).

If you wish to use multiple options, you need to separate them by
spaces. The options available with this version are:

/+nnn      where nnn is a number, causes BIGSORT to start sorting
           from that column. If omitted, BIGSORT sorts the file
           starting from the first character.

/+nnn-mmm  where nnn and mmm are column numbers, causes the program
           to focus only on the area between those two columns. This
           option can be repeated as many times as necessary to
           specify secondary sort keys. See the section on multiple
           ranges.

/R         Reverses the sort order. If this option is specified, the
           field is sorted in descending order.

/Ds        Specifies the symbol used as the delimiter in date fields.
           's' can be either a dash "-", a slash "/", a period "." or
           nothing. If "/D " is specified, BIGSORT assumes that the
           digits are attached to each other, like 19921220, to
           specify a date of Dec 20th, 1992. Default: /D-

/I         Ignore case. Without this option, A comes before a, and
           Z comes before a. Use this option to prevent this.

/MMDDYY    The field is a date field, in the format of MM-DD-YY.
           Unless the /D option is specified, BIGSORT assumes that
           dashes separate the digits.

/DDMMYY    Similar - the date field is in DD-MM-YY format.

/YYMMDD    Similar - the date field is in YY-MM-DD format.

/Snnnn     Specifies the index size for the file. One index entry is
           needed for each line in your file in order to load it.
           Normally BIGSORT reserves room for 100 thousand lines.
           If the number of lines (or records) in your file is more
           than 100 thousand, you need to specify the /S option. For
           example, if your file is 200 thousand lines, and you
           expect it to grow, you may tell BIGSORT to reserve room
           for 500 thousand: /S500000

/N         Indicates that the field specified is a numeric field, thus
           a numeric comparison should take place. This prevents
           leading blanks and other characters from interfering with
           the sorting order of numbers. If /N is specified, BIGSORT
           compares the field contents after converting them to
           floating-point numbers. This ensures accurate sorting of
           numbers with decimal places.

Specifying Ranges
-----------------
Ranges limit the area that BIGSORT focuses on. For example, if you
wanted to sort the files displayed by the OS/2 DIR command based on
their sizes, you could tell BIGSORT to sort the data based on the
information contained between columns 17 and 27. The option for that
would look like: "/+17-27".

BIGSORT parses options from left to right, changing its internal
settings as it goes along. When it comes upon a range field, it records
the current setting of such options as "/R", "/I", "/N" and the
date-related options for that range. It then resets these options to
the defaults, to enable you to construct a completely new set of rules
for your second sort field specification, if any.

Let's see this with an example. Let's try to sort DIR's output on
creation date, and then on the name. Our idea is to display the files
in chronological order, with files created on the same day sorted by
name among themselves. Here is a ruler line, followed by a sample
output of the DIR command:

0        1         2         3         4         5
12345678901234567890123456789012345678901234567890
 9-11-92   8:01p      <DIR>        922  .
 9-11-92   8:01p      <DIR>        690  ..
12-22-92   7:14p       5638          0  BIGSORT.BAK
12-22-92   7:42p       5267          0  BIGSORT.C
12-22-92   7:14p        869          0  BIGSORT.H
12-22-92   8:44p       5167          0  BIGSORT.TXT
12-22-92   7:12p       3004          0  COMPARES.C
(...)

The command to give to BIGSORT to sort the above list would be:

        DIR | BIGSORT /MMDDYY /d- /+1-9 /i /+41-80 > myfile.output

Let's analyze this command. When OS/2 "sees" this line, it first erases
and opens "myfile.output", which will store the results of the
operation. It then runs the "DIR" command, passing its output to
BIGSORT with all the parameters.

When BIGSORT starts, it defaults to using its case-sensitive
alphanumeric sort. The first option changes this to a sort on a date
field. The second option specifies that the month, day and year digits
are separated by dashes (which is the default, by the way). When we
specify the range "/+1-9", our /MMDDYY option is recorded as the
desired sort method for the first range, and BIGSORT resets the sort
method to the case-sensitive alphanumeric sort.

Now it reads the "/I" option and switches its current sort method to
a case-insensitive alphanumeric sort. When it reads the "/+41-80" range
specifier, it records that our second field selection should be sorted
using an alphanumeric, but case-insensitive, sort.

When BIGSORT is done, it sends the result to "standard output", which
has been redirected to a file by the "> myfile.output" part of the
command.

Each time a range is specified, the options given so far are recorded
for that range, and the options are reset. Thus, in some cases you may
have to specify the same option several times on the command line. For
example, to sort on the last name, then on the first name, both using
a case-insensitive sort, you need to put something like:

        TYPE myfile | BIGSORT /i /+10-25 /i /+27-40

Multiple Ranges
---------------
BIGSORT accepts up to twenty ranges. This means that you can specify up
to twenty different "zones" in your data for BIGSORT to sort on.
Multiple key fields are useful if, for example, you wish your database
output to be sorted on the date field and, within that, on the last
name of the person.
You can tell BIGSORT that the first field definition corresponds to
a date, and the second one to a name. This way, BIGSORT will continue
sorting records on the name if the dates are identical.

BIGSORT and multiple files:
---------------------------
If you wish to sort and merge several files into one, you can do it
with one command under OS/2. Just do a:

        TYPE *.* | BIGSORT > result.txt

Note that the TYPE command also sends the filename of the file it is
processing to "standard output". You may have to manually remove such
records.

BIGSORT Swap Area:
------------------
BIGSORT creates no swap area of its own, but uses OS/2 to allocate the
memory necessary to load the entire file, along with an index pointer
for each line. You can "guesstimate" that an input file of 8MB will
occupy a little over 8MB of RAM when BIGSORT is working on that file.
OS/2 will first allocate all available RAM, then provide the rest from
its swap area. Since this area grows and shrinks automatically, there
is nothing wrong with having a temporarily large swap file - just make
sure to have enough space on the disk where the swap file resides.

Shareware
---------
BIGSORT represents countless hours of work. Please contribute to the
shareware discipline by sending a $20 registration fee to:

        Turgut Kalfaoglu
        1378 Sok. 8/10
        Izmir 35210 Turkey

Source Code
-----------
BIGSORT is compiled under IBM C/SET 2, at CSD level 28. Clear and
well-documented source code with documentation is available ONLY to
registered users. Send $10 (to cover shipping charges) and a blank disk
(either 3.5 or 5.25) to the above address to receive the source code.

The author encourages you to register, and also to send requests for
additional features, or comments. You do not need to register this
software in order to ask for additional features or to report problems
or suggestions. However, regular use requires a donation of $20 to the
above address.
Update History
--------------

Version 4.2:
------------
Fixes problem with sorting based on date specifications.
Adds more timing information into the "/V" (verbose) option.

Version 4.1:
------------
Implements the "/N" (numeric field) option.

Version 4:
----------
Implements multiple key fields,
Implements sort ranges,
Implements the /D option.
Implements country-specific information

Version 4 is almost a complete re-write with division of source code
into five segments.

New Version: V3
---------------
Implements unlimited input filesize.

New Version: (V2.1 to V2.2)
---------------------------
Added features:
   It can now handle dates as well!
   Now /R and /I can be used at the same time.
   Improved performance, but code size still about the same
   (you should see the tricks that were done to keep it that way :)
Bugs Removed:
   None were found in V2.1
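
Appendix: Option Parsing Sketch
-------------------------------
The left-to-right option handling described under "Specifying Ranges"
can be hard to picture, so here is a minimal Python sketch of it. This
is NOT BIGSORT's source code (BIGSORT is written in C); every name in
it (KeySpec, parse_options, field_value, sort_lines) is hypothetical.
It assumes that options given with no range at all apply to the whole
line, that /D is reset along with the other options, and that dates use
a non-empty delimiter. It compares text by character code rather than
by the country collating sequence, and it ignores /S and /V.

```python
from dataclasses import dataclass, replace
from functools import cmp_to_key
from typing import Optional


@dataclass(frozen=True)
class KeySpec:
    """One sort key: a column range plus the options recorded for it."""
    start: int = 1                     # first column, 1-based
    end: Optional[int] = None          # last column, inclusive; None = end of line
    reverse: bool = False              # /R
    ignore_case: bool = False          # /I
    numeric: bool = False              # /N
    date_order: Optional[str] = None   # "MMDDYY", "DDMMYY" or "YYMMDD"
    delim: str = "-"                   # /Ds (assumed to reset with the rest)


def parse_options(args):
    """Scan options left to right; each range records the pending options
    and then resets them to the defaults."""
    specs = []
    pending = KeySpec()
    for arg in args:
        opt = arg.upper()
        if opt.startswith("/+"):
            first, _, last = opt[2:].partition("-")
            specs.append(replace(pending, start=int(first),
                                 end=int(last) if last else None))
            pending = KeySpec()        # reset for the next range
        elif opt == "/R":
            pending = replace(pending, reverse=True)
        elif opt == "/I":
            pending = replace(pending, ignore_case=True)
        elif opt == "/N":
            pending = replace(pending, numeric=True)
        elif opt in ("/MMDDYY", "/DDMMYY", "/YYMMDD"):
            pending = replace(pending, date_order=opt[1:])
        elif opt.startswith("/D"):
            pending = replace(pending, delim=arg[2:])
    if not specs:                      # assumption: no range = whole line
        specs.append(pending)
    return specs


def field_value(line, spec):
    """Extract the comparable value of one key field from a line."""
    text = line[spec.start - 1:spec.end]
    if spec.date_order:
        order = spec.date_order
        parts = [int(p) for p in text.strip().split(spec.delim)]
        named = dict(zip((order[0:2], order[2:4], order[4:6]), parts))
        return (named["YY"], named["MM"], named["DD"])
    if spec.numeric:
        return float(text.strip() or 0)   # fails on non-numeric text
    return text.upper() if spec.ignore_case else text


def sort_lines(lines, args):
    """Sort lines on each key in turn; later keys only break ties."""
    specs = parse_options(args)

    def compare(a, b):
        for spec in specs:
            x, y = field_value(a, spec), field_value(b, spec)
            if x != y:
                result = -1 if x < y else 1
                return -result if spec.reverse else result
        return 0

    return sorted(lines, key=cmp_to_key(compare))
```

With the options from the DIR example, parse_options(["/MMDDYY", "/d-",
"/+1-9", "/i", "/+41-80"]) yields two keys: a date key for columns 1-9
that is still case-sensitive, and a case-insensitive text key for
columns 41-80. This shows why "/i" must be repeated before each range
it should apply to.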