HTMLCon Version 2.0 (June, 1995)
An HTM(L) to ASCII Document Converter

Satore Township
P.O. Box 750836
Petaluma, CA 94975-0836

WWW to http://www.crl.com/~mikekell
FTP to ftp.crl.com/ftp/users/ro/mikekell/ftp

This program may be distributed freely as long as no modifications are
made to it or this documentation. We ask that you register this program
if you find it useful. The registration fee of $7.00 (U.S., by check)
should be mailed to Satore Township at the address given above. If you
register this program and provide us with your e-mail address, we will
provide you with the command to eliminate the registration request
screen which appears when the program is initiated.

E-mail to mikekell@crl.com for comments or suggestions.

About the Program
-----------------

HTMLCon converts HTML/HTM files to standard ASCII files, making them
ready for viewing, editing or printing with standard DOS, OS/2 or
Windows tools. HTMLCon operates under MSDOS or under any program capable
of providing an MSDOS session and using COMMAND.COM as a command
interpreter. HTMLCon can be used in a Windows environment with "drag and
drop" operation. After processing the input document, output will be
displayed on a viewer or editor of your choice, or printed if you
choose.

HTMLCon recognizes HTML symbology through HTML+ level as of this date.
It will automatically detect HTML files created in either an MSDOS or
UNIX environment and process them correctly. HTMLCon will attempt to
process the raw HTML file such that the output is as readable as
possible, eliminating unfavorable formatting to every extent practical.

A variety of options are available as defined in the control file
(HTMLCON.INI). The control file is necessary for the proper operation of
HTMLCon. This file may be modified with any text editor and is heavily
commented to allow you to set various options.

Installation
------------

Copy HTMLCON.EXE and HTMLCON.INI to a new directory of your choice.
Now set the environment variable "HTMLCON" to point to the directory
where HTMLCON.INI resides. This will allow you to run the program from
any location on your system. For example, if you put HTMLCON.EXE and
HTMLCON.INI in the directory C:\UTILS, use the following command in your
AUTOEXEC.BAT file:

     SET HTMLCON=C:\UTILS

Notice that a trailing backslash should not be used with the environment
variable HTMLCON.

HTMLCon will still operate if it cannot locate the HTMLCON.INI file, but
none of the important directives in that file will be used. If HTMLCon
is unable to locate the control file it will advise of the problem, wait
thirty seconds, then proceed with processing the files you have selected
using default values.

If you are using HTMLCon in a Windows environment and experience an
out-of-memory condition (usually indicated by HTMLCon failing to process
a large number of input files) you should experiment with the following
variable in the [NonWindowsApp] section of your SYSTEM.INI file:

     CommandEnvSize=1024     (recommended)

This will ensure that HTMLCon is provided sufficient environment space
to process large numbers of HTM/HTML files in a single session.

If you experience difficulties, it is also suggested that you set your
DOS environment to at least 1024 bytes and your FILES argument in
CONFIG.SYS to at least 49. Since HTMLCon can process any number of
HTM/HTML files in a single session, using these suggested settings as a
minimum will allow the program to operate at maximum efficiency and
prevent out-of-memory conditions in most installations.

The program is now ready to run. Source files may be located in any
directory. Output files will be created in the directory from which
HTMLCon was run.

If you are using the optional filter file (HTMLCON.FIL), it should be
located in the same directory as HTMLCON.EXE and HTMLCON.INI.
There are three additional filter files provided with HTMLCon, which are
named ISO.FIL, DOS.FIL and MAC.FIL (with thanks to Claude Grenier). The
three filter files allow various conversions of HTML character sets.
The FIL file you prefer should be renamed to HTMLCON.FIL for use with
HTMLCon. Please see the self-documenting FIL files for more information.
In most cases the default HTMLCON.FIL file (DOS.FIL) will be
appropriate.

Operation
---------

HTMLCon can be operated in the interactive mode by running "HTMLCon"
from the MSDOS session. It can also be run without operator intervention
by using the following command line arguments:

     HTMLCon input_file[.html] output_file[.ASC], or
     HTMLCon input_file[.html]

A wide variety of user-defined references can be stated in the
HTMLCON.INI control file as shown below. In addition, HTMLCon will
provide a short menu of fundamental options when run in the interactive
mode. Also, default file extensions can be overridden on the command
line for both input and output files (as well as in the HTMLCON.INI
file).

HTMLCon has the ability to process multiple input files. When used in
this mode HTMLCon will automatically assign the file extension '.ASC' to
all output files unless the default file extension has been changed in
the HTMLCON.INI file. HTMLCon will automatically detect the multiple
file input mode by the presence of a '*' or '?' in the input file name.

For example, suppose that HTMLCon resides in the directory "C:\HTMLCON"
and that there are several HTM/HTML files in the directory "C:\HTMLWRIT"
that you wish to process. First, move to the "C:\HTMLCON" directory,
then issue the command "HTMLCON C:\HTMLWRIT\*.html". HTMLCon will
process the files, one-by-one, asking you each time if you wish to
proceed with processing the next file.
When asked if you wish to proceed, you will be given the following
options: Y)es (the default), N)o (no to this file only), Q)uit (quit
processing all files), or A)ll (process all of the remaining files
without pausing).

HTMLCon also has the ability to print processed files. Placing the
following line in the HTMLCON.INI file activates the printing
capability:

     useprinter=yes

This directive tells HTMLCon to ask, for each file processed, whether
the file should be sent to LPT1. You may respond Y)es or N)o to the
query (default YES). If the above line does not appear in the
HTMLCON.INI file then HTMLCon will not ask about printing files after
they are processed. Please note that HTMLCon will only use LPT1 and
provides no other processing to the output file. HTMLCon assumes you
have a printer connected to LPT1 if you use this option and further
assumes that the printer is working properly.

Images found in the HTM file are output as [I], HREF references as [*].
Forms are properly noted and marked, as are preformatted text and other
special HTML symbols. Derivatives are ignored except when the text is
preformatted and unless the special HTMLCON.FIL file is used.

HTMLCon can make use of a special filter file (HTMLCON.FIL in the
default directory) in order to translate HTML ENTITIES of the user's
choice. Use of this filter is activated by the statement "usefilter=yes"
in the HTMLCON.INI file (see below). The user may define up to 300 such
filters in the HTMLCON.FIL file. See the sample HTMLCON.FIL file for
further details. This is an advanced feature and is not necessary for
non-demanding HTMLCon use.

Since the HTML language is evolving continuously, it is possible that
HTMLCon may not recognize certain symbols properly. Also, since there is
great variation in the creation of HTML documents, it may not be
possible to ideally format all output.
Problems with the output will be corrected in future versions and we ask
that you let us know of any problems by sending us e-mail, including the
original HTML document that is not being processed correctly.

HTMLCon Control File
--------------------

The control file should be named HTMLCON.INI and exist in the same
directory as HTMLCon. Here is a sample, with explanations, of the
control file:

# HTMLCon Initialization File (current through version 2.0)
# ---------------------------------------------------------
#
# ----- ABOUT THE HTMLCON.INI CONTROL FILE -----
#
# Lines beginning with a pound sign are considered comments.
# All other lines are considered instructions and must exactly follow
# the format described in this sample file. Arguments are separated
# by an equal sign (=) which must not be preceded or followed by
# a space or tab.
#
#
# ----- DEFINING THE OUTPUT LINE LENGTH -----
#
# Define the default point at which HTMLCon should attempt to break a
# line for the output file. The break is not guaranteed to occur at
# this point, but as close to it as possible to retain the syntax of
# the input line. Default=72.
#
#linebreak=75
#
#
# ----- COLLECTING STATISTICS -----
#
# Statistics can be compiled and written to the output file. Default=No.
# Use of this function does not increase the processing time and it does
# provide some interesting information in the output file.
#
statistics=yes
#
#
# ----- VIEWING OR PROCESSING THE OUTPUT FILE AUTOMATICALLY -----
#
# You may launch another program after HTMLCon finishes its work. This
# may be an ASCII file viewer, editor, or whatever. The launched program
# must be able to take the output file name as an argument. In order to
# accomplish this you must provide the FULL PATH to your program. This
# is a handy function to allow you to automatically and immediately see
# the results of the HTMLCon conversion process.
#
#launchprog=c:\utils\list.com
#
#
# ----- FINDING AND REPLACING THINGS -----
#
# Find and replace: you may specify up to 50 strings to be located in
# the HTML file and replaced in the ASCII output file. These will be a
# direct replacement using the two commands "find=" and "replace=". Each
# "find" element will be replaced by a "replace" element, therefore you
# cannot have a "find=" statement without a following "replace=" statement.
# To specify leading or ending spaces in a statement, surround the statement
# with quotation marks ("). The strings cannot exceed 40 characters each.
#
find=" -- "
replace=--
#
# Here is an example replacing all HTMLCon reference symbols [*] with just *.
#
#find=[*]
#replace=*
#
# Or just ignore all references altogether...
#
#find=[*]
#replace=
#
# Some nice find/replace items to make the output look a bit better.
#
# [add whatever you would like here]
#
#
# ----- KEEPING THE AUTHOR'S ORIGINAL FORMATTING -----
#
# You may elect to keep the formatting characteristics of the original
# HTML file intact. This will preserve white spaces, line breaks, etc. as
# originally constructed by the author of the HTML page.
#
#keepformatting=yes
#
#
# ----- IGNORING HTMLCON'S MARKERS IN THE OUTPUT FILE -----
#
# You may choose to have HTMLCon not replace certain HTML constructs
# with its own markers (for example, HTMLCon replaces URL references
# with the symbol [*]). To have HTMLCon simply ignore its own symbols and
# not reference certain items in the original HTML file, uncomment the
# next line:
#
#ignoresymbols=yes
#
#
# ----- PRESERVING HREF MARKERS IN THE OUTPUT FILE -----
#
# You may instruct HTMLCon to preserve all HREF constructs when
# converting the HTML file. These references will be preserved intact,
# without modification.
# To use this feature, uncomment the next line:
#
#keephref=yes
#
#
# ----- ELIMINATING ADVERTISEMENTS AND DELAYS -----
#
# Eliminate the advertisements and delays
# [available to registered users only]
#
#
# ----- PRINTING THE OUTPUT FILE ON LPT1 -----
#
# If you would like the option to send the processed file to LPT1
# then uncomment the next line:
#
#useprinter=yes
#
# Note that you may only send the processed file to a line printer
# attached to LPT1 and that HTMLCon assumes the printer is connected
# and operating properly.
#
#
# ----- SPEED PROCESSING MULTIPLE FILES -----
#
# Uncomment the following line to tell HTMLCon to NEVER pause for any
# prompt, including the call to your file viewer or other
# post-processor.
#
#nopause=yes
#
#
# ----- IGNORING CERTAIN FILE TYPES -----
#
# The following directive lists file extensions which should always be
# ignored by HTMLCon. If an input file name contains one of these
# extensions then it will never be processed. Note that the file
# extension must always include the "." in this directive:
#
ignore=.ZIP.EXE.COM.LZH.GIF.LPG.ARC.ASC.SYS.INI.TXT.DOC
#
#
# ----- USING USER-DEFINED FILTERS -----
#
# Uncomment the next directive to have HTMLCon apply a set of filter
# replacements contained in the file HTMLCON.FIL in HTMLCon's default
# directory. This filter file will find and replace HTML ENTITIES
# in your output file.
#
usefilter=yes
#
#
# ----- CHANGING THE DEFAULT OUTPUT FILE NAME EXTENSION -----
#
# HTMLCon normally uses the default file extension ".ASC" when multiple
# files are processed or the file extension is not specified. You may
# specify your own default file extension using the following command.
# This file extension MUST be preceded by a "." and contain no more than
# three characters.
#
#extension=.TXT
#
# ----- ADDITIONAL OUTPUT FORMAT OPTIONS -----
#
# In order to compress extra spaces in the output, uncomment this line:
# (Note: using compress=yes is recommended for nicer output.)
#
compress=yes
#
#
# ----- USER-DEFINED LINE BREAK POINTS -----
#
# HTMLCon will always search for certain characters by which to break a
# line for output purposes. You may also elect to add other characters
# for which HTMLCon will search to logically break a line. You may
# specify up to 50 such characters in a single command using the option
# below. Be careful doing this, however, so that you do not end up with
# illogically-truncated lines in your output. If HTMLCon does not find
# one of the default characters mentioned above, it will seek out one of
# the characters you itemize in the command below. The FIRST character it
# finds will cause HTMLCon to break the line if it is within the specified
# margin parameters established using the "linebreak=" command above:
#
#breakchars=:;=\|@
#
#
# End of file
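
Illustration: reading the control file format
---------------------------------------------

The control file rules described above (pound-sign comments, name=value
instructions with no space around the equal sign, and paired "find=" and
"replace=" statements with optional surrounding quotation marks) can be
sketched in a few lines of Python. This is only an illustration of the
format as documented here, not HTMLCon's own code: the function names
parse_control_file, strip_quotes and apply_pairs are hypothetical, and
the documented limits (50 pairs, 40 characters per string) are not
enforced. How HTMLCon itself handles malformed lines is not documented,
so this sketch simply skips them.

```python
def strip_quotes(value):
    """Remove one pair of surrounding double quotes, keeping inner spaces."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    return value


def parse_control_file(text):
    """Return (options, pairs) read from control-file text.

    options maps each directive name to its value; pairs lists the
    (find, replace) strings in the order they appear in the file.
    """
    options = {}
    pairs = []
    pending_find = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if not sep:
            continue  # assumption: lines without "=" are skipped
        key = key.lower()
        if key == "find":
            pending_find = strip_quotes(value)
        elif key == "replace":
            # A "replace=" only counts when it follows a "find=".
            if pending_find is not None:
                pairs.append((pending_find, strip_quotes(value)))
                pending_find = None
        else:
            options[key] = value
    return options, pairs


def apply_pairs(text, pairs):
    """Apply each find/replace pair to the text, in file order."""
    for find, replace in pairs:
        text = text.replace(find, replace)
    return text


sample = "\n".join([
    "# comment lines start with a pound sign",
    "statistics=yes",
    "#linebreak=75",
    'find=" -- "',
    "replace=--",
    "find=[*]",
    "replace=",
])

options, pairs = parse_control_file(sample)
print(options)
print(pairs)
print(apply_pairs("see [*] here -- now", pairs))
```

Note that the commented-out "#linebreak=75" line contributes nothing,
which matches the way directives are enabled in the sample file: a
directive takes effect only once its leading pound sign is removed.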