HTMLCon Version 2.0 (June, 1995)
An HTM(L) to ASCII Document Converter
Satore Township
P.O. Box 750836
Petaluma, CA 94975-0836
WWW to http://www.crl.com/~mikekell
FTP to ftp.crl.com/ftp/users/ro/mikekell/ftp
This program may be distributed freely as long as no
modifications are made to it or this documentation. We
ask that you register this program if you find it useful.
The registration fee of $7.00 (U.S., by check) should be
mailed to Satore Township at the address given above. If
you register this program and provide us with your e-mail
address, we will provide you with the command to eliminate
the registration request screen which appears when the
program is initiated.
E-mail to mikekell@crl.com for comments or suggestions.
About the Program
-----------------
HTMLCon converts HTML/HTM files to standard ASCII files, making them ready
for viewing, editing or printing with standard DOS, OS/2 or Windows tools.
HTMLCon operates under MSDOS or under any program capable of providing an
MSDOS session and using COMMAND.COM as a command interpreter. HTMLCon can
be used in a Windows environment with "drag and drop" operation. After
processing the input document, output will be displayed on a viewer or
editor of your choice, or printed if you choose.
HTMLCon recognizes HTML markup through the HTML+ level as of this date.
It automatically detects whether an HTML file was created in an MSDOS or
UNIX environment and processes it correctly. HTMLCon attempts to make
the output as readable as possible, removing awkward formatting
wherever practical.
A variety of options are available as defined in the control file
(HTMLCON.INI). The control file is necessary for the proper operation
of HTMLCon. This file may be modified with any text editor and is
heavily commented to allow you to set various options.
Installation
------------
Copy HTMLCON.EXE and HTMLCON.INI to a new directory of your choice.
Now set the environment variable "HTMLCON" to point to the directory
where HTMLCON.INI resides. This will allow you to run the program
from any location on your system. For example, if you put HTMLCON.EXE
and HTMLCON.INI in the directory C:\UTILS, use the following command
in your AUTOEXEC.BAT file:
    SET HTMLCON=C:\UTILS
Note that the HTMLCON environment variable should not end with a
trailing backslash. HTMLCon will still operate if it cannot locate
HTMLCON.INI, but none of the directives in that file will be used: the
program will advise you of the problem, wait thirty seconds, then
process the files you have selected using default values.
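As an illustration only, the lookup described above can be sketched in
Python. This is not HTMLCon's actual code: the function name
control_file_path is hypothetical, and joining the directory and file
name with a single backslash is an assumption about why a trailing
backslash in the variable would cause trouble.

```python
def control_file_path(env, name="HTMLCON.INI"):
    """Build the control file path from the HTMLCON environment variable.

    `env` is a mapping such as os.environ. Returns None when the variable
    is not set, in which case a caller would fall back to default values.
    """
    directory = env.get("HTMLCON")
    if not directory:
        return None
    # Assumption: the directory and file name are joined with exactly one
    # backslash, so the variable itself must not end with one.
    return directory + "\\" + name


if __name__ == "__main__":
    print(control_file_path({"HTMLCON": "C:\\UTILS"}))
    print(control_file_path({}))
```

Under this assumption, a value such as C:\UTILS\ would produce a doubled
backslash in the resulting path.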
If you are using HTMLCon in a Windows environment and experience an
out-of-memory condition (usually indicated by HTMLCon failing to process
a large number of input files) you should experiment with the following
variable in the [NonWindowsApp] section of your SYSTEM.INI file:
    CommandEnvSize=1024 (recommended)
This will ensure that HTMLCon is provided sufficient environment space
to process large numbers of HTM/HTML files in a single session. Also,
it is suggested that you set your DOS environment to at least 1024 bytes
and your FILES argument in CONFIG.SYS to at least 49 in the event you
experience difficulties. Since HTMLCon can process any number of HTM/HTML
files in a single session, using these suggested settings as a minimum will
allow the program to operate at maximum efficiency and prevent out-of-memory
conditions in most installations.
The program is now ready to run. Source files may be located in any
directory. Output files will be created in the directory from which
HTMLCon was run. If you are using the optional filter file (HTMLCON.FIL),
it should be located in the same directory as HTMLCON.EXE and HTMLCON.INI.
There are three additional filter files provided with HTMLCon, which are
named ISO.FIL, DOS.FIL and MAC.FIL (with thanks to Claude Grenier). The
three filter files will allow various conversions of HTML character
sets. Your favorite FIL file should be renamed to HTMLCON.FIL for use
with HTMLCon. Please see the self-documenting FIL files for more
information. In most cases the default HTMLCON.FIL file (DOS.FIL) will
be appropriate.
Operation
---------
HTMLCon can be operated in the interactive mode by running "HTMLCon"
from the MSDOS session. It can also be run without operator
intervention by using the following command line arguments:
    HTMLCon input_file[.html] output_file[.ASC], or
    HTMLCon input_file[.html]
A wide variety of user-defined preferences can be set in the HTMLCON.INI
control file as shown below. In addition, HTMLCon provides a short
menu of fundamental options when run in the interactive mode. Default
file extensions can be overridden on the command line for both
input and output files (as well as in the HTMLCON.INI file).
HTMLCon has the ability to process multiple input files. When used
in this mode HTMLCon will automatically assign the file extension '.ASC'
to all output files unless the default file extension has been changed
in the HTMLCON.INI file. HTMLCon will automatically detect the multiple file
input mode by the presence of a '*' or '?' in the input file name.
For example, suppose that HTMLCon resides in the directory "C:\HTMLCON"
and that there are several HTM/HTML files in the directory "C:\HTMLWRIT"
that you wish to process. First, move to the "C:\HTMLCON" directory,
then issue the command "HTMLCON C:\HTMLWRIT\*.html". HTMLCon will
process the files, one-by-one, asking you each time if you wish to
proceed with processing the next file. When asked if you wish to
proceed, you will be given the following options: Y)es (the default), N)o
(no to this file only), Q)uit (quit processing all files), or A)ll
(process all of the remaining files without pausing).
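The multiple file behavior described above can be sketched in Python as
follows. This is a rough model, not HTMLCon's implementation: the
function names are hypothetical, and it assumes that a prompt is issued
before every file (including the first) until A)ll is chosen.

```python
import ntpath  # DOS-style path handling, usable on any platform


def is_multi_file(pattern):
    """Multiple file mode is signalled by '*' or '?' in the input name."""
    return "*" in pattern or "?" in pattern


def output_name(input_path, extension=".ASC"):
    """Name of the output file, created in the current directory."""
    stem, _ = ntpath.splitext(ntpath.basename(input_path))
    return stem + extension


def select_files(files, answers):
    """Return the files that would be processed for the given replies.

    Replies: Y (default when empty), N (skip this file only),
    Q (quit processing all files), A (process the rest without pausing).
    """
    replies = iter(answers)
    process_all = False
    chosen = []
    for name in files:
        if not process_all:
            reply = (next(replies, "") or "Y").upper()
            if reply == "Q":
                break
            if reply == "N":
                continue
            if reply == "A":
                process_all = True
        chosen.append(name)
    return chosen


if __name__ == "__main__":
    print(is_multi_file("C:\\HTMLWRIT\\*.html"))
    print(output_name("C:\\HTMLWRIT\\INDEX.HTM"))
    print(select_files(["A.HTM", "B.HTM", "C.HTM", "D.HTM"], ["Y", "N", "A"]))
```

With the replies Y, N, A in the example, the first file is processed,
the second is skipped, and the remaining two are processed without
further prompts.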
HTMLCon also has the ability to print processed files. By placing the
following line in the HTMLCON.INI file you are able to activate printing
capabilities:
    useprinter=yes
This directive tells HTMLCon to ask, after each file is processed,
whether the output should be sent to LPT1. You may respond Y)es or N)o
to the query (the default is Yes). If the above line does not appear in
the HTMLCON.INI file, HTMLCon will not ask about printing files after
they are processed. Please note that HTMLCon prints only to LPT1 and
performs no other processing of the output file. If you use this
option, HTMLCon assumes that a printer is connected to LPT1 and is
working properly.
Images found in the HTM file are output as [I] and HREF references as
[*]. Forms are noted and marked, as are preformatted text and other
special HTML symbols. Derivatives are ignored except when the text is
preformatted or when the special HTMLCON.FIL file is used.
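To make the marker scheme concrete, here is a small Python sketch. It
is only an approximation written for this document: the regular
expressions, the function name mark_up, and the choice to place [*]
where the opening anchor tag stood are all assumptions, and forms and
preformatted text are not handled.

```python
import re


def mark_up(html):
    """Replace images with [I] and HREF anchors with [*], then drop tags."""
    # Every <IMG ...> tag becomes the image marker.
    text = re.sub(r"<img\b[^>]*>", "[I]", html, flags=re.IGNORECASE)
    # Every opening <A ... HREF=...> tag becomes the reference marker.
    text = re.sub(r"<a\s+[^>]*href[^>]*>", "[*]", text, flags=re.IGNORECASE)
    # Any remaining tag (including </A>) is removed.
    return re.sub(r"<[^>]+>", "", text)


if __name__ == "__main__":
    sample = 'See <A HREF="x.html">this page</A> <IMG SRC="a.gif"> now'
    print(mark_up(sample))  # prints: See [*]this page [I] now
```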
HTMLCon can make use of a special filter file (HTMLCON.FIL in the
default directory) in order to translate HTML ENTITIES of the user's
choice. Use of this filter is activated by the statement
"usefilter=yes" in the HTMLCON.INI file (see below). The user may
define up to 300 such filters in the HTMLCON.FIL file. See the
sample HTMLCON.FIL file for further details. This is an advanced
feature and is not necessary for non-demanding HTMLCon use.
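The idea of an entity filter can be illustrated with a short Python
sketch. It does not read or reproduce the HTMLCON.FIL format, which is
documented in the FIL files themselves; the mapping below, the function
name apply_filters, and the error raised when the limit is exceeded are
assumptions made for the example.

```python
MAX_FILTERS = 300  # limit stated in this documentation


def apply_filters(text, filters):
    """Replace each HTML entity in `text` using the `filters` mapping."""
    if len(filters) > MAX_FILTERS:
        raise ValueError("no more than 300 filters may be defined")
    for entity, replacement in filters.items():
        text = text.replace(entity, replacement)
    return text


if __name__ == "__main__":
    # Hypothetical filter set mapping entities to plain ASCII.
    filters = {"&eacute;": "e", "&amp;": "&"}
    print(apply_filters("Caf&eacute; &amp; bar", filters))  # prints: Cafe & bar
```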
Since the HTML language is evolving continuously, it is possible that
HTMLCon may not recognize certain symbols properly. Also, since there
is great variation in the creation of HTML documents, it may not be
possible to ideally format all output. Problems with the output will be
corrected in future versions and we ask that you let us know of any
problems by sending us e-mail, including the original HTML document that
is not being processed correctly.
HTMLCon Control File
--------------------
The control file should be named HTMLCON.INI and exist in the same
directory as HTMLCon. Here is a sample, with explanations, of the
control file:
# HTMLCon Initialization File (current through version 2.0)
# ---------------------------------------------------------
#
# ----- ABOUT THE HTMLCON.INI CONTROL FILE -----
#
# Lines beginning with a pound sign are considered comments.
# All other lines are considered instructions and must exactly follow
# the format described in this sample file. Arguments are separated
# by an equal sign (=) which must not be preceded or followed by
# a space or tab.
#
#
# ----- DEFINING THE OUTPUT LINE LENGTH -----
#
# Define the default point at which HTMLCon should attempt to break a
# line for the output file. The break is not guaranteed to occur at
# this point, but as close to it as possible to retain the syntax of
# the input line. Default=72.
#
#linebreak=75
#
#
# ----- COLLECTING STATISTICS -----
#
# Statistics can be compiled and written to the output file. Default=No.
# Use of this function does not increase the processing time and it does
# provide some interesting information in the output file.
#
statistics=yes
#
#
# ----- VIEWING OR PROCESSING THE OUTPUT FILE AUTOMATICALLY -----
#
# You may launch another program after HTMLCon finishes its work. This
# may be an ASCII file viewer, editor, or whatever. The launched program
# must be able to take the output file name as an argument. In order to
# accomplish this you must provide the FULL PATH to your program. This
# is a handy function to allow you to automatically and immediately see
# the results of the HTMLCon conversion process.
#
#launchprog=c:\utils\list.com
#
#
# ----- FINDING AND REPLACING THINGS -----
#
# Find and replace: you may specify up to 50 strings to be located in
# the HTML file and replaced in the ASCII output file. These will be a
# direct replacement using the two commands "find=" and "replace=". Each
# "find" element is replaced by the "replace" element that follows it, so
# every "find=" statement must be followed by a "replace=" statement.
# To specify leading or trailing spaces in a string, surround it with
# quotation marks ("). The strings cannot exceed 40 characters each.
#
find=" -- "
replace=--
#
# Here is an example replacing all HTMLCon reference symbols [*] with just *.
#
#find=[*]
#replace=*
#
# Or just ignore all references altogether...
#
#find=[*]
#replace=
#
# Some nice find/replace items to make the output look a bit better.
#
# [add whatever you would like here]
#
#
# ----- KEEPING THE AUTHOR'S ORIGINAL FORMATTING -----
#
# You may elect to keep the formatting characteristics of the original
# HTML file intact. This will preserve white spaces, line breaks, etc. as
# originally constructed by the author of the HTML page.
#
#keepformatting=yes
#
#
# ----- IGNORING HTMLCON'S MARKERS IN THE OUTPUT FILE -----
#
# You may choose to have HTMLCon not replace certain HTML constructs
# with its own markers (for example, HTMLCon replaces URL references
# with the symbol [*]). To have HTMLCon simply ignore its own symbols and
# not reference certain items in the original HTML file, uncomment the
# next line:
#
#ignoresymbols=yes
#
#
# ----- PRESERVING HREF MARKERS IN THE OUTPUT FILE -----
#
# You may instruct HTMLCon to preserve all HREF constructs when
# converting the HTML file. These references will be preserved intact,
# without modification. To use this feature, uncomment the next line:
#
#keephref=yes
#
#
# ----- ELIMINATING ADVERTISEMENTS AND DELAYS -----
#
# Eliminate the advertisements and delays
# [available to registered users only]
#
#
# ----- PRINTING THE OUTPUT FILE ON LPT1 -----
#
# If you would like the option to send the processed file to LPT1
# then uncomment the next line:
#
#useprinter=yes
#
# Note that you may only send the processed file to a line printer
# attached to LPT1 and that HTMLCon assumes the printer is connected
# and operating properly.
#
#
# ----- SPEED PROCESSING MULTIPLE FILES -----
#
# Uncomment the following line to tell HTMLCon to NEVER pause for any
# prompt, including the call to your file viewer or other
# post-processor.
#
#nopause=yes
#
#
# ----- IGNORING CERTAIN FILE TYPES -----
#
# The following directive lists file extensions which should always be
# ignored by HTMLCon. If an input file name contains one of these
# extensions then it will never be processed. Note that each file
# extension must always include the "." in this directive:
#
ignore=.ZIP.EXE.COM.LZH.GIF.LPG.ARC.ASC.SYS.INI.TXT.DOC
#
#
# ----- USING USER-DEFINED FILTERS -----
#
# Uncomment the next directive to have HTMLCon apply a set of filter
# replacements contained in the file HTMLCON.FIL in HTMLCon's default
# directory. This filter file will find and replace HTML ENTITIES
# in your output file.
#
usefilter=yes
#
#
# ----- CHANGING THE DEFAULT OUTPUT FILE NAME EXTENSION -----
#
# HTMLCon normally uses the default file extension ".ASC" when multiple
# files are processed or the file extension is not specified. You may
# specify your own default file extension using the following command.
# This file extension MUST be preceded by a "." and contain no more than
# three characters.
#
#extension=.TXT
#
# ----- ADDITIONAL OUTPUT FORMAT OPTIONS -----
#
# In order to compress extra spaces in the output, uncomment this line:
# (Note: using compress=yes is recommended for nicer output.)
#
compress=yes
#
#
# ----- USER-DEFINED LINE BREAK POINTS -----
#
# HTMLCon will always search for certain characters by which to break a
# line for output purposes. You may also elect to add other characters
# for which HTMLCon will search to logically break a line. You may
# specify up to 50 such characters in a single command using the option
# below. Be careful doing this, however, so that you do not end up with
# illogically-truncated lines in your output. If HTMLCon does not find
# one of the default characters mentioned above, it will seek out one of
# the characters you itemize in the command below. The FIRST character it
# finds will cause HTMLCon to break the line if it is within the specified
# margin parameters established using the "linebreak=" command above:
#
#breakchars=:;=\|@
#
#
# End of file
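The rules stated in the sample control file (comment lines, key=value
instructions with no spaces around the equal sign, paired find/replace
statements, quoted strings, and the dotted ignore list) can be modelled
with the following Python sketch. It is an illustration only: the
function names are hypothetical, the error handling is an assumption,
and HTMLCon's real parser may behave differently.

```python
def parse_control_file(lines):
    """Return (options, pairs) parsed from control file lines."""
    options = {}
    pairs = []            # list of (find, replace) tuples, in file order
    pending_find = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue      # blank lines and comments carry no instruction
        # Split on the FIRST '=' only, so values such as ":;=\|@" survive.
        key, sep, value = line.partition("=")
        if not sep:
            continue      # assumption: malformed lines are skipped
        # Quotation marks protect leading and trailing spaces.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        if key == "find":
            if pending_find is not None:
                raise ValueError("find= without a following replace=")
            pending_find = value
        elif key == "replace":
            if pending_find is None:
                raise ValueError("replace= without a preceding find=")
            pairs.append((pending_find, value))
            pending_find = None
        else:
            options[key] = value
    if pending_find is not None:
        raise ValueError("find= without a following replace=")
    return options, pairs


def is_ignored(filename, ignore_value):
    """True if the file name ends with an extension from the ignore list."""
    extensions = ["." + ext for ext in ignore_value.upper().split(".") if ext]
    return any(filename.upper().endswith(ext) for ext in extensions)


if __name__ == "__main__":
    sample = ["# comment", "statistics=yes", 'find=" -- "', "replace=--",
              "#linebreak=75", "breakchars=:;=\\|@", "ignore=.ZIP.TXT"]
    options, pairs = parse_control_file(sample)
    print(options)
    print(pairs)
    print(is_ignored("README.TXT", options["ignore"]))
```

Commented-out directives such as #linebreak=75 are skipped, so the
corresponding default would apply.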