6 Feb 1995 - Preliminary Information

SpHyDir Status and Limitations

SpHyDir represents a new approach to an actively changing problem. Since it uses quite different technology from other systems, it cannot benefit from libraries of common code from NCSA or CERN. The entire HTML parser was coded in two days using VX-Rexx. It is fairly thorough, but will obviously not reflect all the obsolete, proposed, and deviant versions of the HTML language. Fortunately, VX-Rexx is extremely powerful and rather easy to code. The main problem is to decide which things should be done. Since it is not possible to just slap a "DTD" file into the thing and get additional syntax, SpHyDir will always be a bit picky about its syntax.

There is a lot of bad HTML on the net. Some random experiments dragging pages out of Web Explorer and dropping them into SpHyDir have shown up a number of dead-flat errors in very public pages (stuff like </UL> tags without a starting <UL>). A browser could ignore such markups, but since SpHyDir wants to discover the structure of a document, it backs off when there are structural errors. It pops up a message with the error, and then a text window in which the cursor is positioned on the tag causing the error. The user can manually correct the problem or can abort the file.
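The kind of structural check described above can be sketched with a simple stack. This is only an illustration in Python, not SpHyDir's VX-Rexx parser; the function name find_structural_error and the set of tags it checks are assumptions made for the example. It returns the character offset of the offending tag, which is what an editor would need to place the cursor on it.

```python
import re

# Hypothetical set of container tags to check; not SpHyDir's actual list.
CONTAINERS = {"UL", "OL", "DL"}
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def find_structural_error(html):
    """Return (offset, message) for the first structural error, or None."""
    stack = []  # (name, offset) of the container tags that are currently open
    for m in TAG_RE.finditer(html):
        closing = m.group(1) == "/"
        name = m.group(2).upper()
        if name not in CONTAINERS:
            continue
        if not closing:
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            stack.pop()
        else:
            # An end tag that does not match the innermost open container.
            return m.start(), f"</{name}> without a matching <{name}>"
    if stack:
        name, offset = stack[-1]
        return offset, f"<{name}> is never closed"
    return None
```

A browser can skip the stray end tag and carry on; a tool that wants the document's structure has to stop at the reported offset and ask the user.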

Supported and Unsupported HTML

SpHyDir explicitly supports the HTML, HEAD, and BODY document structure tags. The only fully supported tag inside the head area is TITLE. SpHyDir recognizes and ignores other tags that are documented as part of the Header (ISINDEX, NEXTID, LINK, BASE) and an experimental tag that IBM likes to use (META). Generally, it is the intent of SpHyDir in a future release to generate all the Header tags properly based on document content and relationships with other documents.

SpHyDir supports H1...H6 when they are really used as the heading of a section and not just to get big characters.

The <P> (Paragraph) tag is supported. The </P> tag is recognized when it is present. SpHyDir rather religiously generates both P and /P when it generates output.

UL, OL, and DL lists are supported along with LI, DT, and DD elements. Nested lists are fine. It is not entirely clear just what other structural elements are reasonable inside lists. A lot of NCSA documents seem to mix in ordinary paragraphs along with the expected LI tags. Currently, SpHyDir lets you put lots of stuff in a list, but it will absolutely refuse to generate a document if it finds a Section nested inside any List alternative. If there is no formal resolution to this question, a later version will define and enforce a more explicit view of propriety for list elements.
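The one rule that is enforced today (no Section nested inside a List) amounts to a walk over the document tree. The sketch below is in Python rather than VX-Rexx, and the Node class and its kind names ("Section", "List", "Paragraph") are hypothetical stand-ins for SpHyDir's container records.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                   # e.g. "Section", "List", "Paragraph"
    children: list = field(default_factory=list)

def section_inside_list(node, in_list=False):
    """Return True if any Section appears at any depth below a List."""
    if node.kind == "Section" and in_list:
        return True
    # Once we have passed through a List, everything below it counts as inside.
    in_list = in_list or node.kind == "List"
    return any(section_inside_list(child, in_list) for child in node.children)
```

A generator could call this on the root before writing any HTML and refuse to continue when it returns True.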

The Horizontal Rule <HR> tag is supported. There is no icon or tool for it. This is not because it is hard to come up with an icon for a horizontal line. About ten seconds with the icon editor will produce it. To add an HR tag to a document, select the object in the workarea that you want it to precede and press Alt-H.

The PRE tag is recognized. The WIDTH attribute is not scanned and will not be regenerated. Preformatted text is not scanned for tags, so anchors and links will not be detected.

A Target is constructed whenever an <A NAME=...> tag is encountered. Although HTML allows anchors to be associated with any word or phrase, SpHyDir tries to move the anchor in front of the current paragraph, point, or image.

A Hypertext link is created from any <A HREF=...> tag encountered. SpHyDir requires that the filename or URL not contain any blanks. Other attributes (REL, REV, TITLE) are ignored.

The IMG tag may have SRC, ALIGN, and ALT attributes. Currently, only the LEFT Netscape extended alignment is supported, and that is just a test.

ADDRESS is not explicitly supported. When encountered, it is currently embedded in the text of a paragraph as if it were character emphasis. When SpHyDir generates HTML, it includes boilerplate from the HEADER and TRAILER files, and that will generally be the source of the author's name, E-mail, etc. The ADDRESS tag would typically be used in these files and not in the body of the text.

The &amp;, &lt;, and &gt; entities are converted to ordinary characters on input and are regenerated on output. Currently no other entities are supported. It is the direction of SpHyDir to find a way to support direct editing of ISO 8859-1 characters rather than requiring them to be entered as entities. SpHyDir uses some PC-only graphic characters as delimiters for hypertext links and character emphasis.
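The round trip for the three supported entities is easy to get subtly wrong, because the order of replacement matters. A minimal sketch in Python follows; the function names are invented for the example and it assumes the semicolon-terminated form of each entity.

```python
def decode_entities(text):
    """Input side: turn the three supported entities into ordinary characters."""
    # Decode &amp; last so that "&amp;lt;" becomes the literal text "&lt;".
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&amp;", "&")):
        text = text.replace(entity, char)
    return text

def encode_entities(text):
    """Output side: regenerate the entities from ordinary characters."""
    # Encode "&" first so the ampersands of the new entities are not re-escaped.
    for char, entity in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;")):
        text = text.replace(char, entity)
    return text
```

Any other entity passes through decode_entities unchanged and comes back out with its ampersand escaped, which is one way to see why unsupported entities do not survive a round trip.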

Forms related tags are not currently supported. Obviously this is an area where SpHyDir could be particularly useful. It's just that the other core items have a higher priority.

Known Bugs and Restrictions

Currently, Target objects are generated as empty Anchor tags (that is, there is no text between the A tag and the /A). Some browsers do not support jumping to empty anchors. It will be necessary to pin the Target name to some part of the following object. Experience shows that putting anything, even a single period, inside the tag will make the browsers work correctly. The SpHyDir technique of generating temporary anchors for every header (so the TOC can point to them) causes some head scratching, since if the user puts an explicit anchor on the block there are now two labels that have to be assigned to the same header.
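To make the workaround concrete, here is a small Python sketch of the period-filled anchor, including the case where a header carries both a generated TOC label and an explicit one. The function names and the label values are made up for the example, and this is one possible arrangement rather than what SpHyDir generates.

```python
def target_anchor(name, filler="."):
    """Return an A NAME tag; a non-empty filler keeps the anchor from being empty."""
    return f'<A NAME="{name}">{filler}</A>'

def header_with_targets(level, title, names):
    """Emit one anchor per label ahead of the header tag."""
    anchors = "".join(target_anchor(n) for n in names)
    return f"{anchors}<H{level}>{title}</H{level}>"
```

For example, header_with_targets(2, "Status", ["toc3", "status"]) puts two anchors, each holding a period, in front of a single H2.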

SpHyDir does not extract all the information possible when tags are nested inside a header. In particular, it can lose Target names (A NAME tags), and it will miss IMG references inside the H1...H6 tags. Although SpHyDir may continue to put restrictions on the contents of the header tag, it should be able to recognize elements that it will not support as part of the section title and move them outside the header.

The current design may require you to perform several operations to completely initialize a new object. After dropping a SECTION from the toolbar, you should type in a Header. The Section itself is a SpHyDir concept; HTML only recognizes the Header text. If you type nothing, then SpHyDir might generate "<H2> </H2>" which makes little sense and might be invalid. But currently SpHyDir doesn't force you to actually type a header, and it may not go back later on and "sanity check" the document to detect errors before generating HTML. So be sure to finish what you start. This includes remembering to drag a GIF file over to associate something with every Image object you create.
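A "sanity check" of the kind mentioned above could be as simple as one pass over the objects before any HTML is written. The Python sketch below assumes each object is a dictionary with the hypothetical keys 'kind', 'text', and 'file'; SpHyDir's real records are VX-Rexx container entries and will look different.

```python
def sanity_check(objects):
    """Return a list of (index, problem) pairs for half-initialized objects."""
    problems = []
    for i, obj in enumerate(objects):
        # A Section with no header text would generate an empty header tag.
        if obj["kind"] == "Section" and not obj.get("text", "").strip():
            problems.append((i, "Section has no header text"))
        # An Image with no file attached has nothing to put in SRC.
        if obj["kind"] == "Image" and not obj.get("file"):
            problems.append((i, "Image has no GIF file attached"))
    return problems
```

An empty result means nothing was left unfinished; otherwise the index points at the object that needs attention.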

VX-Rexx 2.1B has a bug when moving a tree of records in a container. Suppose, for example, you decide to move one section in front of another. You can click on the sections to collapse the tree so that just the two icons are showing. You can then drag the second icon in front of the first. However, when you re-expand the tree, you will see that elements two or three levels down in the tree have been incorrectly reorganized. For now, the safe way to move large sections of the document is to mark them with Alt-L and move them through the SpHyDir special "Clipboard" window.

For some reason, you cannot drag the current contents of the Web Explorer window over and drop it directly on the SpHyDir workarea. There is some curious interaction between the way that WE declares its Drag-Drop and the Container object. As a circumvention, the entry area for titles and headers at the top right of the workarea has been turned into a drop area. Drop the Web Explorer page on it and the current Web document will be SpHyDir-ized just as if it had been successfully dropped on the container.

When you install SpHyDir, you unzip the program and some icon and CMD files in a directory. SpHyDir currently requires that this directory be the "current directory" when it runs. Although icons can be bound to the program as Resources, a program running in VX-Rexx Debug mode cannot use its resources, and SpHyDir is impossible to run without the icons. So currently it finds the icons as files, and that means that the directory that contains the SpHyDir icons has to be the current directory when the program is running. HTML files always come from the library structure defined by the HTMLLIB environment variable.

There doesn't appear to be a way in VX-Rexx to scroll the object in a container so that a particular object is visible. The currently "selected" record may be off the screen, and it may be necessary to scroll manually to find it.

SpHyDir has a lot to put on the screen. It is designed to work best on a 1024x768 resolution monitor, but will run well on 800x600. Resolution lower than SVGA is not recommended.

SpHyDir saves the position and size of its subwindows. Very simple windows can be resized automatically. The Workarea and Link Manager windows are more complicated and cannot currently be resized. SpHyDir currently doesn't save font or color changes for windows, so it should not be reconfigured by dropping System Setup Palette objects.

Two types of object go in front of things: the Horizontal Rule (HR) and a Target. However, when you put either in front of a section, the tree structure shows the objects as if they were part of the previous section and not an introducer to the new section. That wasn't intended, but it would be almost impossible to change. Treat it as a permanent feature.

Character Emphasis and Entities

It is fairly simple for SpHyDir to manage any feature that is a structural component of the document. Headers, paragraphs, images, examples, and <HR> can be rendered as objects. Links and subdocuments are handled through special logic.

It makes no sense to create "objects" to handle features that apply to an individual letter, word, or phrase. This leaves SpHyDir with a problem handling B, I, U, TT, CITE, CODE, EM, KBD, LISTING, SAMP, STRONG, XMP, BLOCKQUOTE, and VAR. First, there are too many of these tags. There is some controversy about which are current and which are obsolete. In most cases, the browser can only render text in bold and italics anyway. Even underscore is questionable, since browsers often underscore hotwords.

For the moment, SpHyDir generates character emphasis tags back into the text. For example, <I>italics</I> will effectively show up in this form in the Workarea and Text edit window. However, upward and downward pointing triangle characters (0x1E and 0x1F) will replace the "<" and ">" characters. This substitution is necessary so that the less/greater than signs can be edited in as normal text (though they will be translated to &gt; and &lt; on output as HTML expects).
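The substitution can be sketched in Python as a pair of functions, one for each direction. The names to_workarea and to_html, the subset of emphasis tags, and the assumption that 0x1E stands for "<" and 0x1F for ">" are all choices made for this example.

```python
import re

OPEN, CLOSE = "\x1e", "\x1f"   # the up and down triangle glyphs in PC code page 437
# Hypothetical subset of emphasis tags; attributes are not handled.
EMPHASIS = "B|I|U|TT|CITE|CODE|EM|KBD|SAMP|STRONG|VAR"
TAG_RE = re.compile(rf"<(/?(?:{EMPHASIS}))>", re.IGNORECASE)

def to_workarea(html_text):
    """HTML text -> editable text with triangle-delimited emphasis markers."""
    text = TAG_RE.sub(lambda m: OPEN + m.group(1) + CLOSE, html_text)
    # With the tags out of the way, entities can become ordinary characters.
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")

def to_html(work_text):
    """Editable text -> HTML text; typed < and > come out as entities."""
    text = work_text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return text.replace(OPEN, "<").replace(CLOSE, ">")
```

Because the markers use characters nobody types, a "<" typed in the edit window can only be literal text, and to_html can escape it without looking at context.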

SpHyDir doesn't display text in bold or italics because there is no reasonable support available. The Text Edit window (a Multiline Entry Field) is constrained by OS/2 to use a single font. E.EXE doesn't support multiple fonts. EPM does allow font switches, but it won't logically wrap long lines and therefore provides no good way to handle the <BR> tag. If anyone has any good ideas, please mail them to Howard.Gilbert@yale.edu.

SpHyDir doesn't currently support &xxx; escape sequences for other characters. The final SpHyDir objective would be to locate a suitable native OS/2 environment that can handle the display of bold, italics, and hotwords, allow the editing of line breaks, and ideally display characters in the ISO 8859-1 Latin-1 alphabet. Perhaps this can be solved at the same time that the italics/bold problem is addressed.

This document was generated by SpHyDir, another fine product of PC Lube and Tune.