A Beginner's Guide to HTML

This is a primer for producing documents in HTML, the markup language used by the World Wide Web.

  * Acronym Expansion
  * What This Primer Doesn't Cover
  * Creating HTML Documents
  * The Minimal HTML Document
  * Basic Markup Tags
      * Titles
      * Headings
      * Paragraphs
  * Linking to Other Documents
      * Relative Links Versus Absolute Pathnames
      * Uniform Resource Locator
      * Anchors to Specific Sections in Other Documents
      * Anchors to Specific Sections Within the Current Document
  * Additional Markup Tags
  * Lists
      * Unnumbered Lists
      * Numbered Lists
      * Definition Lists
      * Nested Lists
  * Preformatted Text
  * Extended Quotes
  * Addresses
  * Character Formatting
      * Physical Versus Logical: Use Logical Tags When Possible
      * Logical Styles
      * Physical Styles
      * Using Character Tags
  * Special Characters
      * Escape Sequences
  * Forced Line Breaks
  * Horizontal Rules
  * In-line Images
      * Alternate Text for Viewers That Can't Display Images
  * External Images, Sounds, and Animations
  * Troubleshooting
      * Avoid Overlapping Tags
      * Embed Anchors and Character Tags, But Not Anything Else
      * Check Your Links
  * A Longer Example
  * For More Information
      * Fill-out Forms
      * Style Guides
      * Other Introductory Documents
      * Additional References

Acronym Expansion

WWW
    World Wide Web (or Web, for short).
SGML
    Standard Generalized Markup Language -- this is a standard for describing markup languages.
DTD
    Document Type Definition -- this is a specific markup language, written using SGML.
HTML
    HyperText Markup Language -- HTML is an SGML DTD. In practical terms, HTML is a collection of styles (indicated by markup tags) that define the various components of a World Wide Web document.
What This Primer Doesn't Cover

This primer assumes that you have:

  * at least a passing knowledge of how to use NCSA Mosaic or some other Web browser
  * a general understanding of how Web servers and client browsers work
  * access to a Web server for which you would like to produce HTML documents, or that you wish to produce HTML documents for personal use

Creating HTML Documents

HTML documents are in plain (also known as ASCII) text format and can be created using any text editor (e.g., Emacs or vi on UNIX machines). A couple of Web browsers (tkWWW for X Window System machines and CERN's Web browser for NeXT computers) include rudimentary HTML editors in a WYSIWYG environment. There are also some WYSIWYG editors available now (e.g., HotMetal for Sun Sparcstations, HTML Edit for Macintoshes). You may wish to try one of them first before delving into the details of HTML.

You can preview a document in progress with NCSA Mosaic (and some other Web browsers). Open it with the Open Local command under the File menu. After you edit the source HTML file, save the changes, return to NCSA Mosaic, and Reload the document. The changes are then reflected in the on-screen display.

The Minimal HTML Document

Here is a bare-bones example of HTML:

    <TITLE>The simplest HTML example</TITLE>
    <H1>This is a level-one heading</H1>
    Welcome to the world of HTML.
    This is one paragraph.<P>
    And this is a second.<P>

HTML uses markup tags to tell the Web browser how to display the text. The above example uses:

  * the <TITLE> tag (and corresponding </TITLE> tag), which specifies the title of the document
  * the <H1> header tag (and corresponding </H1>)
  * the <P> paragraph-separator tag

HTML tags consist of a left angle bracket (<) (a ``less than'' symbol to mathematicians), followed by the name of the tag, and closed by a right angle bracket (>). Tags are usually paired, e.g., <H1> and </H1>. The ending tag looks just like the starting tag except that a slash (/) precedes the text within the brackets. In the example, <H1> tells the Web browser to start formatting a level-one heading; </H1> tells the browser that the heading is complete.

The primary exception to the pairing rule is the <P> tag. There is no such thing as </P>.

NOTE: HTML is not case sensitive. <title> is equivalent to <TITLE> or <TiTlE>.

Not all tags are supported by all World Wide Web browsers. If a browser does not support a tag, it just ignores it.

Basic Markup Tags

Title

Every HTML document should have a title. A title is generally displayed separately from the document and is used primarily for document identification in other contexts (e.g., a WAIS search). Choose about half a dozen words that describe the document's purpose.

In the X Window System and Microsoft Windows versions of NCSA Mosaic, the Document Title field is at the top of the screen just below the pulldown menus. In NCSA Mosaic for Macintosh, text tagged as <TITLE> appears as the window title.

Headings

HTML has six levels of headings, numbered 1 through 6, with 1 being the most prominent. Headings are displayed in larger and/or bolder fonts than normal body text. The first heading in each document should be tagged <H1>. The syntax of the heading tag is:

    <Hy>Text of heading</Hy>

where y is a number between 1 and 6 specifying the level of the heading.

For example, the coding for the ``Headings'' section heading above is

    <H3>Headings</H3>

Title versus first heading

In many documents, the first heading is identical to the title. For multipart documents, the text of the first heading should be suitable for a reader who is already browsing related information (e.g., a chapter title), while the title tag should identify the document in a wider context (e.g., include both the book title and the chapter title, although this can sometimes become overly long).

Paragraphs

Unlike documents in most word processors, carriage returns in HTML files aren't significant. Word wrapping can occur at any point in your source file, and multiple spaces are collapsed into a single space. (There are a couple of exceptions; space following a <P> or <Hy> tag, for example, is ignored.) Notice that in the bare-bones example, the first paragraph is coded as

    Welcome to the world of HTML.
    This is one paragraph.<P>

In the source file, there is a line break between the sentences. A Web browser ignores this line break and starts a new paragraph only when it reaches a <P> tag.

Important: You must separate paragraphs with <P>. The browser ignores any indentations or blank lines in the source text. HTML relies almost entirely on the tags for formatting instructions, and without the <P> tags, the document becomes one large paragraph. (The exception is text tagged as ``preformatted,'' which is explained below.) For instance, the following would produce the same output as the first bare-bones HTML example:

    <TITLE>The simplest HTML example</TITLE><H1>This is a level-one heading</H1>Welcome to the world of HTML. This is one paragraph.<P>And this is a second.<P>

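To illustrate the heading and whitespace rules from this section, here is a small sketch. The title and the wording are made up for illustration. A browser should display one level-one heading, one level-two heading, and two paragraphs, because only the tags (not the extra spaces or the blank line) control the layout.

```html
<!-- Hypothetical demo file; the title and text are invented for this sketch. -->
<TITLE>Whitespace demo</TITLE>
<H1>Main heading</H1>
<H2>A subsection</H2>
<!-- The extra spaces and the blank line below are collapsed by the browser. -->
These   words are    spread
across several

lines in the source file.<P>
<!-- The P tag above is the only thing that separates the two paragraphs. -->
This sentence starts a second paragraph.<P>
```

Removing the first <P> tag should merge everything after the subsection heading into a single paragraph.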
To preserve readability in HTML files, however, headings should be on separate lines, and paragraphs should be separated by blank lines (in addition to the <P> tags).

NCSA Mosaic handles <P> by ending the current paragraph and inserting a blank line.

In HTML+, a successor to HTML currently in development, <P> becomes a ``container'' of text, just as the text of a level-one heading is ``contained'' within <H1> ... </H1>:

    <P>
    This is a paragraph in HTML+.
    </P>

The difference is that the </P> closing tag can always be omitted. (That is, if a browser sees a <P>, it knows that there must be an implied </P> to end the previous paragraph.) In other words, in HTML+, <P> is a beginning-of-paragraph marker.

The advantage of this change is that you will be able to specify formatting options for a paragraph. For example, in HTML+, you will be able to center a paragraph by coding

    <P ALIGN=CENTER>
    This is a centered paragraph. This is HTML+, so you can't do it yet.

This change won't affect any documents you write now, and they will continue to look just the same with HTML+ browsers.

Linking to Other Documents

The chief power of HTML comes from its ability to link regions of text (and also images) to another document. The browser highlights these regions (usually with color and/or underlines) to indicate that they are hypertext links (often shortened to hyperlinks or simply links).

HTML's single hypertext-related tag is <A>, which stands for anchor. To include an anchor in your document:

  1. Start the anchor with <A followed by a space.
  2. Specify the document being pointed to by entering the parameter HREF="filename", followed by a closing right angle bracket (>).
  3. Enter the text that will serve as the hypertext link in the current document.
  4. Enter the ending anchor tag: </A>.

Here is a sample hypertext reference:

    <A HREF="MaineStats.html">Maine</A>

This entry makes the word ``Maine'' the hyperlink to the document MaineStats.html, which is in the same directory as the first document. You can link to documents in other directories by specifying the relative path from the current document to the linked document. For example, a link to a file NJStats.html located in the subdirectory AtlanticStates would be:

    <A HREF="AtlanticStates/NJStats.html">New Jersey</A>

These are called relative links. You can also use the absolute pathname of the file if you wish. Pathnames use the standard UNIX syntax.

Relative Links Versus Absolute Pathnames

In general, you should use relative links, because

  1. You have less to type.
  2. It's easier to move a group of documents to another location, because the relative path names will still be valid.

However, use absolute pathnames when linking to documents that are not directly related. For example, consider a group of documents that make up a user manual. Links within this group should be relative links. Links to other documents (perhaps a reference to related software) should use full path names. This way, if you move the user manual to a different directory, none of the links will have to be updated.
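The user-manual scenario above might look like the following sketch. Every file and directory name here (chapter2.html, appendices/glossary.html, /software/related-tool/overview.html) is hypothetical and chosen only to show the two kinds of link side by side.

```html
<!-- Hypothetical chapter of a user manual; all file names are assumptions. -->
<TITLE>User Manual: Chapter 1</TITLE>
<H1>Chapter 1</H1>
<!-- Relative link: chapter2.html is assumed to be in the same directory. -->
Continue with <A HREF="chapter2.html">Chapter 2</A>.<P>
<!-- Relative link into a subdirectory of the manual's directory. -->
Unfamiliar terms are defined in the
<A HREF="appendices/glossary.html">glossary</A>.<P>
<!-- Absolute pathname: a document that is not part of the manual. -->
This manual refers to
<A HREF="/software/related-tool/overview.html">a related tool</A>.<P>
```

If the manual's directory is moved as a unit, the first two links should keep working unchanged, and the third still points at the same place because it does not depend on where the manual lives.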
Uniform Resource Locator

The World Wide Web uses Uniform Resource Locators (URLs) to specify the location of files on other servers. A URL includes the type of resource being accessed (e.g., gopher, WAIS), the address of the server, and the location of the file. The syntax is:

    scheme://host.domain[:port]/path/filename

where scheme is one of

    file      a file on your local system, or a file on an anonymous FTP server
    http      a file on a World Wide Web server
    gopher    a file on a Gopher server
    WAIS      a file on a WAIS server
    news      a Usenet newsgroup
    telnet    a connection to a Telnet-based service

The port number can generally be omitted. (That means unless someone tells you otherwise, leave it out.)

For example, to include a link to this primer in your document, you would place the primer's URL in the HREF parameter of an anchor around the text

    NCSA's Beginner's Guide to HTML

This would make the text ``NCSA's Beginner's Guide to HTML'' a hyperlink to this document.

For more information on URLs, look at

  * WWW Names and Addresses, URIs, URLs, URNs, written by people at CERN
  * A Beginner's Guide to URLs, located on the NCSA Mosaic Help menu

Links to Specific Sections in Other Documents

Anchors can also be used to move to a particular section in a document. Suppose you wish to set a link from document A to a specific section in document B. (Call this file documentB.html.) First you need to set up a named anchor in document B. For example, to set up an anchor named ``Jabberwocky'' in document B, enter

    Here's <A NAME="Jabberwocky">some text</A>

Now when you create the link in document A, include not only the filename, but also the named anchor, separated by a hash mark (#).

    This is my <A HREF="documentB.html#Jabberwocky">link</A> to document B.

Now clicking on the word ``link'' in document A sends the reader directly to the words ``some text'' in document B.

Links to Specific Sections Within the Current Document

The technique is exactly the same except the filename is omitted. For example, to link to the Jabberwocky anchor from within the same file (Document B), use

    This is <A HREF="#Jabberwocky">Jabberwocky link</A> from within Document B.
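Putting the named-anchor pieces together, a complete documentB.html might look like the sketch below. The anchor name ``Jabberwocky'' and the filename come from the text above; the headings and the remaining wording are invented for illustration.

```html
<!-- documentB.html: a sketch; headings and body text are hypothetical. -->
<TITLE>Document B</TITLE>
<H1>Document B</H1>
<!-- Link within the current document: the filename is omitted. -->
Skip ahead to the <A HREF="#Jabberwocky">Jabberwocky section</A>.<P>
<H2>Introduction</H2>
Some introductory material goes here.<P>
<H2>Jabberwocky</H2>
<!-- Named anchor: this is the target of any link ending in #Jabberwocky. -->
Here's <A NAME="Jabberwocky">some text</A> that links can jump to.<P>
```

A link in another file in the same directory would reach the same spot with HREF="documentB.html#Jabberwocky".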
Additional Markup Tags

The preceding is sufficient to produce simple HTML documents. For more complex documents, HTML has tags for several types of lists, preformatted sections, extended quotations, character formatting, and other items.

Lists

HTML supports unnumbered, numbered, and definition lists.

Unnumbered Lists

To make an unnumbered list,

  1. Start with an opening list <UL> tag.
  2. Enter the <LI> tag followed by the individual item.
  3. End with a closing list </UL> tag.

Below is an example two-item list:

    <UL>
    <LI> apples
    <LI> bananas
    </UL>

The output is:

  * apples
  * bananas

The <LI> items can contain multiple paragraphs. Just separate the paragraphs with the <P> paragraph tags.

Numbered Lists

A numbered list (also called an ordered list, from which the tag name derives) is identical to an unnumbered list, except it uses <OL>
      instead of