Sybernet / Supplied Procedures Reference
Release 3.00
Aug 23, 2003

HTTP.SP_HTML_VALIDATOR

The Sybernet HTML Validator parses your HTML documents to check that they conform to the specification defined by the World Wide Web Consortium (W3C). Where possible, the validator suggests an alternate syntax that is compliant. Most of the time this is not possible, and your only option is to review the HTML 4.01 Specification, which includes information on:



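A valid HTML document declares what version of HTML is used in the document. The document type declaration names the document type definition (DTD) in use for the document. HTML 4.01 specifies three DTDs, so authors must include one of the following document type declarations in their documents. The DTDs vary in the elements they support:

HTML 4.01 (strict):
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

HTML 4.01 (transitional):
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

HTML 4.01 (frameset):
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">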

The validator exists on our development server only. Since the tracing mechanism adds overhead to every application (even if it isn't being traced), the validator does not exist on any other server.

Navigate to Sybernet Utilities and click the link for Validator II. This opens a new window. Use the new window to turn tracing on and off and to validate your HTML output; use the original window to navigate to the procedure you wish to validate. Because the validator uses your session ID to capture output, it cannot capture output from any other session. This means you must validate and run your applications from the same PC.

If you are ambitious, you may capture several screens at once, for example a form screen and the report it creates. When the file is validated, however, there is no demarcation between the two except for the normal HTML tags, which is a good argument for always including an <html> and </html> pair in each document you create. The validator can also become confused when scanning multiple documents, because the opening and closing tags for html, head, and body are optional. This can produce an error where none was expected.
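To see why the <html> pairs matter, here is a minimal Python sketch of splitting a multi-screen trace back into separate documents. It assumes the trace file is nothing more than the concatenated output of each screen (the real trace format is not described here), and split_trace is a hypothetical helper, not part of Sybernet. A screen that omits <html> simply runs into the one before it.

```python
import re

# Assumption: the trace file is simply the concatenated HTML output of each
# screen. A new document starts at <html> (or at a DOCTYPE just before it).
DOC_START = re.compile(r"(?:<!DOCTYPE[^>]*>\s*)?<html\b", re.IGNORECASE)

def split_trace(trace: str) -> list[str]:
    """Split a trace into documents at each <html> start tag.

    Anything before the first <html> (or the whole trace, if no document
    uses <html>) comes back as a single chunk, because there is no
    boundary to split on.
    """
    starts = [m.start() for m in DOC_START.finditer(trace)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(trace)]
    chunks = (trace[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [chunk for chunk in chunks if chunk]

form_screen = "<html><body><form></form></body></html>"
report = "<html><body><table></table></body></html>"
print(len(split_trace(form_screen + report)))                    # 2
print(len(split_trace("<body>form</body><body>report</body>")))  # 1
```

With explicit <html> tags the two screens come apart cleanly; without them the splitter (like the validator) has no boundary to find.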

XHTML vs HTML 4.01

Although the validator does not currently parse XHTML documents, you may be interested in some of the differences between it and HTML 4.01.

Elements and attributes deprecated in HTML 4.01 appear to remain deprecated (as opposed to becoming obsolete) in XHTML. XHTML specifications exist for the same three flavors as HTML 4.01: strict, transitional, and frameset.

Well-formedness is a new concept introduced by [XML]. Essentially this means that all elements must either have closing tags or be written in a special form, and that all the elements must nest properly. Element and attribute names must be in lower case. For non-empty elements, end tags are required. Attribute values must always be quoted. XML does not support attribute minimization. Attribute-value pairs must be written in full. Attribute names such as compact and checked cannot occur in elements without their value being specified. Empty elements must either have an end tag or the start tag must end with />.
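Since the validator does not parse XHTML, one quick way to try the well-formedness rules above is an ordinary XML parser. This Python sketch uses the standard library's xml.etree.ElementTree; is_well_formed is a hypothetical helper. It checks well-formedness only: it does not enforce lower-case names or validate against an XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True

samples = [
    '<p>Hello<br /></p>',                           # empty element ends with />
    '<p>Hello<br></p>',                             # <br> is never closed
    '<input type="checkbox" checked="checked" />',  # attribute written in full
    '<input type="checkbox" checked />',            # minimized attribute
    '<table width=25%></table>',                    # unquoted attribute value
    '<P>text</p>',                                  # XML names are case-sensitive
]
for fragment in samples:
    print(is_well_formed(fragment), fragment)
```

Only the first and third samples parse; each of the others breaks one of the rules listed above.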


Controls

Use the following controls to capture and validate your HTML documents.

Control Description

Trace On

This button turns tracing on. While tracing is on, the validator captures all Sybernet output for your current session. The only output it does not capture is the HTML it generates itself.

Trace Off

This button turns tracing off.

Purge

This button removes your trace file. You do not have to click Purge between traces, since the Trace On button also removes your trace file. When you are done validating your HTML, however, you should purge the file from disk.

Validate

This button validates your HTML file against the document type you selected. Only errors are displayed in this mode.

Validate Verbose

Validate Verbose is the same as Validate, except that every tag in your document is also displayed. Source text between tags (and inside comments) is not shown.

Document Type

This drop-down list selects the level of HTML against which your document is validated. Each option is described under Document Types below.



Document Types

The following options determine the level of your HTML document. Every document should at least pass the sanity check. Of the remaining options, HTML 4.01 (frameset) is the desired document type (since most of us use frames).

One assumes that frames are (or are going to be) deprecated in future document types. This is evident in the definition of HTML 4.01 (strict) which does not allow frames. Choosing this option will also flag deprecated elements and attributes. I'm sad to say everything cool has been deprecated in favor of style sheets.
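Here is a rough Python sketch of how a strict-mode pass might flag deprecated markup, using the standard library's html.parser. The element set is the list of elements deprecated in HTML 4.01; the attribute table is a hypothetical excerpt (only the BODY attributes that appear in the strict example below), not the table that actually drives the Sybernet validator.

```python
from html.parser import HTMLParser

# Elements deprecated in HTML 4.01 (allowed by the Transitional DTD, not by Strict).
DEPRECATED_ELEMENTS = {"applet", "basefont", "center", "dir", "font",
                       "isindex", "menu", "s", "strike", "u"}
# Frame elements appear only in the Frameset DTD.
FRAME_ELEMENTS = {"frameset", "frame", "noframes"}
# Hypothetical, partial table of deprecated attributes, keyed by element.
DEPRECATED_ATTRS = {"body": {"background", "bgcolor", "text", "link", "vlink", "alink"}}

class StrictChecker(HTMLParser):
    """Collect messages in the style of the validator's strict-mode output."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lower-cases tag and attribute names.
        for name, _value in attrs:
            if name in DEPRECATED_ATTRS.get(tag, set()):
                self.messages.append(f"{name} is deprecated in <{tag}>")
        if tag in DEPRECATED_ELEMENTS:
            self.messages.append(f"<{tag}> is deprecated")
        elif tag in FRAME_ELEMENTS:
            self.messages.append(f"<{tag}> is not allowed in HTML 4.01 (strict)")

checker = StrictChecker()
checker.feed('<body text="#FFFF00" bgcolor="#993366"><center><font size=+3>Hi</font></center>')
checker.close()
for message in checker.messages:
    print(message)
```

The sketch looks only at start tags; nesting problems such as <center> inside <table> need a separate check.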


Document Type Description

(detect automatically)

Every valid HTML document declares which version of HTML it uses. The auto-detect mechanism of the validator assumes that each document contains a valid document type declaration (DOCTYPE). When selected, this option looks for the declaration in each document and parses the document according to the version it declares. Because the Sybernet Validator might trace several document types in the same session, this is the safest way to ensure that each document is parsed correctly; in particular, there is no confusion when the <html>, <head>, and <body> tags are implied.

HTML 1.0 (sanity-check)

HTML 1.0 is not defined by the World Wide Web Consortium. This option performs a "sanity check": the validator makes sure that it recognizes each tag and that the tags nest properly. Element attributes are ignored in this mode.

If you fail this pass, there's probably no reason to continue with any of the other document types.

HTML 3.2

HTML 3.2 adds widely deployed features such as tables, applets, and text flow around images, while providing full backward compatibility with the existing standard, HTML 2.0.

HTML 4.01 (strict)

The HTML 4.01 Strict DTD includes all elements and attributes that have not been deprecated or do not appear in frameset documents.

HTML 4.01 (transitional)

The HTML 4.01 Transitional DTD includes everything in the Strict DTD plus deprecated elements and attributes (most of which concern visual presentation).

HTML 4.01 (frameset)

The HTML 4.01 Frameset DTD includes everything in the Transitional DTD plus frames.
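The auto-detect option has to map each DOCTYPE to one of the document types above. Here is a minimal Python sketch of that mapping, keyed on the public identifiers from the W3C DTDs. The name detect_document_type is hypothetical, and falling back to the sanity check when no DOCTYPE is recognized is an assumption, not documented validator behavior.

```python
import re

# Public identifiers from the W3C DTDs, mapped to the Document Type names above.
PUBLIC_IDS = {
    "-//W3C//DTD HTML 3.2 Final//EN": "HTML 3.2",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 (strict)",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 (transitional)",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "HTML 4.01 (frameset)",
}
DOCTYPE = re.compile(r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)

# Assumption: documents without a recognized DOCTYPE fall back to the sanity check.
FALLBACK = "HTML 1.0 (sanity-check)"

def detect_document_type(document: str) -> str:
    """Return the document type named by the first DOCTYPE in the document."""
    match = DOCTYPE.search(document)
    if match is None:
        return FALLBACK
    return PUBLIC_IDS.get(match.group(1), FALLBACK)

frameset = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" '
            '"http://www.w3.org/TR/html4/frameset.dtd">\n<html></html>')
print(detect_document_type(frameset))                      # HTML 4.01 (frameset)
print(detect_document_type("<html><body></body></html>"))  # HTML 1.0 (sanity-check)
```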




Inline vs Block-Level Elements

Generally, block-level elements may contain inline elements and other block-level elements. Generally, inline elements may contain only data and other inline elements. Inherent in this structural distinction is the idea that block elements create "larger" structures than inline elements.

Consider the following illegal example:

    <a href="Sybernet.cgi$HTTP.SP_HTML_VALIDATOR2">
    <h2>
    Validator
    </h2>
    </a>

The A tag is an inline element and may not contain block-level elements. The validator flags this construct with the following message:

    <h2> (a block-level element) cannot appear as a child element of <a> which is an inline element.
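The rule can be sketched with a stack of open inline elements. This Python version uses the standard library's html.parser and small, illustrative subsets of the HTML 4.01 inline and block-level element lists; BlockInInlineChecker and check are hypothetical names. It reproduces the message above for the illegal example and reports nothing for the corrected form, where the <a> goes inside the <h2>.

```python
from html.parser import HTMLParser

# Illustrative subsets of the HTML 4.01 inline and block-level element lists.
# Empty elements such as <br> and <img> are left out of INLINE on purpose:
# they are never closed, so they must not be pushed onto the stack.
INLINE = {"a", "b", "i", "u", "em", "strong", "code", "span", "font", "label"}
BLOCK = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol",
         "pre", "blockquote", "table", "form", "center"}

class BlockInInlineChecker(HTMLParser):
    """Flag block-level start tags that appear while an inline element is open."""

    def __init__(self):
        super().__init__()
        self.open_inline = []   # open inline elements, innermost last
        self.messages = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK and self.open_inline:
            parent = self.open_inline[-1]
            self.messages.append(
                f"<{tag}> (a block-level element) cannot appear as a child "
                f"element of <{parent}> which is an inline element.")
        if tag in INLINE:
            self.open_inline.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching inline element, if one is open.
        if tag in self.open_inline:
            while self.open_inline.pop() != tag:
                pass

def check(html: str) -> list[str]:
    checker = BlockInInlineChecker()
    checker.feed(html)
    checker.close()
    return checker.messages

print(check('<a href="Sybernet.cgi$HTTP.SP_HTML_VALIDATOR2"><h2>Validator</h2></a>'))
print(check('<h2><a href="Sybernet.cgi$HTTP.SP_HTML_VALIDATOR2">Validator</a></h2>'))  # []
```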

The P element represents a paragraph. It cannot contain block-level elements (including P itself). The W3C discourages authors from using empty P elements. User agents should ignore empty P elements.

Consider the following illegal example:

    <p>
    <table>
    </table>
    </p>

Because the closing P tag is optional, the user agent (UA) implicitly closes the paragraph before it starts the table, which makes your explicit </p> an error. In effect, the UA sees:

    <p>
    </p>
    <table>
    </table>
    </p>

The validator has trouble trapping this error. In XHTML the closing P tag is required, so a paragraph is never closed implicitly and this isn't a problem.
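To make the implied end tag visible, here is a small Python model of what the UA does: when a block-level element starts while a paragraph is open, an implied </p> is emitted first. It is a simplification (one open paragraph at a time, an illustrative subset of block elements), and the class name and error text are hypothetical.

```python
from html.parser import HTMLParser

# Illustrative subset of block-level elements that may not appear inside <p>.
CLOSES_P = {"p", "table", "ul", "ol", "pre", "div", "blockquote", "form",
            "hr", "center", "h1", "h2", "h3", "h4", "h5", "h6"}

class ImpliedParagraphEnd(HTMLParser):
    """Record the tag sequence as a UA sees it, including implied </p> tags."""

    def __init__(self):
        super().__init__()
        self.p_open = False
        self.events = []   # tags in document order, implied ones included
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P and self.p_open:
            self.events.append("</p>")   # the implied end tag
            self.p_open = False
        if tag == "p":
            self.p_open = True
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag == "p":
            if self.p_open:
                self.p_open = False
            else:
                self.errors.append("</p> has no open <p> to close")
        self.events.append(f"</{tag}>")

ua = ImpliedParagraphEnd()
ua.feed("<p>\n<table>\n</table>\n</p>")
ua.close()
print(" ".join(ua.events))   # <p> </p> <table> </table> </p>
print(ua.errors)             # ['</p> has no open <p> to close']
```

The event list matches the sequence shown above, and the trailing </p> is the one with nothing left to close.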


Bugs (Known and Unknown)

The validator treats the form tag like any other tag. If your <form> tag occurs within a table cell (td or th) but is closed outside that cell, the validator will complain that you did not close the form within the table cell. You can ignore this error, or move your <form> tag to the same nesting level as your </form> tag. I suspect XHTML doesn't like this either.

It is possible that the validator does not recognize a valid attribute or one of its allowed values. If you find such a bug, please let me know so I can update the table that drives the validator. Some errors may surprise you because the construct works as expected (and documented) in NS4; if it is not defined by the World Wide Web Consortium, however, it is considered an error in your document.



Example

The following example illustrates the output from a sanity check:

HTML 1.0 (sanity-check)

00001 <center> inside of <table> but not inside of <td> or <th>
00001 <u> inside of <table> but not inside of <td> or <th>
00001 <table> inside of <table> but not inside of <td> or <th>
00001 </td> not inside of <td>
00001 </td> not inside of <td>
00001 <b> inside of <b>
00001 </td> not inside of <td>
00001 </td> not inside of <td>
00001 </td> not inside of <td>
00001 </body> does not match prior tag of <center>
00001 </form> does not match prior tag of <body>

11 errors (or warnings).

This is one of my favorite examples. This application renders perfectly in NS4. In NS7 the form screen is not centered, and in MSIE a very large horizontal bar appears in the middle of the window. There is no reason to worry about compliance with any version of HTML until the above errors are corrected.

The numbers to the left of each message are line numbers. Every message here is on line 00001 because the application wrote its entire output as a single line, which is what happens if you don't go out of your way to insert line feeds.
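The nesting part of the sanity check can be approximated with a stack. This Python sketch uses the standard library's html.parser, ignores attributes as the sanity check does, and prefixes each message with a five-digit line number in the style of the output above. It is not the Sybernet implementation: NestingChecker is a hypothetical name, the list of empty elements is partial, and it does not model optional end tags (an omitted </li> or </p> would be reported as a mismatch).

```python
from html.parser import HTMLParser

# Elements that never take an end tag (partial list).
EMPTY = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col", "param"}

class NestingChecker(HTMLParser):
    """Report end tags that don't close the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, innermost last
        self.messages = []

    def report(self, text):
        line, _offset = self.getpos()
        self.messages.append(f"{line:05d} {text}")

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored, as in the sanity-check mode.
        if tag not in EMPTY:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            self.report(f"</{tag}> not inside of <{tag}>")
        elif self.stack[-1] != tag:
            self.report(f"</{tag}> does not match prior tag of <{self.stack[-1]}>")
            # Recover by closing everything opened after the matching start tag.
            while self.stack.pop() != tag:
                pass
        else:
            self.stack.pop()

checker = NestingChecker()
checker.feed("<body><center><b>Hi</center></body></td>")
checker.close()
for message in checker.messages:
    print(message)
# 00001 </center> does not match prior tag of <b>
# 00001 </td> not inside of <td>
```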


Example

A snippet of the above in verbose mode:

HTML 1.0 (sanity-check)

<FORM NAME="myForm" METHOD="POST" ACTION="Sybernet.cgi">
<INPUT TYPE="hidden" NAME="PROCEDURE" VALUE="DELTEK.PBD280_report">
<INPUT TYPE="hidden" NAME="DATABASE" VALUE="DELTEK">
<TABLE BORDER=10 WIDTH=25%>
  <body text="#FFFF00" bgcolor="#993366" link="#0000EE" vlink="#551A8B" alink="#FF0000">
  <center>
00001 <center> inside of <table> but not inside of <td> or <th>
  <font size=+3>
  </font>
  <br>
  <u>
00001 <u> inside of <table> but not inside of <td> or <th>
  <font size=+2>
  </font>
  </u>
  <br>
  <BR>
  <BR>
  <TABLE BORDER=0>
00001 <table> inside of <table> but not inside of <td> or <th>
    <TR>
...

11 errors (or warnings).



Example

A snippet of the above in verbose mode using HTML 4.01 (frameset):

HTML 4.01 (frameset)

<FORM NAME="myForm" METHOD="POST" ACTION="Sybernet.cgi">
<INPUT TYPE="hidden" NAME="PROCEDURE" VALUE="DELTEK.PBD280_report">
<INPUT TYPE="hidden" NAME="DATABASE" VALUE="DELTEK">
<TABLE BORDER=10 WIDTH=25%>
00001 An attribute value must be a literal unless it contains only name characters: 25%
  <body text="#FFFF00" bgcolor="#993366" link="#0000EE" vlink="#551A8B" alink="#FF0000">
  <center>
00001 <center> inside of <table> but not inside of <td> or <th>
  <font size=+3>
00001 An attribute value must be a literal unless it contains only name characters: +3
  </font>
  <br>
  <u>
00001 <u> inside of <table> but not inside of <td> or <th>
  <font size=+2>
00001 An attribute value must be a literal unless it contains only name characters: +2
  </font>
  </u>
  <br>
  <BR>
  <BR>
  <TABLE BORDER=0>
00001 <table> inside of <table> but not inside of <td> or <th>
    <TR>
...

11 errors (or warnings).

The output is almost the same, except that in this mode the validator also checks your tag attributes. It will, for example, complain that you wrote valign=center when what you really meant was valign=middle. In the above, your input fields passed with flying colors: type, name, and value are all legal attributes of the input tag, and the value specified for each is correct.

What it does complain about is that some attribute values were not quoted. An unquoted value may contain only letters, digits, hyphens, periods, underscores, and colons, so values such as 25% and +3 must be quoted. I used to quote such values, but gave that up because Netscape didn't care, and omitting the quotes saved (perhaps) several hundred bytes.
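Both attribute checks described above can be sketched in a few lines of Python. The name-character rule comes from HTML 4.01; the ENUMERATED table is a hypothetical one-entry excerpt standing in for the table that drives the validator, and check_attribute and its message for a bad enumerated value are made up for illustration.

```python
import re

# HTML 4.01: an unquoted attribute value may contain only letters, digits,
# hyphens, periods, underscores, and colons.
NAME_CHARS = re.compile(r"[A-Za-z0-9.\-_:]+")

# Hypothetical one-entry excerpt of an attribute-value table.
ENUMERATED = {("td", "valign"): {"top", "middle", "bottom", "baseline"}}

def check_attribute(tag: str, name: str, value: str, quoted: bool) -> list[str]:
    """Return validator-style messages for one attribute (empty list if it passes)."""
    messages = []
    if not quoted and not NAME_CHARS.fullmatch(value):
        messages.append("An attribute value must be a literal unless it "
                        f"contains only name characters: {value}")
    allowed = ENUMERATED.get((tag.lower(), name.lower()))
    if allowed is not None and value.lower() not in allowed:
        messages.append(f"{name}={value} is not one of: {', '.join(sorted(allowed))}")
    return messages

print(check_attribute("TABLE", "WIDTH", "25%", quoted=False))
print(check_attribute("td", "valign", "center", quoted=False))
print(check_attribute("td", "valign", "middle", quoted=False))  # []
```

Quoting 25% makes the first message go away; valign=center fails whichever way it is written.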




Example

A snippet of the above in verbose mode using HTML 4.01 (strict):

HTML 4.01 (strict)

<FORM NAME="myForm" METHOD="POST" ACTION="Sybernet.cgi">
<INPUT TYPE="hidden" NAME="PROCEDURE" VALUE="DELTEK.PBD280_report">
<INPUT TYPE="hidden" NAME="DATABASE" VALUE="DELTEK">
<TABLE BORDER=10 WIDTH=25%>
00001 An attribute value must be a literal unless it contains only name characters: 25%
  <body text="#FFFF00" bgcolor="#993366" link="#0000EE" vlink="#551A8B" alink="#FF0000">
00001 text is deprecated in <body>
00001 bgcolor is deprecated in <body>
00001 link is deprecated in <body>
00001 vlink is deprecated in <body>
00001 alink is deprecated in <body>
  <center>
00001 <center> inside of <table> but not inside of <td> or <th>
00001 <center> is deprecated
  <font size=+3>
00001 <font> is deprecated
  </font>
  <br>
  <u>
00001 <u> inside of <table> but not inside of <td> or <th>
00001 <u> is deprecated
  <font size=+2>
00001 <font> is deprecated
  </font>
  </u>
  <br>
  <BR>
  <BR>
  <TABLE BORDER=0>
00001 <table> inside of <table> but not inside of <td> or <th>
    <TR>
...

11 errors (or warnings)

In conclusion, what used to be a beautiful and simple language was sent to committee, where it was made to conform to [now for] something completely different.

See Also

STDIO
VALIDATOR



Sybernet is a trademark of SRI International.
Copyright © 1996-2009 SRI International. All Rights Reserved.
Denis D. Workman / http://Sybernet.sri.com/