Standards-based Table of Contents


Demonstration

The table of contents at the top of this page is being generated dynamically using the W3C Document Object Model. This demonstration runs in Internet Explorer 5.0 and Netscape's Gecko Developer Preview.

Introduction

In our last article we explained two techniques for automatically creating a table of contents: one using only Internet Explorer 4.0's DHTML model, and another for Internet Explorer 5.0 that combined the Internet Explorer approach with the W3C recommendation. In this article, we continue with a third technique that generates the table of contents using only the W3C DOM recommendation and the ECMA-262 (JavaScript) standard. When writing this script, we added two additional requirements: the script must run in both Internet Explorer 5.0 and Netscape's Gecko M4 Developer Preview.

Rewriting the table of contents to be completely standards-based proved to be a challenge. Surprisingly, more of this challenge came from the complexity of the W3C recommendation than from compatibility issues between IE5 and Netscape Gecko.

The W3C DOM focuses on exposing node objects that represent the HTML document tree, and manipulating the document requires you to navigate this hierarchy of objects. In the next section we introduce the W3C DOM properties and methods used to implement our table of contents scripts.

W3C DOM properties and methods

The W3C recommendation defines a number of properties and methods for manipulating the document as a hierarchy of nodes. A node can be either an HTML element or a fragment of text (there are other node types, such as processing instructions and comments, but they are not needed for this example). Below we list the properties and methods used in this article:
document.createElement(sTagName) - Creates and returns an element object for the specified tag name.
document.createTextNode(sData) - Creates and returns a TextNode object with the specified contents.
document.getElementsByTagName(sTagName) - Locates and returns a collection of all the specified elements in the document. A special wildcard value, "*", returns all the elements in the document. This method is analogous to the document.all.tags(sTagName) method defined by Internet Explorer, and the wildcard is equivalent to Internet Explorer's document.all collection.
element.childNodes - A property (not a method) that returns the collection of the element's child nodes. This collection is similar to the children collection defined by Internet Explorer. The primary difference is that the childNodes collection also contains TextNode objects, which represent the actual text content within each element.
node.insertBefore(newNode, nodePosition) - Inserts newNode as a child of the current node, immediately before nodePosition, an existing child of the current node. This method is often used to insert nodes created with the document's createElement and createTextNode methods into the document.
node.appendChild(newNode) - A simplified version of insertBefore that automatically inserts the node as the last child of the current node.
node.nodeType - A read-only property that returns the type of node. The two most common values are 1 for element objects and 3 for text nodes.
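To make the semantics of appendChild() and insertBefore() concrete without a browser, here is a minimal sketch. It assumes a hypothetical makeNode() helper and a stand-in doc object that model only the handful of members listed above; it is not the W3C DOM itself, and it does not model details such as moving a node that already has a parent.

```javascript
// Hypothetical stand-in for a W3C node: models only nodeType, tagName,
// data, childNodes, appendChild() and insertBefore().
function makeNode(nodeType, value) {
  var node = { nodeType: nodeType, childNodes: [] };
  if (nodeType == 1) node.tagName = value.toUpperCase(); // element node
  else node.data = value;                                // text node (type 3)

  // Add newNode as the last child.
  node.appendChild = function (newNode) {
    node.childNodes.push(newNode);
    return newNode;
  };

  // Add newNode immediately before refNode; a null refNode appends,
  // as in the W3C DOM. A refNode that is not a child is an error.
  node.insertBefore = function (newNode, refNode) {
    if (refNode == null) return node.appendChild(newNode);
    var pos = node.childNodes.indexOf(refNode);
    if (pos < 0) throw new Error("refNode is not a child of this node");
    node.childNodes.splice(pos, 0, newNode);
    return newNode;
  };
  return node;
}

// Hypothetical stand-in for the two document factory methods.
var doc = {
  createElement: function (tagName) { return makeNode(1, tagName); },
  createTextNode: function (data) { return makeNode(3, data); }
};

var body = doc.createElement("BODY");
body.appendChild(doc.createElement("P"));
var ul = doc.createElement("UL");
body.insertBefore(ul, body.childNodes[0]); // UL becomes the first child
ul.appendChild(doc.createTextNode("First entry"));

console.log(body.childNodes.map(function (n) { return n.tagName; }).join(",")); // UL,P
console.log(ul.childNodes[0].nodeType); // 3
```

Note that the UL ends up before the P even though it was created later: insertBefore() controls position, while appendChild() always adds at the end.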

All of these properties and methods are supported by Internet Explorer 5.0 and Netscape's Gecko Developer Preview. They are only a sampling of the object model defined by the W3C recommendation, but they are all we need to create a cross-browser, standards-compliant table of contents.

For the most part, these methods interoperate between Internet Explorer 5.0 and Netscape Gecko. One difference, discussed below, is Internet Explorer's incomplete support for getElementsByTagName(): IE5 does not support the wildcard value. However, this is a minor issue, as we can easily work around it by replacing IE5's version of the method with our own.
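A workaround needs some way to decide when to apply it. One option that avoids browser detection is to test the wildcard behaviour itself. The sketch below is an illustration under stated assumptions, not the article's code: the hypothetical supportsWildcard() helper assumes that a browser without wildcard support returns an empty collection for "*", and the fake documents built by makeDoc() exist only so the check can run outside a browser.

```javascript
// Returns true when doc.getElementsByTagName("*") yields any elements.
// Assumption: a browser without wildcard support returns an empty
// collection for "*" instead of throwing an error.
function supportsWildcard(doc) {
  var all = doc.getElementsByTagName("*");
  return all != null && all.length > 0;
}

// Hypothetical element list shared by the fake documents below.
var sample = [{ tagName: "HTML" }, { tagName: "BODY" }, { tagName: "H1" }];

// Builds a fake document whose getElementsByTagName() either honours
// the wildcard or (as assumed for IE5) finds nothing for it.
function makeDoc(wildcardWorks) {
  return {
    getElementsByTagName: function (tag) {
      if (tag == "*") return wildcardWorks ? sample.slice() : [];
      return sample.filter(function (e) {
        return e.tagName == tag.toUpperCase();
      });
    }
  };
}

console.log(supportsWildcard(makeDoc(true)));  // true
console.log(supportsWildcard(makeDoc(false))); // false
```

In a page, a check like this could decide whether a document.all-based replacement is installed, instead of testing for document.all directly.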

Locating the Header Elements

Our first task is to find all the header (H1...H6) elements in the document. The Internet Explorer model makes this extremely simple: the document can be represented either as a tree or as a flattened collection of elements. The flattened collection, the all collection, gives easy access to every element, so extracting the header elements is straightforward.

The W3C recommendation exposes the document primarily as a tree. It also provides a convenience method, getElementsByTagName, that can retrieve all the elements of a particular type, or all elements when passed the special wildcard identifier ("*"). Unfortunately, while IE5 supports this method, it does not support the wildcard value.
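The difference between a named tag and the wildcard is easiest to see on a small tree. The following sketch is a hypothetical re-implementation over plain objects (elem() and text() are illustrative helpers, not DOM methods) that follows the behaviour of element.getElementsByTagName(): it returns the matching descendants in document order, with "*" matching every element. Unlike the real method, it returns a static array rather than a live collection.

```javascript
// Illustrative stand-ins for element and text nodes.
function elem(tagName, children) {
  return { nodeType: 1, tagName: tagName, childNodes: children || [] };
}
function text(data) {
  return { nodeType: 3, data: data, childNodes: [] };
}

// Collect descendant elements of root in document (pre-order) order.
// "*" matches every element; other names compare case-insensitively.
function getByTagName(root, tag) {
  var want = tag.toUpperCase();
  var result = [];
  function visit(node) {
    for (var i = 0; i < node.childNodes.length; i++) {
      var child = node.childNodes[i];
      if (child.nodeType != 1) continue; // text nodes never match
      if (want == "*" || child.tagName == want) result.push(child);
      visit(child); // matches nested deeper are found as well
    }
  }
  visit(root);
  return result;
}

function names(list) {
  return list.map(function (e) { return e.tagName; }).join(",");
}

var body = elem("BODY", [
  elem("H1", [text("Introduction")]),
  elem("DIV", [elem("H2", [text("Details")]), elem("P", [text("...")])])
]);

console.log(names(getByTagName(body, "*")));  // H1,DIV,H2,P
console.log(names(getByTagName(body, "h2"))); // H2
```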

At this point we have two choices: ignore IE's lack of support and recursively navigate the tree of elements to find all the header elements, or replace IE's getElementsByTagName with a fixed version written in JavaScript. (For more about recursion, see Rajeev's article on building a maze recursively.)

If we don't want to include any browser detection code, we can write our own function for locating the headers. This script is not simple and requires an understanding of recursion. Below is a basic function that visits each element node in the document. At the end of this article we include an enhanced version of this function that locates just the header elements and builds the table of contents as it goes.

// Walk all elements - Recursive Standards-based
function getElements(obj) {
 for (var i=0;i < obj.childNodes.length;i++)
  if (obj.childNodes[i].nodeType==1) { // Elements only
    // obj.childNodes[i] is an element; process it here
    getElements(obj.childNodes[i])
  }
}

// Start at the root (HTML) element
getElements(document.documentElement)
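The getElements() function above visits elements but does nothing with them. One way to make the walk reusable is sketched below, assuming a hypothetical walkElements() function with a callback parameter (visit) and plain-object stand-ins for nodes. It also shows why the nodeType test matters: childNodes contains text nodes as well as elements.

```javascript
// Illustrative stand-ins for element and text nodes.
function elem(tagName, children) {
  return { nodeType: 1, tagName: tagName, childNodes: children || [] };
}
function text(data) {
  return { nodeType: 3, data: data, childNodes: [] };
}

// Recursively call visit(element) for every element below obj, in
// document order. Text nodes (nodeType 3) are skipped, but they are
// still present in childNodes, which is why the check is needed.
function walkElements(obj, visit) {
  for (var i = 0; i < obj.childNodes.length; i++) {
    var child = obj.childNodes[i];
    if (child.nodeType == 1) {
      visit(child);
      walkElements(child, visit);
    }
  }
}

var html = elem("HTML", [
  elem("BODY", [
    elem("H1", [text("Title")]),
    text("\n"), // whitespace between tags, as some browsers expose it
    elem("P", [text("Some "), elem("EM", [text("text")])])
  ])
]);

var seen = [];
walkElements(html, function (el) { seen.push(el.tagName); });
console.log(seen.join(","));                       // BODY,H1,P,EM
console.log(html.childNodes[0].childNodes.length); // 3 (two elements, one text node)
```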

Rather than deal with the complexity of this function, we can use a very simple script to override IE5's incomplete getElementsByTagName. A positive side effect of this fix is that it also gives Internet Explorer 4.0 a working version of this method. With this small script, IE5's implementation becomes compatible with Netscape's, which also simplifies the script that visits every element. When examining the getElements() function below, notice that it no longer needs to call itself recursively:

function ie_getElementsByTagName(str) {
 // Map to the all collection
 if (str=="*")
  return document.all           // every element
 else
  return document.all.tags(str) // elements with this tag name
}

// Install the replacement only where document.all exists (Internet Explorer)
if (document.all)
 document.getElementsByTagName = ie_getElementsByTagName

function getElements() {
 var obj = document.getElementsByTagName("*")
 for (var i=0;i < obj.length;i++) {
  var el = obj[i] // get the element; process it here
 }
}

getElements()
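To check the override's logic outside a browser, the sketch below applies the same mapping to a fake document object. fakeDocument, its all array and tags() method, and the makeAllBasedLookup() wrapper are illustrative assumptions, not Internet Explorer's real objects. The wrapper takes the document as a parameter only so it can be bound to the fake object.

```javascript
// Fake stand-in for IE's document: "all" lists every element, and
// all.tags(name) filters it by tag name.
var fakeDocument = {
  all: [{ tagName: "HTML" }, { tagName: "BODY" }, { tagName: "H1" }, { tagName: "H2" }]
};
fakeDocument.all.tags = function (name) {
  return fakeDocument.all.filter(function (e) {
    return e.tagName == name.toUpperCase();
  });
};

// The same mapping as ie_getElementsByTagName(), but taking the document
// as a parameter so it can be bound to the fake object.
function makeAllBasedLookup(doc) {
  return function (str) {
    if (str == "*")
      return doc.all;          // wildcard: every element
    return doc.all.tags(str);  // one tag name
  };
}

// Same guard as the article: install the replacement where "all" exists.
if (fakeDocument.all)
  fakeDocument.getElementsByTagName = makeAllBasedLookup(fakeDocument);

console.log(fakeDocument.getElementsByTagName("*").length);  // 4
console.log(fakeDocument.getElementsByTagName("H2").length); // 1
```

The replacement is only a dispatch: the wildcard maps to the all collection and any other name maps to all.tags().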

The script for accessing all the elements is almost the same as the script we would write using the original Internet Explorer model. The only difference is we use the getElementsByTagName() method instead of the all collection. The next step is to write the script so only the header elements are extracted.

We will continue with the simpler, non-recursive solution; the source code for both solutions is provided at the end of this article. Extracting the headers with getElementsByTagName() is simple: we examine every element in the document and check whether it is a header element:

function getHeaders() {
 var obj = document.getElementsByTagName("*")
 var tagList = "H1;H2;H3;H4;H5;H6;"
 for (var i=0;i < obj.length;i++)
  if (tagList.indexOf(obj[i].tagName+";")>=0) {
    // Got One
  }
}
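The trailing semicolon in tagList is what keeps the indexOf() test exact: without it, a tag name such as "H" would be found inside "H1". The sketch below compares the article's test with an undelimited version and a regular-expression alternative. The function names are hypothetical, and the regular expression is a swapped-in technique rather than the article's. All three assume uppercase tag names, as HTML documents report them.

```javascript
var tagList = "H1;H2;H3;H4;H5;H6;";

// The article's test: look for "NAME;" inside the delimited list.
function isHeaderList(tagName) {
  return tagList.indexOf(tagName + ";") >= 0;
}

// Without the delimiter, partial names slip through.
function isHeaderNoDelimiter(tagName) {
  return tagList.indexOf(tagName) >= 0;
}

// A regular-expression alternative: exactly "H" followed by 1-6.
function isHeaderRegExp(tagName) {
  return /^H[1-6]$/.test(tagName);
}

var names = ["H1", "H6", "H", "HR", "TH", "H7"];
names.forEach(function (n) {
  console.log(n, isHeaderList(n), isHeaderNoDelimiter(n), isHeaderRegExp(n));
});
```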

Build the Table of Contents

We now process each header and iteratively build the table of contents as an HTML list (UL). We use the createElement() method to create both the list container and each table of contents entry, appending each entry to the end of the list as we build it. When we finish scanning the document, we have a complete table of contents:

function getHeaders() {
 var obj = document.getElementsByTagName("*")
 var el = document.createElement("UL")
 var tagList = "H1;H2;H3;H4;H5;H6;"
 for (var i=0;i < obj.length;i++)
  if (tagList.indexOf(obj[i].tagName+";")>=0) {
   var eLI = document.createElement("LI")
   // Assign the text - more on this soon
   var eLIText = document.createTextNode(getTextForElement(obj[i]))
   eLI.className="toc" + obj[i].tagName
   eLI.appendChild(eLIText)
   el.appendChild(eLI)
  }
 return el
}


function ie_getElementsByTagName(str) {
 if (str=="*")
  return document.all
 else
  return document.all.tags(str)
}

if (document.all)
 document.getElementsByTagName = ie_getElementsByTagName

function doLoad() {
 var el= getHeaders()
}

window.onload =doLoad
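To see what getHeaders() produces, the sketch below runs the same loop against plain-object stand-ins. fakeDoc, its createElement()/createTextNode() methods, buildToc(), and the flat all array (standing in for getElementsByTagName("*")) are assumptions made so the code runs outside a browser. The hypothetical textOf() helper is a simplified substitute for getTextForElement() that reads only direct text children.

```javascript
// Stand-in node factory (only the members getHeaders() touches).
function makeNode(nodeType, value) {
  var node = { nodeType: nodeType, childNodes: [], className: "" };
  if (nodeType == 1) node.tagName = value.toUpperCase();
  else node.data = value;
  node.appendChild = function (child) { node.childNodes.push(child); return child; };
  return node;
}
var fakeDoc = {
  createElement: function (tag) { return makeNode(1, tag); },
  createTextNode: function (data) { return makeNode(3, data); }
};

// Simplified text extraction: concatenates direct text children only.
function textOf(obj) {
  var str = "";
  for (var i = 0; i < obj.childNodes.length; i++)
    if (obj.childNodes[i].nodeType == 3) str += obj.childNodes[i].data;
  return str;
}

// Same loop as getHeaders(), with the element list passed in.
function buildToc(all) {
  var el = fakeDoc.createElement("UL");
  var tagList = "H1;H2;H3;H4;H5;H6;";
  for (var i = 0; i < all.length; i++) {
    if (tagList.indexOf(all[i].tagName + ";") >= 0) {
      var eLI = fakeDoc.createElement("LI");
      eLI.className = "toc" + all[i].tagName; // e.g. "tocH2"
      eLI.appendChild(fakeDoc.createTextNode(textOf(all[i])));
      el.appendChild(eLI);
    }
  }
  return el;
}

function heading(tag, label) {
  var h = fakeDoc.createElement(tag);
  h.appendChild(fakeDoc.createTextNode(label));
  return h;
}
var all = [heading("H1", "Introduction"), fakeDoc.createElement("P"),
           heading("H2", "Locating the Header Elements")];

var toc = buildToc(all);
toc.childNodes.forEach(function (li) {
  console.log(li.className + ": " + li.childNodes[0].data);
});
// tocH1: Introduction
// tocH2: Locating the Header Elements
```

The class name records each entry's header level, which leaves room for a stylesheet to format the levels differently.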

We are almost there. This script visits each element, but it leaves out two important pieces: the getTextForElement() function, which extracts the text from each header element, and the code that inserts the table of contents into the document. In the Internet Explorer object model, an element's contents are easily accessed through its innerText property. The W3C model exposes no such property. To make matters more difficult, it exposes each fragment of text as a separate object in the tree. While this approach is useful in some scenarios, it makes simple text retrieval much more difficult.

Retrieving an Element's Content

Unfortunately, the W3C recommendation does not include an easy-to-use property or method for obtaining the contents of an element. Instead, the contents are stored in text node objects beneath the element. For example, take the following simple HTML:

<P>This is a <EM>sample</EM> paragraph</P>

In the Internet Explorer model, the contents of this paragraph can be retrieved using the innerText property of the P element. The W3C recommendation instead requires you to manipulate each piece of text as a separate object. The above HTML fragment is exposed as a tree of objects:

Element Object (P)
  |
  +--TextNode Object (This is a)
  |
  +--Element Object (EM)
  |      |
  |      +-- TextNode Object (sample)
  |
  +--TextNode Object(paragraph)

To retrieve the contents of this paragraph you need to traverse the object hierarchy and extract the text in each text-node object. We wrote a recursive function, getTextForElement(), that walks the object hierarchy for an element and returns the content:

function getTextForElement(obj) {
 var str=""
 for (var i=0;i < obj.childNodes.length;i++) {
  if (obj.childNodes[i].nodeType==1) 
   // Element node - walk children
   str+=getTextForElement(obj.childNodes[i])
  else if (obj.childNodes[i].nodeType==3)
   // Text Node - extract contents
   str += obj.childNodes[i].data
 }
 return str	
}
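Running getTextForElement() on the sample paragraph shows the text nodes being joined in document order. The sketch below uses illustrative elem()/text() helpers in place of real DOM nodes and appends each text node's data with +=; assigning with = instead would keep only the last text node seen. Note that the text nodes keep the spaces next to the EM element.

```javascript
// Illustrative stand-ins for element and text nodes.
function elem(tagName, children) {
  return { nodeType: 1, tagName: tagName, childNodes: children || [] };
}
function text(data) {
  return { nodeType: 3, data: data, childNodes: [] };
}

// Concatenate the data of every text node below obj, in document order.
function getTextForElement(obj) {
  var str = "";
  for (var i = 0; i < obj.childNodes.length; i++) {
    if (obj.childNodes[i].nodeType == 1)
      str += getTextForElement(obj.childNodes[i]); // element: recurse
    else if (obj.childNodes[i].nodeType == 3)
      str += obj.childNodes[i].data;               // text: append
  }
  return str;
}

// <P>This is a <EM>sample</EM> paragraph</P>
var p = elem("P", [
  text("This is a "),
  elem("EM", [text("sample")]),
  text(" paragraph")
]);

console.log(getTextForElement(p)); // This is a sample paragraph
```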

The last step is to insert the table of contents into the document. The getHeaders() function returns a tree of objects representing the table of contents list. To insert this tree into the document, we use the insertBefore() method. We use this method to insert the table of contents as the first child of the document's body. Below is the final doLoad() function that is called after the page loads.

function doLoad() {
 var el = getHeaders()
 var startEl = document.getElementsByTagName("BODY")[0]
 startEl.insertBefore(el,startEl.childNodes[0])
}
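One edge case for doLoad(): if the body had no child nodes, startEl.childNodes[0] would be undefined. The W3C DOM specifies that insertBefore() appends the new node when the reference node is null, so passing startEl.firstChild (which is null for an element with no children) keeps the behaviour defined by the specification. The sketch below models that rule with a hypothetical makeElement() stand-in and insertToc() helper; it is deliberately strict, treating only null as "append", and it illustrates the specified behaviour rather than any particular browser.

```javascript
// Hypothetical element stand-in with firstChild and insertBefore().
function makeElement(tagName) {
  var node = {
    nodeType: 1,
    tagName: tagName,
    childNodes: [],
    // The first child, or null when there are no children (as in the DOM).
    get firstChild() {
      return this.childNodes.length > 0 ? this.childNodes[0] : null;
    }
  };
  // Only null means "append at the end"; any other reference must be
  // an existing child.
  node.insertBefore = function (newNode, refNode) {
    if (refNode === null) {
      node.childNodes.push(newNode);
      return newNode;
    }
    var pos = node.childNodes.indexOf(refNode);
    if (pos < 0) throw new Error("refNode is not a child of this node");
    node.childNodes.splice(pos, 0, newNode);
    return newNode;
  };
  return node;
}

// Insert the table of contents as the first child of the body.
function insertToc(body, toc) {
  body.insertBefore(toc, body.firstChild);
}

function tagNames(node) {
  return node.childNodes.map(function (n) { return n.tagName; }).join(",");
}

var emptyBody = makeElement("BODY");
insertToc(emptyBody, makeElement("UL"));

var fullBody = makeElement("BODY");
fullBody.childNodes.push(makeElement("H1"));
insertToc(fullBody, makeElement("UL"));

console.log(tagNames(emptyBody)); // UL
console.log(tagNames(fullBody));  // UL,H1
```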

The Script

Last, we provide the complete script for generating the table of contents. The table of contents created by this article is not interactive. You can't click on an entry and navigate to the element. We will explain how to make this table of contents interactive next week. Stay tuned!

Recursive getHeaders() Approach

function getTextForElement(obj) {
 var str=""
 for (var i=0;i < obj.childNodes.length;i++) {
  if (obj.childNodes[i].nodeType==1)
   str+=getTextForElement(obj.childNodes[i])
  else if (obj.childNodes[i].nodeType==3)
   str += obj.childNodes[i].data
 }
 return str
}

function getHeaders(el,obj) {
 var tagList = "H1;H2;H3;H4;H5;H6;"
 if (tagList.indexOf(obj.tagName+";")>=0) {
  var eLI = document.createElement("LI")
  var eLIText = document.createTextNode(getTextForElement(obj))
  eLI.className="toc" + obj.tagName
  eLI.appendChild(eLIText)
  el.appendChild(eLI)
 }
 for (var i=0;i < obj.childNodes.length;i++)
  if (obj.childNodes[i].nodeType==1)
   getHeaders(el,obj.childNodes[i])
}


function doLoad() {
 var el = document.createElement("UL")
 var startEl = document.getElementsByTagName("BODY")[0]
 getHeaders(el,startEl)
 startEl.insertBefore(el,startEl.childNodes[0])
}

window.onload =doLoad

getElementsByTagName() Approach


function getTextForElement(obj) {
 var str=""
 for (var i=0;i < obj.childNodes.length;i++) {
  if (obj.childNodes[i].nodeType==1)
   str+=getTextForElement(obj.childNodes[i])
  else if (obj.childNodes[i].nodeType==3)
   str += obj.childNodes[i].data
 }
 return str
}

function getHeaders() {
 var obj = document.getElementsByTagName("*")
 var el = document.createElement("UL")
 var tagList = "H1;H2;H3;H4;H5;H6;"
 for (var i=0;i < obj.length;i++)
  if (tagList.indexOf(obj[i].tagName+";")>=0) {
   var eLI = document.createElement("LI")
   var eLIText = document.createTextNode(getTextForElement(obj[i]))
   eLI.className="toc" + obj[i].tagName
   eLI.appendChild(eLIText)
   el.appendChild(eLI)
  }
 return el
}

function ie_getElementsByTagName(str) {
 if (str=="*")
  return document.all
 else
  return document.all.tags(str)
}

if (document.all)
 document.getElementsByTagName = ie_getElementsByTagName

function doLoad() {
 var el = getHeaders()
 var startEl = document.getElementsByTagName("BODY")[0]
 startEl.insertBefore(el,startEl.childNodes[0])
}

window.onload =doLoad
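Both listings should produce the same entries, because getElementsByTagName("*") and the recursive walk both visit elements in document order. The sketch below checks that on a small plain-object tree. The helpers are illustrative assumptions: elem() and text() stand in for DOM nodes, flatten() stands in for getElementsByTagName("*"), entries are plain strings instead of LI elements, and getTextForElement() uses the += form.

```javascript
function elem(tagName, children) {
  return { nodeType: 1, tagName: tagName, childNodes: children || [] };
}
function text(data) {
  return { nodeType: 3, data: data, childNodes: [] };
}

function getTextForElement(obj) {
  var str = "";
  for (var i = 0; i < obj.childNodes.length; i++) {
    if (obj.childNodes[i].nodeType == 1) str += getTextForElement(obj.childNodes[i]);
    else if (obj.childNodes[i].nodeType == 3) str += obj.childNodes[i].data;
  }
  return str;
}

var tagList = "H1;H2;H3;H4;H5;H6;";
function entryFor(obj) { return "toc" + obj.tagName + ":" + getTextForElement(obj); }

// Recursive approach: check obj itself, then walk its element children.
function recursiveHeaders(list, obj) {
  if (tagList.indexOf(obj.tagName + ";") >= 0) list.push(entryFor(obj));
  for (var i = 0; i < obj.childNodes.length; i++)
    if (obj.childNodes[i].nodeType == 1) recursiveHeaders(list, obj.childNodes[i]);
  return list;
}

// Flat approach: stand-in for getElementsByTagName("*"), in document order.
function flatten(obj, out) {
  for (var i = 0; i < obj.childNodes.length; i++) {
    var child = obj.childNodes[i];
    if (child.nodeType == 1) { out.push(child); flatten(child, out); }
  }
  return out;
}
function flatHeaders(root) {
  var all = flatten(root, []);
  var list = [];
  for (var i = 0; i < all.length; i++)
    if (tagList.indexOf(all[i].tagName + ";") >= 0) list.push(entryFor(all[i]));
  return list;
}

var body = elem("BODY", [
  elem("H1", [text("Standards-based "), elem("EM", [text("TOC")])]),
  elem("DIV", [elem("H2", [text("Introduction")]), elem("P", [text("...")])]),
  elem("H2", [text("The Script")])
]);

var a = recursiveHeaders([], body);
var b = flatHeaders(body);
console.log(a.join(" | "));
console.log(a.join("|") == b.join("|")); // true
```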