The table of contents at the top of this page is being generated dynamically using the W3C Document Object Model. This demonstration runs in Internet Explorer 5.0 and Netscape's Gecko Developer Preview.
In our last article we explained two techniques for automatically creating a table of contents - one focusing only on Internet Explorer 4.0's DHTML model and another for Internet Explorer 5.0 using a combination of the Internet Explorer approach and the W3C recommendation. In this article, we continue with a third technique for generating a table of contents based only on the W3C DOM recommendation and the ECMA-262 (JavaScript) standard. When writing this script, we added two additional requirements - the script must run in Internet Explorer 5.0 and Netscape's Gecko M4 Developer Preview.
Rewriting the table of contents to be completely standards based proved to be a challenge. Surprisingly, more of this challenge was related to the complexities of the W3C recommendation than to compatibility issues between IE5 and Netscape Gecko.
The W3C DOM focuses on exposing node objects that represent the HTML tree. Manipulating the document requires you to navigate this hierarchy of objects. On the next page we introduce you to the W3C DOM properties and methods used to implement our table of contents scripts.
The W3C recommendation defines a number of methods for manipulating the document as a hierarchy of nodes. A node can be either an HTML element or fragment of text (there are other node types such as processing instructions, comments, etc, but they are not necessary for this example). Below we list the properties and methods used in this article:
| document.createElement(sTagName) | Creates and returns an element object for the specified tag name. |
| document.createTextNode(sData) | Creates and returns a TextNode object with the specified contents. |
| document.getElementsByTagName(sTagName) | Locates and returns a collection with all the specified elements in the document. A special wildcard value, "*", is defined for returning all the elements in the document. This method is analogous to the document.all.tags(sTagName) method defined by Internet Explorer, and the wildcard is the equivalent of the Internet Explorer document.all collection. |
| element.childNodes | Returns the collection of child nodes. This collection is similar to the children collection defined in Internet Explorer. The primary difference is that the childNodes collection also contains TextNode objects, which represent the actual text content within each element. |
| node.insertBefore(node,nodePosition) | Inserts a node as a child of the current element. The nodePosition argument is the existing child before which the new node is inserted. This method is often used to insert nodes created with the document's createElement and createTextNode methods into the document. |
| node.appendChild(node) | A simplified version of insertBefore that automatically inserts the node as the last child of an element. |
| node.nodeType | A read-only property that returns the type of node. The two most common values are 1 for element objects and 3 for text nodes. |
All of these properties and methods are supported by Internet Explorer 5.0 and Netscape's Gecko Developer Preview. They are a sampling of the object model defined by the W3C recommendation and are the features we use to create a cross-browser, standards-compliant table of contents.
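To make the node model concrete, here is a minimal sketch that mimics a few of the calls listed above with plain JavaScript objects, so it can run outside a browser. The helpers makeElement, makeText, and appendChild are hypothetical stand-ins invented for illustration and are not part of the W3C DOM; in a browser you would call document.createElement(), document.createTextNode(), and node.appendChild() instead.

```javascript
// Hypothetical stand-ins for DOM nodes (assumption: plain objects that
// copy only the nodeType, tagName, data and childNodes members).
function makeElement(sTagName) {
  return { nodeType: 1, tagName: sTagName, childNodes: [] }
}
function makeText(sData) {
  return { nodeType: 3, data: sData, childNodes: [] }
}
function appendChild(parent, node) {
  parent.childNodes.push(node) // the node always becomes the last child
  return node
}

// Build the equivalent of <UL><LI>Introduction</LI></UL> one node at a time
var list = makeElement("UL")
var item = appendChild(list, makeElement("LI"))
appendChild(item, makeText("Introduction"))

// nodeType distinguishes element nodes (1) from text nodes (3)
var kinds = [list.nodeType, item.nodeType, item.childNodes[0].nodeType]
```

The point of the sketch is that even a single list entry is two nodes: the LI element and the text node inside it.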
For the most part, these methods interoperate between Internet Explorer 5.0 and Netscape Gecko. One difference we will discuss later is Internet Explorer's incomplete support for getElementsByTagName(): IE5 is missing support for the wildcard value. However, this is a minor issue, as we can easily work around it by dynamically adding support for the wildcard to IE5.
Our first task is to extract all the header (H1...H6) elements in the document. The
Internet Explorer model makes this extremely simple. In Internet Explorer the
document can be represented as a tree or as a flattened collection of elements. The
flattened collection, exposed as the all collection, gives easy access to every element,
and through it we can easily extract all the header elements.
The W3C recommendation exposes the document primarily as a tree. In addition, a convenience method is exposed, getElementsByTagName, that can retrieve all the elements of a particular type in the document or all elements using a special wildcard identifier ("*"). Unfortunately, while IE5 supports this method, it does not support the wildcard value for returning all elements.
At this point, we can either ignore IE's lack of support and recursively navigate the tree of elements to find all the header elements, or override IE's implementation of getElementsByTagName with a fixed version written in JavaScript. (For more about recursion, see Rajeev's article on building a maze recursively.)
If we don't want to include any browser detection code, we can
write our own function for locating the headers. This script is not simple
and requires understanding recursion. Below is a basic function that
visits each element node in the document. On the last page we include an enhanced
version of this function that locates just the header elements and builds the TOC
on the fly.
// Walk all elements - Recursive Standards-based
function getElements(obj) {
  for (var i=0;i < obj.childNodes.length;i++)
    if (obj.childNodes[i].nodeType==1) // Elements only
      getElements(obj.childNodes[i])
}
getElements(document.childNodes[0])
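As a sketch of what the recursive walk visits, the variation below records each element's tag name as it goes. So that it can run outside a browser, it assumes a hand-built tree of plain objects that copy only the nodeType, tagName, and childNodes members; the collectTagNames name and the sample tree are illustrative, not part of the original script.

```javascript
// Recursive walk that records the tag name of every element it visits.
// Assumption: nodes are plain objects with nodeType, tagName and childNodes.
function collectTagNames(obj, names) {
  names.push(obj.tagName)
  for (var i = 0; i < obj.childNodes.length; i++)
    if (obj.childNodes[i].nodeType == 1) // Elements only, skip text nodes
      collectTagNames(obj.childNodes[i], names)
  return names
}

// Stand-in for <BODY><H1>Title</H1><P>Some <EM>text</EM></P></BODY>
var body = {
  nodeType: 1, tagName: "BODY", childNodes: [
    { nodeType: 1, tagName: "H1", childNodes: [
      { nodeType: 3, data: "Title", childNodes: [] } ] },
    { nodeType: 1, tagName: "P", childNodes: [
      { nodeType: 3, data: "Some ", childNodes: [] },
      { nodeType: 1, tagName: "EM", childNodes: [
        { nodeType: 3, data: "text", childNodes: [] } ] } ] }
  ]
}

var visited = collectTagNames(body, [])
// visited holds BODY, H1, P, EM in document order
```

Each element is recorded before its children, so the names come out in the same order the tags open in the source.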
Rather than deal with the complexity of this function, with a very simple script we can override IE5's incomplete support
for getElementsByTagName. A positive side-effect of this fix is we also add full support
for this method to Internet Explorer 4.0. With this small script we can make
IE5's implementation compatible with Netscape's. This also simplifies the script that
navigates to all elements. When examining the getElements() function below, notice that we no longer
need to call the getElements function recursively:
function ie_getElementsByTagName(str) {
  // Map to the all collections
  if (str=="*")
    return document.all
  else
    return document.all.tags(str)
}
if (document.all)
  document.getElementsByTagName = ie_getElementsByTagName
function getElements() {
  var obj = document.getElementsByTagName("*")
  for (var i=0;i < obj.length;i++)
    var el = obj[i] // get the element
}
getElements()
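The same override can be written as a function that receives the document as a parameter, which makes the fallback easy to try outside a browser. The sketch below assumes a hypothetical fakeDocument object that imitates only the all collection and its tags() method; patchGetElementsByTagName is an illustrative name, not an existing API.

```javascript
// Install the fallback only when the IE-style all collection exists.
function patchGetElementsByTagName(doc) {
  if (!doc.all) return false // nothing to patch on other browsers
  doc.getElementsByTagName = function (str) {
    // Map the wildcard to the whole collection, otherwise filter by tag
    return (str == "*") ? doc.all : doc.all.tags(str)
  }
  return true
}

// Hypothetical stand-in for an IE document (assumption, not a real DOM)
var elements = [{ tagName: "H1" }, { tagName: "P" }, { tagName: "H1" }]
elements.tags = function (sTagName) {
  var found = []
  for (var i = 0; i < elements.length; i++)
    if (elements[i].tagName == sTagName) found.push(elements[i])
  return found
}
var fakeDocument = { all: elements }

var patched = patchGetElementsByTagName(fakeDocument)
var everything = fakeDocument.getElementsByTagName("*").length
var headings = fakeDocument.getElementsByTagName("H1").length
```

Like the original, this tests for the all collection rather than for a browser name, so a document without it is left untouched.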
The script for accessing all the elements is almost the same as
the script we would write using the original Internet Explorer model. The only difference
is we use the getElementsByTagName() method instead of the all collection. The next step is to
write the script so only the header elements are extracted.
We are going to continue with the simpler, non-recursive solution. The source code for both
solutions is provided at the end of this article. Extracting the headers with getElementsByTagName() is simple.
We just examine all the elements in the document and check whether each one is a header element:
function getHeaders() {
  var obj = document.getElementsByTagName("*")
  var tagList = "H1;H2;H3;H4;H5;H6;"
  for (var i=0;i < obj.length;i++)
    if (tagList.indexOf(obj[i].tagName+";")>=0) {
      // Got One
    }
}
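The header test relies on a small string trick: each tag name in tagList is followed by a semicolon, so searching for the tag name plus ";" finds only listed names. The isHeaderTag helper below is a hypothetical name that isolates that check so it can be tried on its own.

```javascript
var tagList = "H1;H2;H3;H4;H5;H6;"

// True when sTagName followed by ";" occurs somewhere in tagList
function isHeaderTag(sTagName) {
  return tagList.indexOf(sTagName + ";") >= 0
}

var results = [
  isHeaderTag("H2"), // listed
  isHeaderTag("P"),  // not listed
  isHeaderTag("H7"), // not listed
  isHeaderTag("h2")  // the comparison is case-sensitive
]
```

Two caveats apply. The comparison is case-sensitive, so it assumes tagName is reported in upper case, as it is for HTML documents. And because this is a substring search, a fragment such as "2" would also match; that is harmless here because tagName always holds a complete tag name.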
We are now going to process each header and iteratively build the
table of contents as an HTML list (UL). To create the
list container and each table of contents entry, we use the createElement() method.
As we build each entry, we will append it to the end of the list. When we are
finished scanning the document we will have a complete table of contents:
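function getHeaders() {
  var obj = document.getElementsByTagName("*")
  var el = document.createElement("UL")
  var tagList = "H1;H2;H3;H4;H5;H6;"
  for (var i=0;i < obj.length;i++)
    if (tagList.indexOf(obj[i].tagName+";")>=0) {
      var eLI = document.createElement("LI")
      var eLIText = document.createTextNode(getTextForElement(obj[i]))
      eLI.className="toc" + obj[i].tagName
      eLI.appendChild(eLIText)
      el.appendChild(eLI)
    }
  return el
}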
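As a sketch of what the header loop produces, the data-only variation below returns one record per header instead of building LI nodes, so it can run without a document. The getTocEntries name and the text member on each stand-in element are assumptions made for illustration; the "toc" plus tag name class follows the complete listings at the end of this article.

```javascript
// Data-only variant of the header loop: returns { className, text }
// records instead of LI nodes. Each stand-in element carries a
// hypothetical text member in place of a getTextForElement() call.
function getTocEntries(elements) {
  var tagList = "H1;H2;H3;H4;H5;H6;"
  var entries = []
  for (var i = 0; i < elements.length; i++)
    if (tagList.indexOf(elements[i].tagName + ";") >= 0)
      entries.push({
        className: "toc" + elements[i].tagName, // e.g. tocH2
        text: elements[i].text
      })
  return entries
}

var entries = getTocEntries([
  { tagName: "H1", text: "Overview" },
  { tagName: "P", text: "Body copy" },
  { tagName: "H2", text: "Details" }
])
```

Because the class name encodes the header level, a style sheet can indent or style each level of the table of contents differently.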
We are almost there. While this script visits each element, we left out two important
pieces: the getTextForElement() function, which extracts the text from each header element,
and the code that inserts the table of contents into the document. With the Internet Explorer
object model, accessing the contents is very simple using the innerText property on the
element. The W3C model exposes no such property. To make matters more difficult, it
exposes each fragment of text as a separate object in the tree. While this approach
is useful for some scenarios, it makes simple text retrieval much more difficult.
Unfortunately, the W3C recommendation does not include any easy-to-use property or
function for obtaining the contents of an element. Instead, the contents are held
in text node objects beneath the element. For example, take the following simple HTML:
<P>This is a <EM>sample</EM> paragraph</P>
In the Internet Explorer model, the contents of this paragraph can be retrieved
using the innerText property of the P element. The W3C recommendation instead requires
you to manipulate each piece of text as a separate object. The above HTML fragment
is exposed as a tree of objects:
Element Object (P)
|
+--TextNode Object (This is a)
|
+--Element Object (EM)
| |
| +-- TextNode Object (sample)
|
+--TextNode Object(paragraph)
To retrieve the contents of this paragraph you need to traverse the object hierarchy
and extract the text in each text-node object. We wrote a recursive function, getTextForElement(), that
walks the object hierarchy for an element and returns the content:
function getTextForElement(obj) {
  var str=""
  for (var i=0;i < obj.childNodes.length;i++) {
    if (obj.childNodes[i].nodeType==1)
      // Element node - walk children
      str+=getTextForElement(obj.childNodes[i])
    else if (obj.childNodes[i].nodeType==3)
      // Text Node - append contents (+= so earlier text is kept)
      str+=obj.childNodes[i].data
  }
  return str
}
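To see the traversal at work, the sketch below runs the same kind of recursive collection over a hand-built copy of the sample paragraph's tree. It assumes plain objects with nodeType, data, and childNodes members in place of real DOM nodes, and it uses the hypothetical name getText so it does not clash with the function above. Every fragment is appended to the result, so text on either side of the EM element is preserved.

```javascript
// Concatenate the text of every text node beneath obj, in document order.
function getText(obj) {
  var str = ""
  for (var i = 0; i < obj.childNodes.length; i++) {
    var child = obj.childNodes[i]
    if (child.nodeType == 1)
      str += getText(child) // Element node - walk its children
    else if (child.nodeType == 3)
      str += child.data     // Text node - append its contents
  }
  return str
}

// Stand-in for <P>This is a <EM>sample</EM> paragraph</P>
var p = { nodeType: 1, tagName: "P", childNodes: [
  { nodeType: 3, data: "This is a ", childNodes: [] },
  { nodeType: 1, tagName: "EM", childNodes: [
    { nodeType: 3, data: "sample", childNodes: [] } ] },
  { nodeType: 3, data: " paragraph", childNodes: [] }
] }

var text = getText(p) // "This is a sample paragraph"
```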
The last step is to insert the table of contents into the document. The getHeaders()
function returns a tree of objects representing the table of contents list. To insert
this tree into the document, we use the insertBefore() method. We
use this method to insert the table of contents as the first child of the document's body.
Below is the final doLoad() function that is called after the page loads.
function doLoad() {
  var el = getHeaders()
  var startEl = document.getElementsByTagName("BODY")[0]
  startEl.insertBefore(el,startEl.childNodes[0])
}
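The effect of inserting before the body's first child can be sketched with plain objects. The insertBefore function below is a hypothetical stand-in that follows the W3C behavior of appending at the end when no reference node is given; the tagOrder helper and the sample bodies are likewise assumptions made for illustration.

```javascript
// Hypothetical stand-in for node.insertBefore(node, nodePosition).
// Assumption: nodes are plain objects with a childNodes array.
function insertBefore(parent, node, nodePosition) {
  if (nodePosition == null) { // no reference node: append at the end
    parent.childNodes.push(node)
    return node
  }
  for (var i = 0; i < parent.childNodes.length; i++)
    if (parent.childNodes[i] === nodePosition) {
      parent.childNodes.splice(i, 0, node) // shift later children right
      return node
    }
  throw new Error("nodePosition is not a child of parent")
}

// List the tag names of the children in order
function tagOrder(parent) {
  var names = []
  for (var i = 0; i < parent.childNodes.length; i++)
    names.push(parent.childNodes[i].tagName)
  return names.join(",")
}

var body = { tagName: "BODY", childNodes: [
  { tagName: "H1", childNodes: [] },
  { tagName: "P", childNodes: [] } ] }
var toc = { tagName: "UL", childNodes: [] }
insertBefore(body, toc, body.childNodes[0])
var order = tagOrder(body)

// An empty body has no childNodes[0], so the list is simply appended
var emptyBody = { tagName: "BODY", childNodes: [] }
insertBefore(emptyBody, { tagName: "UL", childNodes: [] }, emptyBody.childNodes[0])
var emptyOrder = tagOrder(emptyBody)
```

In both cases the list ends up as the first child of the body, which is what doLoad() relies on.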
Last, we provide the complete script for generating the table of contents. The table of contents created by this article is not interactive. You can't click on an entry and navigate to the element. We will explain how to make this table of contents interactive next week. Stay tuned!
Recursive getHeaders() Approach
function getTextForElement(obj) {
  var str=""
  for (var i=0;i < obj.childNodes.length;i++) {
    if (obj.childNodes[i].nodeType==1)
      str+=getTextForElement(obj.childNodes[i])
    else if (obj.childNodes[i].nodeType==3)
      str+=obj.childNodes[i].data // append so earlier text is kept
  }
  return str
}
function getHeaders(el,obj) {
  var tagList = "H1;H2;H3;H4;H5;H6;"
  if (tagList.indexOf(obj.tagName+";")>=0) {
    var eLI = document.createElement("LI")
    var eLIText = document.createTextNode(getTextForElement(obj))
    eLI.className="toc" + obj.tagName
    eLI.appendChild(eLIText)
    el.appendChild(eLI)
  }
  for (var i=0;i < obj.childNodes.length;i++)
    if (obj.childNodes[i].nodeType==1)
      getHeaders(el,obj.childNodes[i])
}
function doLoad() {
  var el = document.createElement("UL")
  var startEl = document.getElementsByTagName("BODY")[0]
  getHeaders(el,startEl)
  startEl.insertBefore(el,startEl.childNodes[0])
}
window.onload = doLoad
getElementsByTagName() Approach
function getTextForElement(obj) {
  var str=""
  for (var i=0;i < obj.childNodes.length;i++) {
    if (obj.childNodes[i].nodeType==1)
      str+=getTextForElement(obj.childNodes[i])
    else if (obj.childNodes[i].nodeType==3)
      str+=obj.childNodes[i].data // append so earlier text is kept
  }
  return str
}
function getHeaders() {
  var obj = document.getElementsByTagName("*")
  var el = document.createElement("UL")
  var tagList = "H1;H2;H3;H4;H5;H6;"
  for (var i=0;i < obj.length;i++)
    if (tagList.indexOf(obj[i].tagName+";")>=0) {
      var eLI = document.createElement("LI")
      var eLIText = document.createTextNode(getTextForElement(obj[i]))
      eLI.className="toc" + obj[i].tagName
      eLI.appendChild(eLIText)
      el.appendChild(eLI)
    }
  return el
}
function ie_getElementsByTagName(str) {
  if (str=="*")
    return document.all
  else
    return document.all.tags(str)
}
if (document.all)
  document.getElementsByTagName = ie_getElementsByTagName
function doLoad() {
  var el = getHeaders()
  var startEl = document.getElementsByTagName("BODY")[0]
  startEl.insertBefore(el,startEl.childNodes[0])
}
window.onload = doLoad