The table of contents at the top of this page is being generated dynamically using the W3C Document Object Model. This demonstration runs in Internet Explorer 5.0 and Netscape's Gecko Developer Preview.
In our last article we explained two techniques for automatically creating a table of contents: one based only on Internet Explorer 4.0's DHTML model, and another for Internet Explorer 5.0 that combines the Internet Explorer approach with the W3C recommendation. In this article, we continue with a third technique that generates the table of contents using only the W3C DOM recommendation and the ECMA-262 (JavaScript) standard. When writing this script, we added two requirements: the script must run in Internet Explorer 5.0 and in Netscape's Gecko M4 Developer Preview.
Rewriting the table of contents to be completely standards-based proved to be a challenge. Surprisingly, more of the challenge came from the complexity of the W3C recommendation than from compatibility issues between IE5 and Netscape Gecko.
The W3C DOM exposes the HTML document as a tree of node objects, and manipulating the document requires you to navigate this hierarchy. Next, we introduce the W3C DOM properties and methods used to implement our table of contents scripts.
The W3C recommendation defines a number of methods for manipulating the document as a hierarchy of nodes. A node can be either an HTML element or a fragment of text (there are other node types, such as processing instructions and comments, but they are not needed for this example). Below we list the properties and methods used in this article:
| Method or property | Description |
|---|---|
| document.createElement(sTagName) | Creates and returns an element object for the specified tag name. |
| document.createTextNode(sData) | Creates and returns a TextNode object with the specified contents. |
| document.getElementsByTagName(sTagName) | Locates and returns a collection of all the specified elements in the document. A special wildcard value, "*", returns all the elements in the document. This method is analogous to Internet Explorer's document.all.tags(sTagName) method, and the wildcard form corresponds to Internet Explorer's document.all collection. |
| element.childNodes | Returns the collection of child nodes. This collection is similar to the children collection defined in Internet Explorer. The primary difference is that childNodes also contains TextNode objects, which represent the actual text content within each element. |
| node.insertBefore(node, nodePosition) | Inserts a node as a child of the current element. nodePosition is the existing child before which the new node is inserted. This method is often used to place nodes created with the document's createElement and createTextNode methods into the document. |
| node.appendChild(node) | A simplified version of insertBefore that automatically inserts the node as the last child of an element. |
| node.nodeType | A read-only property that returns the type of node. The two most common values are 1 for element objects and 3 for text nodes. |
All of these methods are supported by Internet Explorer 5.0 and Netscape's Gecko Developer Preview. They are a small sample of the object model defined by the W3C recommendation, and they are the features we use to create a cross-browser, standards-compliant table of contents.
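The difference between childNodes and Internet Explorer's children collection is easiest to see in code. The sketch below filters childNodes down to element nodes (nodeType 1), which approximates what children returns. So that it runs outside a browser, it uses plain objects that mimic only the nodeType, tagName, data and childNodes members of a DOM node; the helper name elementChildren and the mock tree are hypothetical, not part of either browser's object model.

```javascript
// Hypothetical helper: approximate IE's children collection by keeping
// only the element nodes (nodeType 1) from a node's childNodes.
function elementChildren(node) {
  var result = [];
  for (var i = 0; i < node.childNodes.length; i++) {
    if (node.childNodes[i].nodeType == 1) {
      result.push(node.childNodes[i]);
    }
  }
  return result;
}

// Mock of <P>This is a <EM>sample</EM> paragraph</P> built from plain
// objects that carry only the DOM members this sketch reads.
var em = {
  nodeType: 1,
  tagName: "EM",
  childNodes: [{ nodeType: 3, data: "sample", childNodes: [] }]
};
var p = {
  nodeType: 1,
  tagName: "P",
  childNodes: [
    { nodeType: 3, data: "This is a ", childNodes: [] },
    em,
    { nodeType: 3, data: " paragraph", childNodes: [] }
  ]
};

console.log(p.childNodes.length);       // 3: two text nodes and the EM element
console.log(elementChildren(p).length); // 1: only the EM element
```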
For the most part, these methods interoperate between Internet Explorer 5.0 and Netscape Gecko. One difference, discussed later, is Internet Explorer's incomplete support for getElementsByTagName(): IE5 does not support the wildcard value. This is a minor issue, because we can easily work around it by dynamically adding that support to IE5.
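One could also test for the missing wildcard support directly instead of checking for a particular browser. The sketch below is an assumption about how such a test might look, not the approach this article uses: it calls getElementsByTagName("*") and treats an empty or missing result as "wildcard unsupported", on the reasoning that any loaded HTML document contains at least one element. The function name supportsWildcard and the two stand-in document objects are hypothetical.

```javascript
// Hypothetical feature test: does doc.getElementsByTagName understand "*"?
// A loaded HTML document always contains at least one element (HTML), so an
// empty or missing result suggests the wildcard is not supported.
function supportsWildcard(doc) {
  if (!doc.getElementsByTagName) return false; // method absent entirely
  var all = doc.getElementsByTagName("*");
  return !!all && all.length > 0;
}

// Stand-ins for an implementation that honors "*" and one that ignores it.
var fullDoc = {
  getElementsByTagName: function (name) {
    return name == "*" ? [{ tagName: "HTML" }, { tagName: "BODY" }] : [];
  }
};
var partialDoc = {
  getElementsByTagName: function () {
    return []; // behaves like an implementation without wildcard support
  }
};

console.log(supportsWildcard(fullDoc));    // true
console.log(supportsWildcard(partialDoc)); // false
```

With a test like this, the override could be installed only when supportsWildcard(document) is false, rather than whenever document.all exists.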
Our first task is to extract all the header (H1...H6) elements in the document. The Internet Explorer model makes this extremely simple. In Internet Explorer the document can be represented either as a tree or as a flattened collection of elements. The flattened collection, exposed as the all collection, provides easy access to every element, so we can easily extract all the header elements through it.
The W3C recommendation exposes the document primarily as a tree. In addition, it provides a convenience method, getElementsByTagName, that retrieves all the elements of a particular type, or all elements when passed the special wildcard identifier ("*"). Unfortunately, while IE5 supports this method, it does not support the wildcard value for returning all elements.
At this point we have two options: ignore IE's lack of support and recursively navigate the tree of elements to find all the header elements, or override IE's getElementsByTagName with a fixed version written in JavaScript. (For more about recursion, see Rajeev's article on building a maze recursively.)
If we don't want to include any browser detection code, we can write our own function for locating the headers. This script is not simple and requires an understanding of recursion. Below is a basic function that visits each element node in the document. On the last page we include an enhanced version of this function that locates just the header elements and builds the TOC on the fly.
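// Walk all elements - Recursive Standards-based
function getElements(obj) {
  for (var i=0;i < obj.childNodes.length;i++)
    if (obj.childNodes[i].nodeType==1) { // Elements only
      // Process obj.childNodes[i] here, then visit its children
      getElements(obj.childNodes[i])
    }
}
// Start at the root HTML element; document.childNodes[0] may be a DOCTYPE or comment node
getElements(document.documentElement)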
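The recursive walk visits elements in document order (each element before its descendants), the same order in which getElementsByTagName("*") returns them. The sketch below makes that concrete by collecting each visited tag name into an array. It is a variation on getElements() for illustration only, run against a small mock tree of plain objects (only nodeType, tagName and childNodes are modeled); the names collectElements, el and text are hypothetical. Passing ["HTML"] as the starting list records the root element itself, which the walk would otherwise skip.

```javascript
// Variation on getElements(): record each element in document (pre-order)
// order, which is the order getElementsByTagName("*") uses.
function collectElements(obj, list) {
  for (var i = 0; i < obj.childNodes.length; i++) {
    var child = obj.childNodes[i];
    if (child.nodeType == 1) {      // elements only; skip text nodes
      list.push(child.tagName);     // record the element itself first...
      collectElements(child, list); // ...then its descendants
    }
  }
  return list;
}

// Tiny builders for mock element and text nodes.
function el(tag, children) { return { nodeType: 1, tagName: tag, childNodes: children || [] }; }
function text(s) { return { nodeType: 3, data: s, childNodes: [] }; }

// Mock of <HTML><BODY><H1>Title</H1><P>Intro <EM>text</EM></P><H2>Part</H2></BODY></HTML>
var html = el("HTML", [el("BODY", [
  el("H1", [text("Title")]),
  el("P", [text("Intro "), el("EM", [text("text")])]),
  el("H2", [text("Part")])
])]);

console.log(collectElements(html, ["HTML"]).join(","));
// HTML,BODY,H1,P,EM,H2
```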
Rather than deal with the complexity of this function, we can use a very simple script to fix IE5's incomplete support for getElementsByTagName. A positive side effect of this fix is that it also adds full support for this method to Internet Explorer 4.0. With this small script, IE5's implementation becomes compatible with Netscape's. It also simplifies the script that navigates to all elements: in the getElements() function below, notice that we no longer need to call getElements recursively:
function ie_getElementsByTagName(str) {
  // Map the call onto IE's all collection
  if (str=="*")
    return document.all
  else
    return document.all.tags(str)
}
if (document.all)
  document.getElementsByTagName = ie_getElementsByTagName
function getElements() {
  var obj = document.getElementsByTagName("*")
  for (var i=0;i < obj.length;i++)
    var el = obj[i] // get the element
}
getElements()
The script for accessing all the elements is almost the same as
the script we would write using the original Internet Explorer model. The only difference
is we use the getElementsByTagName() method instead of the all collection. The next step is to
write the script so only the header elements are extracted.
We are going to continue with the simpler, non-recursive solution; the source code for both solutions is provided at the end of this article. Extracting the headers with getElementsByTagName() is simple. We just examine all the elements in the document and check whether each one is a header element:
function getHeaders() {
  var obj = document.getElementsByTagName("*")
  var tagList = "H1;H2;H3;H4;H5;H6;"
  for (var i=0;i < obj.length;i++)
    if (tagList.indexOf(obj[i].tagName+";")>=0) {
      // Found a header element
    }
}
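A regular expression is one alternative to searching the tagList string. The sketch below is a swapped-in technique rather than the article's own: it returns the header's level as well as whether it is a header, and it accepts lowercase names, which matters if a document reports tagName in lowercase (as XHTML documents parsed as XML do). The function name headerLevel is hypothetical.

```javascript
// Hypothetical alternative to the tagList test: match H1..H6 in either case
// and return the header level (1-6), or 0 for any other tag.
function headerLevel(tagName) {
  var match = /^h([1-6])$/i.exec(tagName || "");
  return match ? parseInt(match[1], 10) : 0;
}

console.log(headerLevel("H1")); // 1
console.log(headerLevel("h3")); // 3
console.log(headerLevel("TH")); // 0 (table header cell, not a heading)
console.log(headerLevel("H7")); // 0
console.log(headerLevel("HR")); // 0
```

Inside the loop, a test such as headerLevel(obj[i].tagName) > 0 could stand in for the indexOf check; the level could also drive indentation of the entries.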
We are now going to process each header and iteratively build the table of contents as an HTML list (UL). To create the list container and each table of contents entry, we use the createElement() method. As we build each entry, we append it to the end of the list. When we have finished scanning the document, we will have a complete table of contents:
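function getHeaders() {
  var obj = document.getElementsByTagName("*")
  var el = document.createElement("UL")
  var tagList = "H1;H2;H3;H4;H5;H6;"
  for (var i=0;i < obj.length;i++)
    if (tagList.indexOf(obj[i].tagName+";")>=0) {
      // Create the LI element
      var eLI = document.createElement("LI")
      eLI.className="toc" + obj[i].tagName
      // Insert the header text
      var eLIText = document.createTextNode(getTextForElement(obj[i]))
      // Build the tree
      eLI.appendChild(eLIText)
      el.appendChild(eLI)
    }
  return el
}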
We are almost there. While this script visits each element, we have left out two important pieces: the getTextForElement() function, which extracts the text from each header element, and the code that inserts the table of contents into the document. With the Internet Explorer object model, accessing the contents is very simple using the element's innerText property. The W3C model exposes no such property. To make matters more difficult, it exposes each fragment of text as a separate object in the tree. While this approach is useful for some scenarios, it makes simple text retrieval much more difficult.
Instead of a single property holding an element's contents, the W3C recommendation buries the text in text node objects beneath the element. For example, take the following simple HTML:
<P>This is a <EM>sample</EM> paragraph</P>
In the Internet Explorer model, the contents of this paragraph can be retrieved
using the innerText property of the P element. The W3C recommendation instead requires
you to manipulate each piece of text as a separate object. The above HTML fragment
is exposed as a tree of objects:
Element Object (P)
|
+--TextNode Object (This is a)
|
+--Element Object (EM)
| |
| +-- TextNode Object (sample)
|
+--TextNode Object (paragraph)
To retrieve the contents of this paragraph you need to traverse the object hierarchy
and extract the text in each text-node object. We wrote a recursive function, getTextForElement(), that
walks the object hierarchy for an element and returns the content:
function getTextForElement(obj) {
  var str=""
  for (var i=0;i < obj.childNodes.length;i++) {
    if (obj.childNodes[i].nodeType==1)
      // Element node - walk children
      str+=getTextForElement(obj.childNodes[i])
    else if (obj.childNodes[i].nodeType==3)
      // Text Node - append its contents
      str+=obj.childNodes[i].data
  }
  return str
}
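To see the concatenation at work, the sketch below runs the same recursive logic against a mock of the <P> example above. The mock uses plain objects exposing only nodeType, data and childNodes, so it runs outside a browser. Note that each text fragment is appended with +=; assigning with = would keep only the last fragment.

```javascript
// Same recursive approach as getTextForElement(): concatenate the data of
// every text node (nodeType 3) beneath obj, descending into elements (nodeType 1).
function getTextForElement(obj) {
  var str = "";
  for (var i = 0; i < obj.childNodes.length; i++) {
    var child = obj.childNodes[i];
    if (child.nodeType == 1)
      str += getTextForElement(child); // element: collect its text recursively
    else if (child.nodeType == 3)
      str += child.data;               // text node: append its contents
  }
  return str;
}

// Mock of <P>This is a <EM>sample</EM> paragraph</P>
var p = {
  nodeType: 1,
  childNodes: [
    { nodeType: 3, data: "This is a ", childNodes: [] },
    { nodeType: 1, childNodes: [{ nodeType: 3, data: "sample", childNodes: [] }] },
    { nodeType: 3, data: " paragraph", childNodes: [] }
  ]
};

console.log(getTextForElement(p)); // This is a sample paragraph
```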
The last step is to insert the table of contents into the document. The getHeaders() function returns a tree of objects representing the table of contents list. To insert this tree into the document, we use the insertBefore() method to make the table of contents the first child of the document's body. Below is the final doLoad() function, which is called after the page loads.
function doLoad() {
  var el = getHeaders()
  var startEl = document.getElementsByTagName("BODY")[0]
  startEl.insertBefore(el,startEl.childNodes[0])
}
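doLoad() relies on two rules of insertBefore(): inserting before the current first child makes the new node the first child, and, per the W3C DOM, a null reference node means "append at the end". Modern browsers also treat undefined (which is what childNodes[0] returns for an empty body) like null here. The sketch below models those rules with a hypothetical MiniNode type so they can be checked outside a browser; it is a simplified model, not a browser implementation.

```javascript
// Hypothetical, simplified model of the DOM insertion rules doLoad() uses.
function MiniNode(name) {
  this.nodeName = name;
  this.parentNode = null;
  this.childNodes = [];
}

// A node can have only one parent, so inserting it elsewhere first
// removes it from its current parent, as the DOM does.
function detach(node) {
  if (node.parentNode) {
    var siblings = node.parentNode.childNodes;
    siblings.splice(siblings.indexOf(node), 1);
    node.parentNode = null;
  }
}

MiniNode.prototype.insertBefore = function (newNode, refNode) {
  detach(newNode);
  if (refNode == null) {             // null or undefined: append at the end
    this.childNodes.push(newNode);
  } else {
    var index = this.childNodes.indexOf(refNode);
    if (index < 0) throw new Error("refNode is not a child of this node");
    this.childNodes.splice(index, 0, newNode);
  }
  newNode.parentNode = this;
  return newNode;
};

MiniNode.prototype.appendChild = function (newNode) {
  return this.insertBefore(newNode, null);
};

function names(node) {
  return node.childNodes.map(function (n) { return n.nodeName; }).join(",");
}

var body = new MiniNode("BODY");
body.appendChild(new MiniNode("H1"));
body.appendChild(new MiniNode("P"));

// Same call shape as doLoad(): insert the list before the first child.
body.insertBefore(new MiniNode("UL"), body.childNodes[0]);
console.log(names(body)); // UL,H1,P

// With an empty body, childNodes[0] is undefined and the list is appended.
var emptyBody = new MiniNode("BODY");
emptyBody.insertBefore(new MiniNode("UL"), emptyBody.childNodes[0]);
console.log(names(emptyBody)); // UL
```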
In Part II we continue our look at dynamically creating a table of contents using the W3C Document Object Model recommendation. In our first article (recommended reading for this article), we showed how to enumerate all the headers in the document and insert a static table of contents at the top of the document, and we introduced the W3C methods for manipulating the HTML tree. We are now going to enhance our original script and make the table of contents live: when you click an item in the table of contents, you are taken to the corresponding location in the document.
As in our original article, our approach uses W3C-recommended standards and runs in Internet Explorer 5.0 and Netscape's Gecko Developer Preview, demonstrating that cross-browser, standards-based scripting is becoming possible.
Enhancing our original script is extremely simple. We make the table of contents live by inserting an anchor at each header (e.g., <A ID="destination"></A>) and wrapping each table of contents entry in a link to the newly created anchor (e.g., <A HREF="#destination">).
In our original script, the getHeaders() function did the bulk of the work. This function enumerates all the headers in the document and builds the table of contents as a single list, giving each item a class name based on its header level:
function getHeaders() {
  var obj = document.getElementsByTagName("*")
  var el = document.createElement("UL")
  var tagList = "H1;H2;H3;H4;H5;H6;"
  for (var i=0;i < obj.length;i++)
    if (tagList.indexOf(obj[i].tagName+";")>=0) {
      // Create the LI element
      var eLI = document.createElement("LI")
      eLI.className="toc" + obj[i].tagName
      // Insert the header text
      var eLIText = document.createTextNode(getTextForElement(obj[i]))
      // Build the tree
      eLI.appendChild(eLIText)
      el.appendChild(eLI)
    }
  return el
}
We built the table of contents element by element. First we constructed the UL container element, and then for each header in the document we inserted an LI element containing a text node with the actual header text. When this function finishes, we have a simple HTML fragment containing the complete table of contents.
To make the table of contents live, we enhance our construction code to wrap the text of each list item in a link. We also insert a destination for the link into the document. Our new getHeaders() function is as follows:
function getHeaders() {
  var obj = document.getElementsByTagName("*")
  var el = document.createElement("UL")
  var tagList = "H1;H2;H3;H4;H5;H6;"
  for (var i=0;i < obj.length;i++)
    if (tagList.indexOf(obj[i].tagName+";")>=0) {
      // Create the LI element
      var eLI = document.createElement("LI")
      eLI.className="toc" + obj[i].tagName
      // Create the bookmark
      var eBookmark = document.createElement("A")
      // Set the destination ID
      eBookmark.id = "destHeader" +i
      // Create the link
      var eALink = document.createElement("A")
      eALink.href = "#" + eBookmark.id
      // Build the tree
      var eLIText = document.createTextNode(getTextForElement(obj[i]))
      obj[i].appendChild(eBookmark)
      eALink.appendChild(eLIText)
      eLI.appendChild(eALink)
      el.appendChild(eLI)
    }
  return el
}
That's all there is to making the table of contents live.
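Because getHeaders() numbers each bookmark with i, the header's position in the collection of all elements, the IDs are unique but not consecutive. If consecutive IDs (destHeader1, destHeader2, ...) are preferred, a separate header counter is one option. The sketch below is a hypothetical variation that computes only the data for each entry (class name, bookmark ID, link target, text) from a flat array standing in for getElementsByTagName("*") plus getTextForElement(), so the numbering can be checked without a browser. The name buildTocEntries and the {tagName, text} input shape are assumptions.

```javascript
// Hypothetical variation on getHeaders(): number bookmarks with a separate
// header counter so IDs come out as destHeader1, destHeader2, ...
// Input: a flat array of {tagName, text} objects in document order.
function buildTocEntries(elements) {
  var tagList = "H1;H2;H3;H4;H5;H6;";
  var entries = [];
  var count = 0;
  for (var i = 0; i < elements.length; i++) {
    if (tagList.indexOf(elements[i].tagName + ";") >= 0) {
      count++;
      var id = "destHeader" + count;
      entries.push({
        className: "toc" + elements[i].tagName, // e.g. "tocH1", as in getHeaders()
        id: id,                                 // would go on the bookmark <A>
        href: "#" + id,                         // would go on the TOC link <A>
        text: elements[i].text
      });
    }
  }
  return entries;
}

var entries = buildTocEntries([
  { tagName: "HTML", text: "" },
  { tagName: "BODY", text: "" },
  { tagName: "H1", text: "Header 1" },
  { tagName: "P", text: "...text..." },
  { tagName: "H2", text: "Header 2" }
]);

for (var j = 0; j < entries.length; j++) {
  console.log(entries[j].className + " " + entries[j].href + " " + entries[j].text);
}
// tocH1 #destHeader1 Header 1
// tocH2 #destHeader2 Header 2
```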
To demonstrate how the table of contents works, let's examine a simple HTML fragment:
<H1>Header 1</H1>
...text...
<H2>Header 2</H2>
...text...
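The first time getHeaders() finds a header, it creates the following HTML fragment. (For readability we number the bookmarks 1 and 2 in this walkthrough; the script actually appends the header's index i in the collection of all elements, so real IDs are usually larger and not consecutive.)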
<UL>
<LI CLASS="tocH1">
<A HREF="#destHeader1">
Header 1
</A>
</LI>
</UL>
At the same time that the link to destHeader1 is created, the HTML in the document is updated with a bookmark destination. Because the bookmark is added with appendChild(), it becomes the last child of the header:
<H1>
Header 1
<A ID="destHeader1"></A>
</H1>
...text...
<H2>Header 2</H2>
...text...
We are using the ID attribute to represent the destination rather than the name attribute.
In HTML 4.0, the ID attribute is the recommended approach for specifying bookmarks within a document.
We chose to use the HTML 4.0 approach since both Internet Explorer and Netscape Gecko support this feature.
On the next iteration, the table of contents fragment is completed and another bookmark is inserted
into the document:
<UL>
<LI CLASS="tocH1">
<A HREF="#destHeader1">
Header 1
</A>
</LI>
<LI CLASS="tocH2">
<A HREF="#destHeader2">
Header 2
</A>
</LI>
</UL>
and the HTML document is updated to:
<H1>
Header 1
<A ID="destHeader1"></A>
</H1>
...text...
<H2>
Header 2
<A ID="destHeader2"></A>
</H2>
...text...
The last step is to insert the table of contents fragment into the document. doLoad() inserts the fragment returned by getHeaders() as the first child of the body (see the complete script), which produces:
<UL>
<LI CLASS="tocH1">
<A HREF="#destHeader1">
Header 1
</A>
</LI>
<LI CLASS="tocH2">
<A HREF="#destHeader2">
Header 2
</A>
</LI>
</UL>
<H1>
Header 1
<A ID="destHeader1"></A>
</H1>
...text...
<H2>
Header 2
<A ID="destHeader2"></A>
</H2>
...text...
You now have a complete, working table of contents. One advantage of using the W3C Document Object Model is that you can build complete fragments of HTML before inserting them into the document. In this example, the table of contents is created in memory element by element. Only when the table of contents is complete do we insert it into the document.
function getTextForElement(obj) {
  var str=""
  for (var i=0;i < obj.childNodes.length;i++) {
    if (obj.childNodes[i].nodeType==1)
      // Element node - collect its text recursively
      str+=getTextForElement(obj.childNodes[i])
    else if (obj.childNodes[i].nodeType==3)
      // Text node - append its contents
      str+=obj.childNodes[i].data
  }
  return str
}
function getHeaders() {
  var obj = document.getElementsByTagName("*")
  var el = document.createElement("UL")
  var tagList = "H1;H2;H3;H4;H5;H6;"
  for (var i=0;i < obj.length;i++)
    if (tagList.indexOf(obj[i].tagName+";")>=0) {
      var eLI = document.createElement("LI")
      var eBookmark = document.createElement("A")
      eBookmark.id = "destHeader" +i
      var eALink = document.createElement("A")
      eALink.href = "#" + eBookmark.id
      var eLIText = document.createTextNode(getTextForElement(obj[i]))
      obj[i].appendChild(eBookmark)
      eLI.className="toc" + obj[i].tagName
      eALink.appendChild(eLIText)
      eLI.appendChild(eALink)
      el.appendChild(eLI)
    }
  return el
}
function ie_getElementsByTagName(str) {
  if (str=="*")
    return document.all
  else
    return document.all.tags(str)
}
if (document.all)
  document.getElementsByTagName = ie_getElementsByTagName
function doLoad() {
  var el = getHeaders()
  var startEl = document.getElementsByTagName("BODY")[0]
  startEl.insertBefore(el,startEl.childNodes[0])
}
window.onload = doLoad