Programming XML in Java, Part 3 - JavaWorld July 2000
By Mark Johnson
DOMination: Take control of structured documents with the Document Object Model
The Simple API for XML (SAX) is an excellent interface for many XML applications. It is intuitive, extremely easy to learn, and, as its name implies, simple. Any Java programmer can, in just an hour or two, learn SAX and develop an application with it. It is especially useful in situations where the data in an XML file is already in a form that is structurally similar to the desired output. For instance, the recipe example in Part 2 of this series formatted Recipe XML into an HTML representation of a recipe page and a shopping list. The structure of the output HTML was very similar to the structure of the input XML. The ingredients in the Recipe XML were grouped together in an element; the ingredients in the output HTML were grouped together in an unordered list (<ul>). The tags were somewhat different, but the basic structure was the same.
Programming XML in Java: Read the whole series!
Part 1. Use the Simple API for XML (SAX) to process XML in Java easily
Part 2. Learn about SAX and XML validation through illustrative examples
Part 3. DOMination: Take control of structured documents with the Document Object Model
In real data-processing situations, however, the structure of the input data often differs greatly from the eventual output structure. Since SAX passes events to a programmer-defined handler in the order in which they appear in the input XML, you, as the programmer, are responsible for any data restructuring or reordering. Also, if the same data is to be used in more than one place in the output, you must either perform multiple passes over the XML or arrange for the handler to "remember" that data while producing output. One example of this was the recipe title in Part 2, which the handler maintained in an internal variable for use both in the browser title bar and in the Web page.
For tasks of low and intermediate complexity, SAX works just fine. As an application's complexity (and functionality) increases, however, the SAX handler code can become extremely difficult to understand. SAX code can spend most of its time storing information from the input in an internal form usable for producing the desired output. When using SAX, you are generally responsible for creating an internal object model of your application's information.
DOM to the rescue
The Document Object Model, or DOM, is a standardized object model for XML documents. DOM is a set of interfaces describing an abstract structure for an XML document. Programs that access document structures through the DOM interface can arbitrarily insert, delete, and rearrange the nodes of an XML document programmatically.
DOM and SAX parsers work in different ways. A SAX parser processes the XML document as it parses the XML input stream, passing SAX events to a programmer-defined handler method. A DOM parser, on the other hand, parses the entire input XML stream and returns a Document object. Document is the programmatic, language-neutral interface that represents a document. The Document returned by the DOM parser has an API that lets you manipulate a (virtual) tree of Node objects; this tree represents the structure of the input XML. Figures 1 and 2 illustrate this difference between the APIs.
Figure 1. The SAX parser calling programmer-defined handler routines
Figure 2. The DOM parser returning a Document object
In Figure 1, you see the SAX parser calling programmer-defined handler routines for each tag in the XML document. In Figure 2, the DOM parser returns a Document object, which represents the hierarchical structure of the tags (and such other informational elements as attributes, text blocks, and so on) in the original XML. When the parse has completed, you use the methods that are in the Document API to access the contents of the XML tree.
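The difference between the two parsing styles can be sketched in a few lines. The example below is a minimal illustration, not code from the article: it assumes the JAXP factory classes (javax.xml.parsers) bundled with current JDKs, which the text above does not name, and the sample XML string and the class name SaxVersusDom are made up for the purpose.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxVersusDom {
    // Hypothetical sample document, not taken from the article.
    static final String XML =
        "<recipe><title>Toast</title><ingredient>bread</ingredient></recipe>";

    // SAX: the parser pushes events at our handler while it reads the input.
    static List<String> elementNamesViaSax() {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                names.add(qName); // called once per start tag, in document order
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return names;
    }

    // DOM: the parser reads the whole input, then hands back a Document to query.
    static Document parseViaDom() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNamesViaSax()); // [recipe, title, ingredient]
        Document doc = parseViaDom();
        System.out.println(doc.getDocumentElement().getTagName()); // recipe
    }
}
```

With SAX, anything the handler does not record during the parse is gone once the parse ends; with DOM, the returned Document can be queried afterward as often as needed.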
One major benefit of the DOM parser is that it provides random access to the structures inside the XML tree. Imagine, for example, that you are writing a genealogy application that could show any individual's relatives from that individual's point of view. The original XML document representing your family would include you as the child of two parents and possibly a parent of one or more children. Now let's say you want to create a program that could print a personal report for any person in the tree. If you were to write that program using SAX, you'd have two tasks. First, you'd probably need to build a representation of your family tree in memory, so you could access any node in the tree and print that node's relatives. Your second task, after the parse was complete, would be to print the genealogy report starting at a specified node in the tree.
A DOM parser would relieve you of the first task, building the family tree, by actually building a tree of objects for you, as shown in Figure 2. You could produce an identical report, but you'd do half as much work (or even less).
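As a rough sketch of the genealogy idea, the code below looks up one person in a parsed tree and reads that person's relatives directly from the neighboring nodes. Everything specific here is an assumption made for illustration: the person element with a name attribute, the use of nesting to mean "child of" (which allows only one parent per person, a simplification of a real family), the class name FamilyReport, and the JAXP parser setup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FamilyReport {
    // Hypothetical format: a nested <person> is a child of the enclosing one.
    static final String XML =
        "<person name='Ada'><person name='Ben'>"
      + "<person name='Cy'/><person name='Di'/></person></person>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Random access: find any person in the tree by name, or null if absent.
    static Element find(Document doc, String name) {
        NodeList people = doc.getElementsByTagName("person");
        for (int i = 0; i < people.getLength(); i++) {
            Element person = (Element) people.item(i);
            if (name.equals(person.getAttribute("name"))) {
                return person;
            }
        }
        return null;
    }

    // The parent is one step up; the root's parent is the Document, so return null.
    static String parentName(Element person) {
        Node parent = person.getParentNode();
        return parent instanceof Element ? ((Element) parent).getAttribute("name") : null;
    }

    // The children are one step down; skip any non-element nodes.
    static List<String> childNames(Element person) {
        List<String> names = new ArrayList<>();
        NodeList kids = person.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid instanceof Element) {
                names.add(((Element) kid).getAttribute("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Element ben = find(parse(XML), "Ben");
        System.out.println("parent:   " + parentName(ben));  // parent:   Ada
        System.out.println("children: " + childNames(ben));  // children: [Cy, Di]
    }
}
```

No tree-building code appears anywhere in the sketch; the report logic starts from a tree the parser has already built.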
The origins of SAX and DOM are different as well. SAX, originally an interface for writing XML parsers, was created by a group of people on the XML-DEV mailing list. DOM was created and is maintained by the members of the W3C (World Wide Web Consortium) DOM working group as a standard API for accessing XML structures. In fact, many DOM parsers use a SAX parser to create the document tree that the parser returns.
It would be incorrect to say that DOM is superior to SAX. DOM provides an information model that is richer and correspondingly more complex than the one provided by SAX. With a SAX parser, the handler object receives a stream of tokens only once. A DOM parser lets you look at any node in the tree as many times as you like, manipulate the tree, write the tree out in different formats, and pass the tree to other pieces of software that understand the DOM interfaces.
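To make "manipulate the tree" and "write the tree out" concrete, here is a small sketch that reorders and extends a parsed document, then serializes it. The DOM calls (insertBefore, createElement, createTextNode, appendChild) are part of the Level 1 Node and Document interfaces; the serialization step uses the JAXP Transformer API, which is an assumption of this example rather than anything the article prescribes. The sample XML and the class name RearrangeTree are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RearrangeTree {
    // Hypothetical input: two items in the "wrong" order.
    static final String XML = "<list><item>b</item><item>a</item></list>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Move the last item to the front, then append a newly created item.
    static void reorder(Document doc) {
        Element list = doc.getDocumentElement();
        Node first = list.getFirstChild();
        Node last = list.getLastChild();
        list.insertBefore(last, first); // inserting an attached node moves it

        Element added = doc.createElement("item");
        added.appendChild(doc.createTextNode("c"));
        list.appendChild(added);
    }

    // Serialize with an identity transform (JAXP, not part of DOM itself).
    static String toXml(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        reorder(doc);
        System.out.println(toXml(doc)); // items now appear in the order a, b, c
    }
}
```

The same Document could be handed to any other code written against the org.w3c.dom interfaces without being reparsed.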
So far, I've told you that a DOM document is made up of Node objects, but I haven't told you precisely what a Node object is. Of exactly what kinds of objects is this document tree composed? The answer, it turns out, is that any object can appear in the tree of DOM nodes, as long as that object implements one of the DOM interfaces. I'll look at the types of DOM interfaces in the next section.
Anatomy of a document
Figure 3 below illustrates the inheritance graph of the DOM Level 1 interfaces. (DOM Level 1 is the first, simplest version of the DOM specification from the W3C. DOM Levels 2 and 3 are currently under development. See Resources for a link to the official documentation.) As you can see, just about everything in a document tree is a Node. Most DOM interfaces are descended from Node.
Figure 3. The inheritance graph of the DOM Level 1 interfaces
DOM defines a document as a tree of objects that implement the interfaces in the DOM package. All of the objects in the tree implement Node, because the interface for every kind of tree object is a subinterface of Node. Element, for example, inherits the methods of Node, as well as additional methods necessary to represent a single tag in a structured document (which is its role).
Note that the DOM package does not consist of classes; rather, it contains only interfaces (with one exception). This is because DOM is a specification of interfaces between pieces of software, not a particular implementation of DOM document Nodes. This is powerful partly because the interface specification defines what the program does, and different vendors can provide various implementations for the interfaces. In fact, most DOM parsers include implementation classes that implement all of the interfaces in the package. DOM parsers generally return trees of these implementation classes, but all the application programmer knows about these returned objects is that they implement the appropriate interface.
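The split between interface and implementation is easy to observe at runtime. The sketch below, which again assumes the JAXP factory classes rather than any parser named in the article, asks the installed parser for an empty Document and prints what stands behind the interfaces. The implementation class names it prints depend on the vendor, so none are shown here. The last line notes that org.w3c.dom.DOMException is a class in the Java binding, which is presumably the one exception mentioned above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class InterfacesOnly {
    // Ask whatever parser implementation is installed for an empty Document.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element note = doc.createElement("note"); // factory method on the interface

        // The declared types are interfaces...
        System.out.println(Document.class.isInterface()); // true
        System.out.println(Element.class.isInterface());  // true

        // ...while the objects behind them are vendor classes whose names vary.
        System.out.println(doc.getClass().getName());
        System.out.println(note.getClass().getName());

        // Application code relies only on the interface contract.
        System.out.println(note instanceof Node); // true

        // DOMException is a class, not an interface, in the Java binding.
        System.out.println(DOMException.class.isInterface()); // false
    }
}
```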
The Node interface represents the general node in a DOM tree. For any particular node, the interface has methods for accessing the node's child nodes, its parent node, and the Document node at the top of the tree in which the node lives -- essentially, all of the methods needed to access and manipulate the tree of nodes. Elements, Comments, Text, and so on are all types of Nodes.
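The navigation methods just described (child nodes, parent node, owning Document) are enough to walk a whole tree. The following is a minimal sketch under the same assumptions as before: a JAXP parser, a made-up sample document, and a hypothetical class name WalkNodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WalkNodes {
    // Hypothetical sample containing an element, text, a comment, and an empty element.
    static final String XML = "<a><b>hi</b><!--note--><c/></a>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Depth-first walk using only methods declared on Node.
    static void walk(Node node, int depth, List<String> out) {
        out.add(depth + ":" + node.getNodeName());
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            walk(kids.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        List<String> lines = new ArrayList<>();
        walk(doc, 0, lines); // the Document itself is a Node, so start there
        System.out.println(lines);
        // prints [0:#document, 1:a, 2:b, 3:#text, 2:#comment, 2:c]

        Node b = doc.getDocumentElement().getFirstChild();
        System.out.println(b.getParentNode().getNodeName()); // a
        System.out.println(b.getOwnerDocument() == doc);      // true
    }
}
```

The walk never needs to know whether it is looking at an Element, a Text, or a Comment; the Node interface covers all of them.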
Here are the subinterfaces of Node that form the document tree:
Element: The Element interface represents a single tag in an XML document. (There are interfaces for such objects in the DOM for HTML as well, but I'll limit this discussion to XML.) This interface inherits all of Node's methods; it also adds methods for manipulating the Element's attributes and for accessing all sub-Elements with a particular tag name.
CharacterData: The CharacterData interface represents (what else?) character data. Its subinterfaces are Text, CDATASection, and Comment (see below for descriptions). The CharacterData interface provides methods for adding, deleting, inserting, and otherwise manipulating the text data in the node.
Text: This subinterface of CharacterData is a representation of character data content within an element or attribute. The text inside a Text node contains no markup. Any entity, comment, or other text that contains markup will appear in separate nodes.
CDATASection: CDATASection is a subinterface of Text that can contain markup. The markup within a CDATASection is not interpreted by the XML parser. This makes it easier to create text in the document that contains many characters that might be misinterpreted as markup. A CDATASection in an XML document begins with the markup <![CDATA[ and ends with ]]>. So, for example, the following CDATASection:
<![CDATA[Markup &amp; Mayhem]]>
represents the text:
Markup &amp; Mayhem
The same characters would represent:
Markup & Mayhem
outside the context of the CDATASection.
Attr: Attr nodes contain those variable=value pairs that you see within element tags. In the tag:
the attri
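As a hedged illustration of the node types listed above, the sketch below touches an Element's attributes, edits a Text node through the CharacterData methods, and reads a CDATASection. The book document, the helper names, and the class name NodeKinds are all hypothetical, and the parser is assumed to be a default JAXP DocumentBuilder, which reports CDATA sections as separate CDATASection nodes unless it is configured to coalesce them into text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class NodeKinds {
    // Hypothetical sample: one attribute, one text node, one CDATA section.
    static final String XML =
        "<book id='b1'><title>DOM</title>"
      + "<note><![CDATA[Markup &amp; Mayhem]]></note></book>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Element: look up the first sub-element with a given tag name.
    static Element firstByTag(Element parent, String tag) {
        return (Element) parent.getElementsByTagName(tag).item(0);
    }

    // CharacterData: edit the text held by a Text node in place.
    static String editTitle(Document doc) {
        Text text = (Text) firstByTag(doc.getDocumentElement(), "title").getFirstChild();
        text.appendData(" Level 1"); // now "DOM Level 1"
        text.insertData(0, "The ");  // now "The DOM Level 1"
        text.deleteData(0, 4);       // back to "DOM Level 1"
        return text.getData();
    }

    // CDATASection: the first child of <note> in this sample.
    static Node cdataOf(Document doc) {
        return firstByTag(doc.getDocumentElement(), "note").getFirstChild();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element book = doc.getDocumentElement();

        Attr id = book.getAttributeNode("id");
        System.out.println(id.getName() + "=" + id.getValue()); // id=b1
        book.setAttribute("lang", "en");                        // add a new attribute
        System.out.println(book.getAttribute("lang"));          // en

        System.out.println(editTitle(doc));                     // DOM Level 1

        Node cdata = cdataOf(doc);
        System.out.println(cdata.getNodeType() == Node.CDATA_SECTION_NODE); // true
        System.out.println(cdata.getNodeValue()); // Markup &amp; Mayhem (left uninterpreted)
    }
}
```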