
  Tutorial: XML document processing in Java using XPath and XSLT - JavaWorld September 2000

By André Tost
Discover how XPath and XSLT can significantly reduce the complexity of your Java code when handling XML documents
The Extensible Markup Language (XML) is certainly one of the hottest technologies at the moment. While the concept of markup languages is not new, XML seems especially attractive to Java and Internet programmers. The Java API for XML Parsing (JAXP; see Resources), having recently been defined through the Java Community Process, promises to provide a common interface for accessing XML documents. The W3C has defined the so-called Document Object Model (DOM), which provides a standard interface for working with an XML document in a tree hierarchy, whereas the Simple API for XML (SAX) lets a program parse an XML document sequentially, based on an event handling model. Both of these standards (SAX being a de facto standard) complement the JAXP. Together, these three APIs provide sufficient support for dealing with XML documents in Java, and numerous books on the market describe their use.
This article introduces a way to handle XML documents that goes beyond the standard Java APIs for manipulating XML. We'll see that in many cases XPath and XSLT provide simpler, more elegant ways of solving application problems. In some simple samples, we will compare a pure Java/XML solution with one that utilizes XPath and/or XSLT.
Both XSLT and XPath are part of the Extensible Stylesheet Language (XSL) specification (see Resources). XSL consists of three parts: the XSL language specification itself, XSL Transformations (XSLT), and XML Path Language (XPath). XSL is a language for transforming XML documents; it includes a definition -- Formatting Objects -- of how XML documents can be formatted for presentation. XSLT specifies a vocabulary for transforming one XML document into another. You can consider XSLT to be XSL minus Formatting Objects. The XPath language addresses specific parts of XML documents and is intended to be used from within an XSLT stylesheet.
For the purposes of this article, it is assumed that you are familiar with the basics of XML and XSLT, as well as the DOM APIs. (For information and tutorials on these topics, see Resources.)
Note: This article's code samples were compiled and tested with the Apache Xerces XML parser and the Apache Xalan XSL processor (see Resources).
The problem
Many articles and papers that deal with XML state that it is the perfect vehicle to accomplish a good design practice in Web programming: the Model-View-Controller pattern (MVC), or, in simpler terms, the separation of application data from presentation data. If the application data is formatted in XML, it can easily be bound -- typically in a servlet or JavaServer Page -- to, say, HTML templates by using an XSL stylesheet.
But XML can do much more than merely help with model-view separation for an application's frontend. We currently observe more and more widespread use of components (for example, components developed using the EJB standard) that can be used to assemble applications, thus enhancing developer productivity. Component reusability can be improved by formatting the data that components deal with in a standard way. Indeed, we can expect to see more and more published components that use XML to describe their interfaces.
Because XML-formatted data is language-neutral, it becomes usable in cases where the client of a given application service is not known, or when it must not have any dependencies on the server. For example, in B2B environments, it may not be acceptable for two parties to have dependencies on concrete Java object interfaces for their data exchange. New technologies like the Simple Object Access Protocol (SOAP) (see Resources) address these requirements.
All of these cases have one thing in common: data is stored in XML documents and needs to be manipulated by an application. For example, an application that uses various components from different vendors will most likely have to change the structure of the (XML) data to make it fit the need of the application or adhere to a given standard.
Code written using the Java APIs mentioned above would certainly do this. Moreover, there are more and more tools available with which you can turn an XML document into a JavaBean and vice versa, which makes it easier to handle the data from within a Java program. However, in many cases, the application, or at least a part of it, merely processes one or more XML documents as input and converts them into a different XML format as output. Using stylesheets in those cases is a viable alternative, as we will see later in this article.
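The stylesheet approach to converting one XML format into another can be sketched with the standard `javax.xml.transform` API. This is a minimal illustration, not the article's own code: the input format (`people`/`person`/`name`) and the stylesheet are hypothetical, invented only to show one structure being mapped onto another.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetConversion {
    // Hypothetical input, as it might arrive from a third-party component.
    static final String INPUT =
        "<people><person><name>John Smith</name></person></people>";

    // Hypothetical stylesheet that maps the input onto the structure
    // the application expects (addressbook/address/addressee).
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/people'>"
      + "<addressbook><xsl:apply-templates select='person'/></addressbook>"
      + "</xsl:template>"
      + "<xsl:template match='person'>"
      + "<address><addressee><xsl:value-of select='name'/></addressee></address>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML text and returns the result as a string.
    static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(INPUT, STYLESHEET));
    }
}
```

No Java code walks the input tree here; the mapping between the two formats lives entirely in the stylesheet, so it can change without recompiling the application.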
Use XPath to locate nodes in an XML document
As stated above, the XPath language is used to locate certain parts of an XML document. As such, it's meant to be used by an XSLT stylesheet, but nothing keeps us from using it in our Java program in order to avoid lengthy iteration over a DOM element hierarchy. Indeed, we can let the XSLT/XPath processor do the work for us. Let's take a look at how this works.
Let us assume that we have an application scenario in which a source XML document is presented to the user (possibly after being processed by a stylesheet). The user makes updates to the data and, to save network bandwidth, sends only the updated records back to the application. The application looks for the XML fragment in the source document that needs to be updated and replaces it with the new data.
We will create a little sample that will help you understand the various options. For this example, we assume that the application deals with address records in an addressbook. A sample addressbook document looks like this:


John Smith
250 18th Ave SE
Rochester
MN
55902


Bill Morris
1234 Center Lane NW
St. Paul
MN
55123


The application (possibly, though not necessarily, a servlet) keeps an instance of the addressbook in memory as a DOM Document object. When the user changes an address, the application's frontend sends it only the updated <address> element.
The <addressee> element is used to uniquely identify an address; it serves as the primary key. This would not make a lot of sense for a real application, but we do it here to keep things simple.
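To experiment with the samples that follow, an addressbook of this shape can be parsed into a DOM Document with the standard JAXP parser API. This is a minimal sketch: only the element names `addressbook`, `address`, and `addressee` come from the text, while `streetaddress`, `city`, `state`, and `postalCode` are assumed names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AddressBookSample {
    // addressbook, address and addressee are named in the article;
    // the remaining element names are assumptions.
    static final String SAMPLE_XML =
        "<addressbook>"
      + "<address>"
      + "<addressee>John Smith</addressee>"
      + "<streetaddress>250 18th Ave SE</streetaddress>"
      + "<city>Rochester</city>"
      + "<state>MN</state>"
      + "<postalCode>55902</postalCode>"
      + "</address>"
      + "<address>"
      + "<addressee>Bill Morris</addressee>"
      + "<streetaddress>1234 Center Lane NW</streetaddress>"
      + "<city>St. Paul</city>"
      + "<state>MN</state>"
      + "<postalCode>55123</postalCode>"
      + "</address>"
      + "</addressbook>";

    // Parses XML text into the in-memory DOM Document the application keeps.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document source = parse(SAMPLE_XML);
        System.out.println("address records: "
            + source.getElementsByTagName("address").getLength());
    }
}
```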
We now need to write some Java code that will help us identify the <address> element in the source tree that needs to be replaced with the updated element. The findAddress() method below shows how that can be accomplished. Please note that, to keep the sample short, we've left out the appropriate error handling.
// requires: import org.w3c.dom.*;
public Node findAddress(String name, Document source) {
    Element root = source.getDocumentElement();
    NodeList nl = root.getChildNodes();
    // iterate over all address nodes and find the one that has the correct addressee
    for (int i = 0; i < nl.getLength(); i++) {
        Node n = nl.item(i);
        if ((n.getNodeType() == Node.ELEMENT_NODE) &&
            (((Element)n).getTagName().equals("address"))) {
            // we have an address node, now we need to find the
            // 'addressee' child
            Node addressee = ((Element)n).getElementsByTagName("addressee").item(0);
            // there is the addressee, now walk its children and compare each text node
            // (a while loop, so an empty addressee element does not cause a NullPointerException)
            Node child = addressee.getFirstChild();
            while (child != null) {
                if ((child.getNodeType() == Node.TEXT_NODE) &&
                    (((Text)child).getData().equals(name))) {
                    return n;
                }
                child = child.getNextSibling();
            }
        }
    }
    return null;
}
The code above could most likely be optimized, but it is obvious that iterating over the DOM tree can be tedious and error prone. Now let's look at how the target node can be located by using a simple XPath statement. The statement could look like this:
//address[child::addressee[text() = 'John Smith']]
We can now rewrite our previous method. This time, we use the XPath statement to find the desired node:
public Node findAddress(String name, Document source) throws Exception {
    // need to recreate a few helper objects
    XMLParserLiaison xpathSupport = new XMLParserLiaisonDefault();
    XPathProcessor xpathParser = new XPathProcessorImpl(xpathSupport);
    PrefixResolver prefixResolver = new PrefixResolverDefault(source.getDocumentElement());
    // create the XPath and initialize it
    XPath xp = new XPath();
    String xpString = "//address[child::addressee[text() = '" + name + "']]";
    xpathParser.initXPath(xp, xpString, prefixResolver);
    // now execute the XPath select statement
    XObject list = xp.execute(xpathSupport, source.getDocumentElement(), prefixResolver);
    // return the resulting node
    return list.nodeset().item(0);
}
The above code may not look a lot better than the previous try, but most of this method's contents could be encapsulated in a helper class. The only part that changes over and over is the actual XPath expression and the target node.
This lets us create an XPathHelper class, which looks like this:
import org.w3c.dom.*;
import org.xml.sax.*;
import org.apache.xalan.xpath.*;
import org.apache.xalan.xpath.xml.*;

public class XPathHelper {
    XMLParserLiaison xpathSupport = null;
    XPathProcessor xpathParser = null;
    PrefixResolver prefixResolver = null;

    XPathHelper() {
        xpathSupport = new XMLParserLiaisonDefault();
        xpathParser = new XPathProcessorImpl(xpathSupport);
    }

    public NodeList processXPath(String xpath, Node target) throws SAXException {
        prefixResolver = new PrefixResolverDefault(target);
        // create the XPath and initialize it
        XPath xp = new XPath();
        xpathParser.initXPath(xp, xpath, prefixResolver);
        // now execute the XPath select statement
        XObject list = xp.execute(xpathSupport, target, prefixResolver);
        // return the resulting node list
        return list.nodeset();
    }
}
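The classes used above belong to the Xalan release the article was tested with. On current JDKs the same lookup can be written against the standard `javax.xml.xpath` package, which was added to the platform later (J2SE 5.0). The following is a sketch of that swapped-in API, not the article's code; the class name `StandardXPathLookup`, the `replaceAddress()` method, and the `city` element in the usage example are assumptions made for illustration. It also shows the replacement step from the scenario: the updated element arrives in a different document, so it is imported before it replaces the old node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class StandardXPathLookup {
    // Returns the address element whose addressee matches, or null if none does.
    static Node findAddress(String name, Document source) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the name as an XPath variable instead of concatenating it into
        // the expression, so names containing quotes cannot break the query.
        xpath.setXPathVariableResolver(variable ->
            "name".equals(variable.getLocalPart()) ? name : null);
        return (Node) xpath.evaluate(
            "//address[child::addressee[text() = $name]]", source, XPathConstants.NODE);
    }

    // Replaces the stored address that has the same addressee as 'updated'.
    // Returns false if no matching address exists in the source document.
    static boolean replaceAddress(Document source, Element updated)
            throws XPathExpressionException {
        String key = updated.getElementsByTagName("addressee").item(0).getTextContent();
        Node old = findAddress(key, source);
        if (old == null) {
            return false;
        }
        // The update belongs to another Document, so import a deep copy first.
        Node imported = source.importNode(updated, true);
        old.getParentNode().replaceChild(imported, old);
        return true;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // 'city' is an assumed element name used only for this demonstration.
        Document source = parse(
            "<addressbook>"
          + "<address><addressee>John Smith</addressee><city>Rochester</city></address>"
          + "<address><addressee>Bill Morris</addressee><city>St. Paul</city></address>"
          + "</addressbook>");
        Document update = parse(
            "<address><addressee>John Smith</addressee><city>Duluth</city></address>");
        boolean replaced = replaceAddress(source, update.getDocumentElement());
        System.out.println("replaced: " + replaced);
        System.out.println(findAddress("John Smith", source).getTextContent());
    }
}
```

As with the helper class, only the XPath expression and the context node vary between lookups; the rest is boilerplate that can live in one place.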


 
