XML documents on the run, Part 3
By Dennis M. Sosnoski
How do SAX2 parsers perform compared to new XMLPull parsers?
In Parts 1 and 2 of this three-part series, I explained both push- (Simple API for XML 2 (SAX2)) and pull-style XML parsers. The pull-side story continues to change rapidly, so, as promised, I'll update you on the latest developments. These include the new Common API for XML Pull Parsing, or XMLPull, announced earlier this month. (Talk about hot off the presses!)
Read the whole "XML Documents on the Run" series:
Part 1: SAX speeds through XML documents with parse-event streams
Part 2: Better SAX2 handling and the pull-parser alternative
Part 3: How do SAX2 parsers perform compared to new XMLPull parsers?
But that's not all: In Part 2 I left loyal readers hanging on performance differences. Pull parsers offer some big ease-of-use advantages compared to SAX2, but can they measure up to SAX2's industrial-strength performance? You'll find out in this article's second half, in which I show performance tests pitting five top SAX2 parsers against two new XMLPull parsers.
XMLPull
Just this month the ringleaders from the two leading pull-parser implementations announced XMLPull. Stefan Haustein from the kXML project and Aleksander Slominski from XPP3 (XML Pull Parser), both feeling that the lack of a common API hindered wider pull parsing adoption, began work on XMLPull in December 2001. The resulting API reflects their substantial experience, drawing from their respective projects to produce an approach that works well for a wide range of applications.
XMLPull supports everything from J2ME (Java 2 Platform, Micro Edition) to J2EE (Java 2 Platform, Enterprise Edition). The J2ME requirement forced them to create a simple interface with the minimal number of classes necessary to function well in limited-memory environments. In J2EE situations, by contrast, memory isn't usually an issue, but flexibility and performance are key. Accommodating both extremes with a single interface is tough. Does XMLPull succeed? I tackle that question below. Let's start by looking at the basic interface.
The all-in-one approach
The XMLPull API consists of a single interface, org.xmlpull.v1.XmlPullParser, along with two supporting classes: org.xmlpull.v1.XmlPullParserException and org.xmlpull.v1.XmlPullParserFactory. The XmlPullParser defines XMLPull's interesting parts, so let's examine the interface and ignore the two support classes.
Think of the XmlPullParser interface as defining a special kind of iterator. That iterator delivers an XML document's components to you one at a time. It's up to you, in your program, to decide when you're done with the current component and ready to move to the next one.
The parser always holds a particular state that matches the current component type. Many of XmlPullParser's methods prove meaningful only when the parser is in a particular state, identified by a set of constant definitions in the interface. When you begin parsing a document, the parser always resides in the START_DOCUMENT state.
How do you determine the parser's state once you begin parsing? There are two ways: as the value returned by a call to the interface's next() or nextToken() method, either of which advances the parser to the next document component, or as the value returned by getEventType(), which just gives you the current state.
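To make the iterator idea concrete, here is a minimal sketch of a state-driven pull loop. XMLPull implementations are distributed separately from the JDK, so the sketch substitutes the JDK's own pull API, javax.xml.stream.XMLStreamReader (StAX, the API that came out of JSR-173), purely as a stand-in. It follows the same pattern of next() advancing the parser and getEventType() reporting the current state, but its constant names differ (START_ELEMENT and CHARACTERS rather than START_TAG and TEXT). The class name PullLoopSketch and the sample document are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Pull-loop sketch; StAX stands in for XMLPull (an assumption, see text). */
class PullLoopSketch {

    /** Walks the document and records a readable name for each state visited. */
    static List<String> states(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

        List<String> seen = new ArrayList<>();
        // Before the first next() call, getEventType() reports the initial state.
        int state = reader.getEventType();
        while (true) {
            seen.add(describe(state));
            if (state == XMLStreamConstants.END_DOCUMENT) {
                break;
            }
            // The caller decides when to move on; next() returns the new state.
            state = reader.next();
        }
        reader.close();
        return seen;
    }

    /** Maps the event codes used in this sketch to their constant names. */
    static String describe(int state) {
        switch (state) {
            case XMLStreamConstants.START_DOCUMENT: return "START_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:  return "START_ELEMENT";
            case XMLStreamConstants.CHARACTERS:     return "CHARACTERS";
            case XMLStreamConstants.END_ELEMENT:    return "END_ELEMENT";
            case XMLStreamConstants.END_DOCUMENT:   return "END_DOCUMENT";
            default:                                return "OTHER(" + state + ")";
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(states("<trade><price>25.50</price></trade>"));
    }
}
```

With an XMLPull parser the loop should keep the same shape: read the starting state from getEventType(), then let the value returned by each next() call drive the dispatch.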
Cleared for access
XMLPull offers two access levels to the document data, letting you choose the detail level your program wants to see. When you call the next() method, the parser ignores a document's minor details and only reports the meatier components: elements and text. The next() method limits the values to four:
START_TAG for an element's start tag
TEXT for character data content
END_TAG for an element's end tag
END_DOCUMENT for when you've reached the end of the document data
In contrast, the nextToken() method provides more detailed access to the document structure, including components such as processing instructions, comments, entity references, and more. In fact, the nextToken() method gives a "full disclosure" document view; where next() silently skips components it doesn't report, nextToken() reports everything.
Why support full disclosure in a parser API? Reporting everything present in the input stream allows you to layer functionality. For example, neither current XMLPull implementation supports document validation, but the nextToken() parse view of the document offers enough detail that validation could sit as a wrapper layer on top of the basic parsers. Using that approach, a single validation implementation would add validation support for all XMLPull implementations.
Layering represents a powerful feature. The original SAX interface did not report all the information needed for document validation, so parser writers had to build validation into the parser if they wanted to support it at all. That led to duplicated effort to implement validation within different parsers. Even now many SAX2 parsers do not support validation. In contrast, XMLPull's design avoids the problem completely.
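The idea of layering a coarse view over a detailed one can be sketched in a few lines. This is only an approximation under stated assumptions: XMLPull is not part of the JDK, so the JDK's javax.xml.stream.XMLStreamReader (StAX) plays the role of the detailed nextToken() view, and a small hand-written filter on top reproduces the four-value view described for next(). The class name LayeredViewSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Layering sketch; StAX stands in for XMLPull (an assumption, see text). */
class LayeredViewSketch {

    /** Detailed view: every event code the underlying reader reports. */
    static List<Integer> detailed(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<Integer> events = new ArrayList<>();
        while (reader.hasNext()) {
            events.add(reader.next());
        }
        reader.close();
        return events;
    }

    /** Coarse view layered on top: silently drop everything but four kinds. */
    static List<Integer> coarse(List<Integer> detailedEvents) {
        List<Integer> kept = new ArrayList<>();
        for (int event : detailedEvents) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.END_ELEMENT:
                case XMLStreamConstants.END_DOCUMENT:
                    kept.add(event);
                    break;
                default:
                    // Comments, processing instructions, and the like are skipped.
                    break;
            }
        }
        return kept;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<!-- audit note --><trade><?format short?>IBM</trade>";
        List<Integer> all = detailed(xml);
        System.out.println("detailed event codes: " + all);
        System.out.println("coarse event codes:   " + coarse(all));
    }
}
```

Because the filter only consumes what the detailed view reports, it works unchanged over any parser that exposes that view, which is the same argument the text makes for a shared validation layer.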
Basic component handling
Most XML applications need only five basic document components: the initial START_DOCUMENT state plus the four values the next() method reports. Of the five, only START_TAG and TEXT warrant a closer look, as START_DOCUMENT, END_TAG, and END_DOCUMENT are self-explanatory.
START_TAG provides information from an element's start tag, including the element's attributes. The XmlPullParser interface defines three methods for accessing the element name information: getName() for the local name, along with getNamespace() and getPrefix() for namespace information. The interface also defines six methods for accessing attribute values: getAttributeValue(namespace, name) to retrieve an attribute value by name, along with getAttributeCount(), getAttributeName(index), getAttributeNamespace(index), getAttributePrefix(index), and getAttributeValue(index) for direct indexed access to attributes.
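The two styles of attribute access, by index and by name, look roughly like this in practice. Since XMLPull is not bundled with the JDK, this sketch again borrows the JDK's javax.xml.stream.XMLStreamReader (StAX) as a stand-in. The method names are StAX's, not XMLPull's: getLocalName() and getAttributeLocalName(index) correspond loosely to XMLPull's getName() and getAttributeName(index). The class StartTagSketch and the trade element are made up for the example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Start-tag access sketch; StAX stands in for XMLPull (an assumption). */
class StartTagSketch {

    /** Describes the first start tag: name, indexed attributes, one by-name lookup. */
    static String describeFirstTag(String xml, String wanted) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        // Advance until the parser sits on a start tag; the accessors below
        // are only meaningful in that state.
        while (reader.next() != XMLStreamConstants.START_ELEMENT) {
            // Skip anything that precedes the root element.
        }

        // Indexed access: walk every attribute on the current start tag.
        Map<String, String> byIndex = new LinkedHashMap<>();
        for (int i = 0; i < reader.getAttributeCount(); i++) {
            byIndex.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
        }

        // Named access: a null namespace means the namespace is not checked.
        String byName = reader.getAttributeValue(null, wanted);

        String result = reader.getLocalName() + " " + byIndex + " " + wanted + "=" + byName;
        reader.close();
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(
            describeFirstTag("<trade symbol=\"IBM\" shares=\"100\"/>", "symbol"));
    }
}
```

Named lookup returns null when the attribute is absent, so callers that require an attribute need an explicit check.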
TEXT supplies character-data content information. You can access the character data in two ways: First, the getText() method can get just the text as a string and avoid any details. Second, the getTextCharacters(holder) method can access the raw characters (as with the characters(ch, start, length) handler call in the SAX2 interface). The latter method requires some explanation: it directly returns an array that holds the characters, but the starting position in the array and the length of the character data are returned as values in the int[2] array passed as a call parameter, with the start position at [0] and the number of characters at [1].
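The holder convention is easier to see in code. The following is a toy sketch, not a parser: HolderSketch is a hypothetical class whose getTextCharacters(holder) method merely imitates the calling convention described above, returning a shared buffer and writing the start offset and length into the caller's int[2] array.

```java
/** Toy stand-in that imitates the getTextCharacters(holder) convention. */
class HolderSketch {

    // Hypothetical internal buffer; the text of interest is only a slice of it.
    private final char[] buffer = "<price>25.50</price>".toCharArray();

    /**
     * Returns the whole buffer and reports the slice through the holder:
     * holder[0] receives the start offset, holder[1] the character count.
     */
    char[] getTextCharacters(int[] holder) {
        holder[0] = 7;  // offset of the first character after "<price>"
        holder[1] = 5;  // length of "25.50"
        return buffer;
    }

    /** Caller side: allocate the holder, make the call, then slice the array. */
    static String currentText(HolderSketch source) {
        int[] holder = new int[2];
        char[] chars = source.getTextCharacters(holder);
        return new String(chars, holder[0], holder[1]);
    }

    public static void main(String[] args) {
        System.out.println(currentText(new HolderSketch()));  // prints 25.50
    }
}
```

The point of the convention is that the caller can read the characters in place without a new String being built for every text event; copying into a String, as currentText() does here, is only needed when the text must outlive the next parser call.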
That's all you need to know for most XMLPull uses. You'll find much more in the API, including access to the internal namespace stack, document text position, and element nesting depth, but you can dig into these details directly in the Javadocs if you're interested.
Convert from XPP2
In Part 2, I included code for processing a financial-trade history document using the XPP2 pull-parser interface. Let's look at the changes required to bring that code up to XMLPull compatibility.
Fortunately, you'll need to substantially change only the PullWrapper class, since it has most of the parser-dependent code. Here's the new version:
import java.io.IOException;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

public class PullWrapper
{
    /** Parser in use. */
    protected XmlPullParser m_parser;

    /** Constructor. Builds the shared objects used for parsing. */
    public PullWrapper() throws XmlPullParserException {
        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        m_parser = factory.newPullParser();
    }

    /** Parse start of element from document. */
    protected void parseStartTag(String tag)
        throws IOException, XmlPullParserException {
        while (true) {
            // Events not listed here (such as TEXT) are skipped.
            switch (m_parser.next()) {

                case XmlPullParser.START_TAG:
                    if (m_parser.getName().equals(tag)) {
                        return;
                    }
                    // Fall through for error handling.

                case XmlPullParser.END_TAG:
                case XmlPullParser.END_DOCUMENT:
                    throw new XmlPullParserException
                        ("Missing expected start tag " + tag);
            }
        }
    }

    /** Parse end of element from document. */
    protected String parseEndTag(String tag)
        throws IOException, XmlPullParserException {
        String text = null;
        while (true) {
            switch (m_parser.next()) {

                case XmlPullParser.TEXT:
                    // Remember the trimmed content seen before the end tag.
                    text = m_parser.getText().trim();
                    break;

                case XmlPullParser.END_TAG:
                    if (m_parser.getName().equals(tag)) {
                        return text;
                    }
                    // Fall through for error handling.

                case XmlPullParser.START_TAG:
                case XmlPullParser.END_DOCUMENT:
                    throw new XmlPullParserException
                        ("Missing expected end tag " + tag);
            }
        }
    }

    /** Parse element, returning content with white space trimmed. */
    protected String parseElementContent(String tag)
        throws IOException, XmlPullParserException {
        parseStartTag(tag);
        return parseEndTag(tag);
    }

    /** Get attribute value from current start tag. */
    protected String attributeValue(String name)
        throws IOException, XmlPullParserException {
        String value = m_parser.getAttributeValue(null, name);
        if (value == null) {
            throw new XmlPullParserException("Missing attribute " + name);
        } else {
            return value;
        }
    }
}
Not much has changed, except that the XPP2 interface used separate objects (XmlStartTag and XmlEndTag) to report information about a start or end tag, while the XMLPull common API makes the information directly available from the parser.
The only other necessary change: Remove the call to the parser's reset() method from the TradePullHandler class. When that's done, everything works as expected, and the example program can now use any XMLPull implementation (currently XPP3 and kXML, but more will be coming soon).
The once and future standard
A new Java Community Process (JCP) specification request specifies a standard API for Java pull parsers. As of yet, I can't say what will happen because the project, JSR-173: Streaming API for XML, has just started, but the results will prove important for the long term.
You don't, however


 
