Take the sting out of SAX
By Leon Messerschmidt
Generate SAX parsers with XML Schemas
A Simple API for XML (SAX) parser is an invaluable tool for parsing XML files, especially large input files that do not fit into main memory. A SAX parser also helps when you have a slow input stream, like an Internet connection, and need to process bytes as soon as they arrive instead of waiting for the complete input. As a bonus, a well-designed SAX parser is generally faster than processing a DOM (Document Object Model) tree: you need only one pass over the XML data, as opposed to the two passes needed with a DOM tree (one to build the tree and one to do the processing).
Unfortunately, a SAX parser can be difficult to develop because of its event-driven nature. In this article, I create a source code generator that will help you easily develop a SAX parser.
Note: I don't explain SAX in detail here; see Resources below for some excellent references.
SAX reviewed
SAX is a standard API that parses an XML input stream, like a file or network connection, and triggers events in an event-handler class. Many different SAX parser implementations are available for Java. In my examples here, I use Xerces from the Apache XML Project, one of the most popular parser implementations.
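As a rough illustration of that event flow, the sketch below records the callbacks fired for a tiny document. It assumes the JAXP parser bundled with the JDK rather than a standalone Xerces download, and the class name SaxEventTrace is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the SAX events fired for a document.
class SaxEventTrace {

    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only text between elements.
                String s = new String(ch, start, length).trim();
                if (!s.isEmpty()) {
                    events.add("text:" + s);
                }
            }
        };
        // The parser pushes events into the handler while it reads the input.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<employee><first>John</first></employee>")) {
            System.out.println(event);
        }
    }
}
```

The handler never sees the document as a whole; it only sees one callback at a time, which is why any state it needs must be kept by hand.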
Listings 1 and 2 below show an XML file and a SAX event handler, respectively. (You can download all source code and examples for this article from Resources.)
Listing 1. Example XML
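<company name="...">
    <employees>
        <employee>
            <name>
                <first>John</first>
                <last>Dole</last>
            </name>
            <office>1-50</office>
            <telephone>123456</telephone>
        </employee>
        <employee>
            <name>
                <first>Jane</first>
                <last>Dole</last>
            </name>
            <office>1-51</office>
            <telephone>123457</telephone>
        </employee>
    </employees>
</company>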
Listing 2. SAX handler
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes attributes) throws SAXException
{
    // Discard text collected for the previous element
    text.reset();
    if (qName.equals ("company"))
    {
        String name = attributes.getValue("name");
        String header = "Employee Listing For "+name;
        System.out.println (header);
        System.out.println ();
    }
}

public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) throws SAXException
{
    if (qName.equals ("first"))
    {
        firstName = getText();
    }
    if (qName.equals ("last"))
    {
        lastName = getText();
    }
    if (qName.equals ("office"))
    {
        office = getText();
    }
    if (qName.equals ("telephone"))
    {
        telephone = getText ();
    }
    if (qName.equals ("employee"))
    {
        // All fields are known once the employee element closes
        System.out.println (office + "\t" + firstName + "\t" +
            lastName + "\t" + telephone);
    }
}
The SAX handler above merely prints the XML file's data to the standard output device. It prints a header line containing the company name followed by tab-delimited employee data.
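Listing 2 relies on a text buffer and a getText() helper that it does not show. A minimal sketch of how that buffering might look follows; it assumes a CharArrayWriter filled from characters(), uses the JDK's bundled JAXP parser, and collects output in a string instead of printing. The class name EmployeeTextHandler and the parse() helper are hypothetical.

```java
import java.io.CharArrayWriter;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler showing the text buffering that Listing 2 assumes.
class EmployeeTextHandler extends DefaultHandler {
    private final CharArrayWriter text = new CharArrayWriter();
    private final StringBuilder out = new StringBuilder();
    private String firstName;
    private String lastName;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        text.reset(); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks, so append each one.
        text.write(ch, start, length);
    }

    private String getText() {
        return text.toString().trim();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("first")) {
            firstName = getText();
        } else if (qName.equals("last")) {
            lastName = getText();
        } else if (qName.equals("employee")) {
            out.append(firstName).append('\t').append(lastName).append('\n');
        }
    }

    // Parses the given XML and returns one tab-delimited line per employee.
    static String parse(String xml) throws Exception {
        EmployeeTextHandler handler = new EmployeeTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }
}
```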
As you can see from Listing 2, parsing even a simple XML file can produce a significant amount of source code. SAX's event-driven (as opposed to document-driven) nature also makes the source code difficult to maintain and debug because you must be constantly aware of the parser's state when writing SAX code. Writing a SAX parser for complex document definitions can prove even more demanding; see Resources for challenging real-life examples.
We must reduce the work involved in writing an event-handler structure so we have more time to work on actual processing.
XML Schemas
To lighten our workload, we can automate most of the process of writing the event-handler structure. Luckily, the computer already knows the format of the XML file we will parse; the format is defined in a computer-readable DTD (document type definition) or in an XML Schema. I explore ways to use this knowledge for generating source code that removes the sting from SAX parser development. For this article, I rely on XML Schemas only. Though younger than DTDs, the XML Schema standard will probably replace DTDs in the future. You can easily convert your existing DTD files to XML Schemas with the help of some simple tools.
The first step towards building our code generator is to load the information contained in the XML Schema into a memory model. For this article, I use a simple memory model that defines only the XML element and attribute names, as well as the elements' relationships to each other. This custom model eases the code generation process. My simplified memory model consists of two classes: Element and Elements. The former stores information for one element, and the latter manages a list of elements.
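The downloadable source defines the real classes; the following is only a guess at their shape. The accessor names (getName, getChildren, hasChildren, getRootElement) are inferred from the template references such as $element.Name and $elements.RootElement, and everything else, including define() and the field layout, is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Assumed shape of the model: one Element per declared XML element.
class Element {
    private final String name;
    private final List<String> attributes = new ArrayList<>();
    private final List<Element> children = new ArrayList<>();

    Element(String name) {
        this.name = name;
    }

    String getName() { return name; }
    List<String> getAttributes() { return attributes; }
    List<Element> getChildren() { return children; }
    boolean hasChildren() { return !children.isEmpty(); }

    void addAttribute(String attributeName) { attributes.add(attributeName); }
    void addChild(Element child) { children.add(child); }
}

// Assumed shape of the registry that manages all elements and the root.
class Elements {
    private final Map<String, Element> byName = new LinkedHashMap<>();
    private Element rootElement;

    // Returns the element with this name, creating it on first use.
    Element define(String name) {
        return byName.computeIfAbsent(name, Element::new);
    }

    Element getRootElement() { return rootElement; }
    void setRootElement(Element root) { this.rootElement = root; }
    List<Element> getAll() { return new ArrayList<>(byName.values()); }
}
```

Keeping only names and parent-child links is enough for the templates, which never need types or occurrence constraints.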
Next, we need a mechanism that populates the memory model from an XML Schema. Because an XML Schema is also an XML file, you can use a SAX parser to parse an XML Schema and populate the memory model. In this case, a SAX parser is a good choice: you need only handle events for the element and attribute definitions you're interested in, and you can ignore everything else by letting the unneeded SAX events pass unhandled. See Resources for the XML Schema parser's full source code.
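The idea can be sketched as follows. This is not the article's schema parser: it assumes a schema that nests its element declarations, records ref= declarations by name without resolving them, and uses a small hypothetical Decl class in place of the real memory model.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical loader: builds a tree of declared elements from an XML Schema.
class SchemaModelLoader extends DefaultHandler {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Stand-in for the memory model: a name, attribute names, child elements.
    static class Decl {
        final String name;
        final List<String> attributes = new ArrayList<>();
        final List<Decl> children = new ArrayList<>();
        Decl(String name) { this.name = name; }
    }

    private final Deque<Decl> open = new ArrayDeque<>();
    private Decl root;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if (!XSD_NS.equals(uri)) {
            return; // not a schema construct
        }
        String name = atts.getValue("", "name");
        if (localName.equals("element")) {
            if (name == null) {
                name = atts.getValue("", "ref"); // recorded, not resolved
            }
            Decl decl = new Decl(name);
            if (open.isEmpty()) {
                if (root == null) root = decl; // first global element is the root
            } else {
                open.peek().children.add(decl);
            }
            open.push(decl);
        } else if (localName.equals("attribute") && name != null && !open.isEmpty()) {
            open.peek().attributes.add(name);
        }
        // complexType, sequence, annotation, and so on pass by unhandled.
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (XSD_NS.equals(uri) && localName.equals("element")) {
            open.pop();
        }
    }

    static Decl load(String xsd) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match on the XSD namespace
        SchemaModelLoader loader = new SchemaModelLoader();
        factory.newSAXParser().parse(new InputSource(new StringReader(xsd)), loader);
        return loader.root;
    }
}
```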
Once we load the XML Schema information into memory, we can start generating source code for our new SAX parser.
Source code templates
To generate the SAX parser's source code, I use a text-based template engine, which lets me easily insert the memory model's information into source code templates. My favorite text-based template engine is Velocity from Apache's Jakarta project.
You can easily change my source code templates to suit your needs; doing so requires a text editor for editing the templates and only a basic knowledge of Velocity's syntax.
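To show the principle without depending on the Velocity jar, here is a plain-Java stand-in that expands ${...} placeholders from a map. It is not Velocity and does not use Velocity's API or support directives like #foreach; the TinyTemplate class and its render() method are purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative stand-in for a template engine: replaces ${key} with a value.
class TinyTemplate {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z_.]+)\\}");

    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = values.get(key);
            if (value == null) {
                throw new IllegalArgumentException("No value for " + key);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("package", "com.mycompany.package");
        values.put("element.Name", "company");
        System.out.println(render(
                "package ${package};\npublic class ${element.Name}handler {}",
                values));
    }
}
```

A real engine resolves references such as ${element.Name} against live objects, which is what lets the templates loop over an element's children.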
My SAX parser source code templates generate a separate event handler, or Java class, for each complex XML element. I define a complex element as one that might contain other XML elements. Methods inside the complex elements' event handlers handle simple elements, that is, those that contain only text content and/or attributes. Because the code is split across multiple classes, you can more easily find the right place to insert custom source code. The separate event handlers also make the code easier to maintain, should any bugs occur later.
The first source code template is for the class that handles events for complex XML elements. It creates methods for each child element as well as temporary storage for XML attributes:
Listing 3. Event handler template
package ${package};

// JDK Classes
import java.util.*;
import java.io.*;

// Xerces Classes
import org.xml.sax.*;
import org.apache.xerces.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class ${element.Name}handler extends DefaultHandler
{
    private CharArrayWriter text = new CharArrayWriter ();
    private Stack path;
    private Map params;
    private DefaultHandler parent;
    private SAXParser parser;

    public ${element.Name}handler(Stack path, Map params, Attributes attributes, SAXParser parser, DefaultHandler parent) throws SAXException
    {
        this.path = path;
        this.params = params;
        this.parent = parent;
        this.parser = parser;
        start(attributes);
    }

    ## Some code omitted

    public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) throws SAXException
    {
        if (qName.equals("${element.Name}"))
        {
            end();
            path.pop();
            parser.setContentHandler (parent);
        }
#foreach ($child in $element.Children)
#if ($child.hasChildren())
#else
        if (qName.equals("${child.Name}")) end${child.Name} ();
#end
#end
    }
The second class template is the entry point for the SAX parser and is responsible for initialization tasks and for calling the root element's handler:
Listing 4. The parser template
## Some code omitted
    public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes attributes) throws SAXException
    {
        if (qName.equals("${elements.RootElement.Name}"))
        {
            DefaultHandler handler = new ${elements.RootElement.Name}handler(path,params,attributes,parser,this);
            path.push ("${elements.RootElement.Name}");
            parser.setContentHandler (handler);
        }
    }
## Some code omitted
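Both templates depend on the same hand-off: a handler installs a child handler when a complex element opens, and the child restores its parent when that element closes. The sketch below shows the pattern in isolation. It assumes the standard org.xml.sax.XMLReader from the JDK's JAXP parser instead of the Xerces SAXParser class used in the templates, and all class names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo of delegating SAX events to per-element handlers.
class HandlerHandOff {

    static class RootHandler extends DefaultHandler {
        private final XMLReader reader;
        private final List<String> log;

        RootHandler(XMLReader reader, List<String> log) {
            this.reader = reader;
            this.log = log;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if (qName.equals("employee")) {
                // Hand all further events to the child handler.
                reader.setContentHandler(new EmployeeHandler(reader, this, log));
            } else {
                log.add("root:" + qName);
            }
        }
    }

    static class EmployeeHandler extends DefaultHandler {
        private final XMLReader reader;
        private final ContentHandler parent;
        private final List<String> log;

        EmployeeHandler(XMLReader reader, ContentHandler parent, List<String> log) {
            this.reader = reader;
            this.parent = parent;
            this.log = log;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            log.add("employee:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("employee")) {
                reader.setContentHandler(parent); // give control back
            }
        }
    }

    static List<String> run(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new RootHandler(reader, log));
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }
}
```

Because each handler sees only the events inside its own element, it needs no flags to work out where in the document it is.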
The controller class
Now we simply put everything together in a controller class (download the class's source code from Resources). A controller class handles the process's logic; see the MVC (Model-View-Controller) model.
Called Generator, the controller class requires two command-line parameters. The first parameter indicates the XML Schema to use, and the second gives the output classes' package name. Generator then loads the XML Schema into memory and executes the source code templates.
With the Generator class, you can easily create a SAX parser. To illustrate how to use the SAX generator, let's create a SAX parser for Listing 1's XML. I include that listing's XML Schema (example1.xsd) in Resources as well as the SAX generator's source and binary versions. Before you use the SAX generator's prepackaged binary version, read the readme.txt file for usage directions and required external jar libraries. Also, make sure you correctly set your $JAVA_HOME environment variable. Now you can use generate.bat (for Windows machines) or generate.sh (for Unix/Linux machines) to start the SAX generator. To create a SAX parser for example1.xsd, execute one of the following on the command line:
For Windows:
generate examples\example-1.xsd com.mycompany.package
For Unix/Linux:
./generate.sh examples/example-1.xsd com.mycompany.package
The first parameter indicates the XML Schema the program should use to build the SAX parser; the second parameter indicates the Java package name for the new classes.
This process gives you a set of new classes that form the basis of a new SAX parser. They are located in your SAX generator's output/ subdirectory. Assuming you used example1.xsd, you will have classes called CompanyHandler, EmployeesHandler, EmployeeHandler, and NameHandler.
Use the generated SAX parse