Processing XML with Java


 

In this section we look at the APIs available for processing XML in Java.

XML is a cross-platform, software- and hardware-independent language. XML stands for eXtensible Markup Language. A markup language marks up text by wrapping tags around it to define what that text is. Unlike HTML, XML is used as a data container and not to display data. Also unlike HTML, it has no predefined tags or attributes; the user defines them as needed. For this reason XML is often described as a "self-descriptive" language.

XML and HTML

HTML and XML were developed for different purposes.

1. HTML is a widely used markup language for displaying data, while XML is used to store and transport data.

2. HTML focuses on how data looks when displayed, while XML is used to structure, store and carry data marked up with user-defined tags and attributes.

3. XML is a complement to HTML, not a replacement for it.

Sample XML document

Below is a sample XML file. You can see how the information is structured and stored and, most importantly, that the tag names are all defined by the author of the document.

The first line declares the XML version and the encoding used for this document. <books> is the root element and is the parent of all the elements inside it. Every element must also have a closing tag.

<?xml version="1.0" encoding="ISO-8859-1"?>
<books>
    <book isbn="BN01">
          <book-name>XML Java</book-name>
          <authors>
                 <author-name>Author1</author-name>
                 <author-name>Author2</author-name>
                 <author-name>Author3</author-name>
          </authors>
          <publisher>Abc</publisher>
    </book>
</books>

Three common terms used in XML

1. Tags

The text between < and > is called a tag. For example, <book> is one of the tags in our sample document.

2. Elements

Everything from an opening tag to its matching closing tag, including the tags themselves, is called an element. An element can contain other elements or simple text. For example, <book> is an element containing three child elements: <book-name>, <authors> and <publisher>. In the same way, the <authors> element contains three <author-name> elements.

3. Attributes

Elements in XML can have attributes, which are name-value pairs written inside the start tag of the element. For example, the <book isbn="BN01"> element has an isbn attribute whose value is BN01.

XML Syntax Rules

Like every language, XML has a few simple rules that must be followed.

1. An XML document must have exactly one root element.

2. XML is case sensitive, so <Book> and <book> are different tags.

3. Elements must be nested properly.

4. Every element must have a closing tag (or be written as an empty-element tag such as <book/>).

5. Attribute values must be enclosed in quotes.
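
The rules above are what a parser checks when it decides whether a document is well-formed. As a minimal sketch, the JDK's own DOM parser can be used to test a string against them; the class name WellFormedCheck and the helper isWellFormed are illustrative names, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WellFormedCheck {

    // Hypothetical helper: returns true if the text parses as well-formed XML.
    // The JDK's default error handler may also print a diagnostic to stderr
    // when the parse fails.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports any violation of the syntax rules this way.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b>x</b></a>"));  // properly nested
        System.out.println(isWellFormed("<a><b>x</a></b>"));  // rule 3 broken
        System.out.println(isWellFormed("<a><B>x</b></a>"));  // rule 2 broken
        System.out.println(isWellFormed("<a id=1></a>"));     // rule 5 broken
    }
}
```

Only the first call in main is expected to report a well-formed document; each of the others breaks one of the rules listed above.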

XML Validation

An XML document is said to be valid if it is well-formed and also conforms to the rules defined in a DTD or an XML Schema.

A DTD (Document Type Definition) defines the structure of an XML document and specifies its legal elements and attributes. It can be declared inside the XML document or in an external file.

XML Schema is a W3C standard and an XML-based alternative to DTDs. It is also referred to as XML Schema Definition (XSD).
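
To make the difference between well-formed and valid concrete, here is a minimal sketch that validates a document against an XSD using the javax.xml.validation package shipped with the JDK. The schema is a hypothetical one written for this example: it describes a cut-down version of the sample document (a <books> root with <book> children that have a required isbn attribute, a <book-name> and a <publisher>). The class and method names are likewise illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class XsdValidationSketch {

    // Hypothetical schema, assumed for this example only.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='books'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='book' maxOccurs='unbounded'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='book-name' type='xs:string'/>"
        + "<xs:element name='publisher' type='xs:string'/>"
        + "</xs:sequence>"
        + "<xs:attribute name='isbn' type='xs:string' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true if the document conforms to the schema above.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown when the document is not well-formed or not valid.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<books><book isbn='BN01'><book-name>XML Java</book-name>"
                  + "<publisher>Abc</publisher></book></books>";
        String noIsbn = "<books><book><book-name>XML Java</book-name>"
                      + "<publisher>Abc</publisher></book></books>";
        System.out.println(isValid(ok));
        System.out.println(isValid(noIsbn)); // well-formed, but isbn is missing
    }
}
```

The second document in main follows all the syntax rules, yet it is rejected because the assumed schema requires the isbn attribute.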

XML APIs

Several APIs, both standard and third-party, are available for processing XML in Java. Some of them are described below:

1. SAX

SAX, short for "Simple API for XML", is a simple and lightweight API for handling XML. It provides a mechanism to read information from an XML document, but not to modify it. It was initially a Java-only API but is now supported in many other languages as well. A SAX parser is event-driven: as it reads the XML data, it invokes the callback methods you provide. Because it does not build an in-memory tree, a SAX parser is generally faster than a DOM parser and uses less memory.
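
The callback style described above can be sketched with the SAX classes that ship with the JDK. The example below collects the text of every <author-name> element from an inlined copy of the sample document; the class name, the SAMPLE constant and the authorNames helper are names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxAuthorsSketch {

    // The sample document from this article, inlined as a string.
    static final String SAMPLE =
          "<books><book isbn='BN01'>"
        + "<book-name>XML Java</book-name>"
        + "<authors>"
        + "<author-name>Author1</author-name>"
        + "<author-name>Author2</author-name>"
        + "<author-name>Author3</author-name>"
        + "</authors>"
        + "<publisher>Abc</publisher>"
        + "</book></books>";

    // Hypothetical helper: returns the text of every <author-name> element.
    static List<String> authorNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside an <author-name> element.
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("author-name".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("author-name".equals(qName)) {
                    names.add(text.toString().trim());
                    text = null;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(authorNames(SAMPLE));
    }
}
```

Note that the handler never sees the whole document at once. It only keeps the state it needs (the current author name), which is why SAX suits large inputs.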

2. DOM

DOM stands for Document Object Model. It is a read-write API, so it can both parse existing XML documents and create new ones. DOM represents the XML document as a tree held in memory. As a result, a DOM parser is generally slower and uses much more memory than a SAX parser. DOM is language-independent and was not designed with Java in mind, so its API can feel awkward to programmers who are used to Java idioms.
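
As a minimal sketch of the read-write, tree-based model, the example below parses an inlined copy of the sample document with the JDK's DOM parser, reads the isbn attribute, and then adds a fourth author to the tree. The class name and the helpers parse, firstIsbn and addAuthor are illustrative names, not part of the DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomTreeSketch {

    // The sample document from this article, inlined as a string.
    static final String SAMPLE =
          "<books><book isbn='BN01'>"
        + "<book-name>XML Java</book-name>"
        + "<authors>"
        + "<author-name>Author1</author-name>"
        + "<author-name>Author2</author-name>"
        + "<author-name>Author3</author-name>"
        + "</authors>"
        + "<publisher>Abc</publisher>"
        + "</book></books>";

    // Builds the whole document as an in-memory tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Read: returns the isbn attribute of the first <book> element.
    static String firstIsbn(Document doc) {
        Element book = (Element) doc.getElementsByTagName("book").item(0);
        return book.getAttribute("isbn");
    }

    // Write: appends an <author-name> to the first <authors> element
    // and returns the new number of authors.
    static int addAuthor(Document doc, String name) {
        Element authors = (Element) doc.getElementsByTagName("authors").item(0);
        Element author = doc.createElement("author-name");
        author.setTextContent(name);
        authors.appendChild(author);
        return authors.getElementsByTagName("author-name").getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(firstIsbn(doc));
        System.out.println(addAuthor(doc, "Author4"));
    }
}
```

The casts to Element and the index-based item(0) calls are typical of DOM's language-neutral design, which is part of what the Java-specific APIs below try to smooth over.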

3. JDOM

JDOM is a Java-based document object model that interoperates with DOM and SAX. It is a tree-based Java API for working with XML documents, built in Java and for Java. It typically uses a SAX parser to read the XML document. Like DOM, JDOM can create a new in-memory XML tree, and it can access any part of the tree at any time.

4. JAXP

JAXP stands for "Java API for XML Processing". It is used for processing (parsing, validating, transforming and querying) XML documents in Java-based applications. JAXP is part of the Java Development Kit (JDK), so you don't need to download it separately. JAXP includes the DOM and SAX APIs and makes them easier to work with.
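
Querying and transforming are the two JAXP tasks not shown so far, so here is a minimal sketch of both using only JDK classes: an XPath query over an inlined copy of the sample document, and an identity transformation that re-serializes it. The class name and the helpers query and reserialize are illustrative names, not part of JAXP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class JaxpSketch {

    // The sample document from this article, inlined as a string.
    static final String SAMPLE =
          "<books><book isbn='BN01'>"
        + "<book-name>XML Java</book-name>"
        + "<authors>"
        + "<author-name>Author1</author-name>"
        + "<author-name>Author2</author-name>"
        + "<author-name>Author3</author-name>"
        + "</authors>"
        + "<publisher>Abc</publisher>"
        + "</book></books>";

    // Query: evaluates an XPath expression and returns the result as a string.
    static String query(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression,
                    new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Transform: runs the identity transformation, which copies the input
    // to the output unchanged apart from serialization details.
    static String reserialize(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(query(SAMPLE, "/books/book/@isbn"));
        System.out.println(query(SAMPLE, "/books/book/authors/author-name[2]"));
        System.out.println(reserialize(SAMPLE));
    }
}
```

To apply a real XSLT stylesheet instead of the identity transformation, the stylesheet would be passed to TransformerFactory.newTransformer as a Source.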

5. dom4j

dom4j is a Java-based open source library for working with XML, XPath and XSLT. It integrates with DOM, SAX and JAXP.

Like JDOM, it is a tree-based, read-write API for processing XML on the Java platform. It is built around the Java Collections Framework and is designed to handle large XML documents with modest memory use.

6. ElectricXML

ElectricXML is another Java-based API for XML processing. It is very small and makes parsing XML documents easy. It parses DTDs but does not validate against them, and it does not implement DOM.
