Programming Tutorials Browser Tutorials Articles Struts Tutorials Hibernate Tutorials

  Tutorial: XML for the absolute beginner - JavaWorld - April 1999

XML for the absolute beginner - JavaWorld - April 1999

Tutorial Details:

XML for the absolute beginner
XML for the absolute beginner
By: By Mark Johnson
A guided tour from HTML to processing XML with Java
TML and the World Wide Web are everywhere. As an example of their ubiquity, I'm going to Central America for Easter this year, and if I want to, I'll be able to surf the Web, read my e-mail, and even do online banking from Internet cafés in Antigua Guatemala and Belize City. (I don't intend to, however, since doing so would take time away from a date I have with a palm tree and a rum-filled coconut.)
And yet, despite the omnipresence and popularity of HTML, it is severely limited in what it can do. It's fine for disseminating informal documents, but HTML now is being used to do things it was never designed for. Trying to design heavy-duty, flexible, interoperable data systems from HTML is like trying to build an aircraft carrier with hacksaws and soldering irons: the tools (HTML and HTTP) just aren't up to the job.
The good news is that many of the limitations of HTML have been overcome in XML, the Extensible Markup Language. XML is easily comprehensible to anyone who understands HTML, but it is much more powerful. More than just a markup language, XML is a metalanguage -- a language used to define new markup languages. With XML, you can create a language crafted specifically for your application or domain.
XML will complement, rather than replace, HTML. Whereas HTML is used for formatting and displaying data, XML represents the contextual meaning of the data.
This article will present the history of markup languages and how XML came to be. We'll look at sample data in HTML and move gradually into XML, demonstrating why it provides a superior way to represent data. We'll explore the reasons you might need to invent a custom markup language, and I'll teach you how to do it. We'll cover the basics of XML notation, and how to display XML with two different sorts of style languages. Then, we'll dive into the Document Object Model, a powerful tool for manipulating documents as objects (or manipulating object structures as documents, depending upon how you look at it). We'll go over how to write Java programs that extract information from XML documents, with a pointer to a free program useful for experimenting with these new concepts. Finally, we'll take a look at an Internet company that's basing its core technology strategy on XML and Java.
Is XML for you?
Though this article is written for anyone interested in XML, it has a special relationship to the JavaWorld series on XML JavaBeans. (See Resources for links to related articles.) If you've been reading that series and aren't quite "getting it," this article should clarify how to use XML with beans. If you are getting it, this article serves as the perfect companion piece to the XML JavaBeans series, since it covers topics untouched therein. And, if you're one of the lucky few who still have the XML JavaBeans articles to look forward to, I recommend that you read the present article first as introductory material.
A note about Java
There's so much recent XML activity in the computer world that even an article of this length can only skim the surface. Still, the whole point of this article is to give you the context you need to use XML in your Java program designs. This article also covers how XML operates with existing Web technology, since many Java programmers work in such an environment.
XML opens the Internet and Java programming to portable, nonbrowser functionality. XML frees Internet content from the browser in much the same way Java frees program behavior from the platform. XML makes Internet content available to real applications.
Java is an excellent platform for using XML, and XML is an outstanding data representation for Java applications. I'll point out some of Java's strengths with XML as we go along.
Let's begin with a history lesson.
The origins of markup languages
The HTML we all know and love (well, that we know, anyway) was originally designed by Tim Berners-Lee at CERN ( le Conseil Européen pour la Recherche Nucléaire, or the European Laboratory for Particle Physics) in Geneva to allow physics nerds (and even non-nerds) to communicate with each other. HTML was released in December 1990 within CERN, and became publicly available in the summer of 1991 for the rest of us. CERN and Berners-Lee gave away the specifications for HTML, HTTP, and URLs, in the fine old tradition of Internet share-and-enjoy.
Berners-Lee defined HTML in SGML, the Standard Generalized Markup Language. SGML, like XML, is a metalanguage -- a language used for defining other languages. Each so-defined language is called an application of SGML. HTML is an application of SGML.
SGML emerged from research done primarily at IBM on text document representation in the late '60s. IBM created GML ("General Markup Language"), a predecessor language to SGML, and in 1978 the American National Standards Institute (ANSI) created its first version of SGML. The first standard was released in 1983, with the draft standard released in 1985, and the first standard was published in 1986. Interestingly enough, the first SGML standard was published using an SGML system developed by Anders Berglund at CERN, the organization that, as we have seen, gave us HTML and the Web.
SGML is widely used in large industries and governments such as in large aerospace, automotive, and telecommunications companies. SGML is used as a document standard at the United States Department of Defense and the Internal Revenue Service. (For readers outside of the US, the IRS are the tax guys.)
Albert Einstein said everything should be made as simple as possible, and no simpler. The reason SGML isn't found in more places is that it's extremely sophisticated and complex. And HTML, which you can find everywhere, is very simple; for a lot of applications, it's too simple.
HTML: All form and no substance
HTML is a language designed to "talk about" documents: headings, titles, captions, fonts, and so on. It's heavily document structure- and presentation-oriented.
Admittedly, artists and hackers have been able to work miracles with the relatively dull tool called HTML. But HTML has serious drawbacks that make it a poor fit for designing flexible, powerful, evolutionary information systems. Here a few of the major complaints:
HTML isn't extensible
An extensible markup language would allow application developers to define custom tags for application-specific situations. Unless you're a 600-pound gorilla (and maybe not even then) you can't require all browser manufacturers to implement all the markup tags necessary for your application. So, you're stuck with what the big browser makers, or the W3C (World Wide Web Consortium) will let you have. What we need is a language that allows us to make up our own markup tags without having to call the browser manufacturer.
HTML is very display-centric
HTML is a fine language for display purposes, unless you require a lot of precise formatting or transformation control (in which case it stinks). HTML represents a mixture of document logical structure (titles, paragraphs, and such) with presentation tags (bold, image alignment, and so on). Since almost all of the HTML tags have to do with how to display information in a browser, HTML is useless for other common network applications -- like data replication or application services. We need a way to unify these common functions with display, so the same server used to browse data can also, for example, perform enterprise business functions and interoperate with legacy systems.
HTML isn't usually directly reusable
Creating documents in word-processors and then exporting them as HTML is somewhat automated but still requires, at the very least, some tweaking of the output in order to achieve acceptable results. If the data from which the document was produced change, the entire HTML translation needs to be redone. Web sites that show the current weather around the globe, around the clock, usually handle this automatic reformatting very well. The content and the presentation style of the document are separated, because the system designers understand that their content (the temperatures, forecasts, and so on) changes constantly. What we need is a way to specify data presentation in terms of structure, so that when data are updated, the formatting can be "reapplied" consistently and easily.
HTML only provides one 'view' of data
It's difficult to write HTML that displays the same data in different ways based on user requests. Dynamic HTML is a start, but it requires an enormous amount of scripting and isn't a general solution to this problem. (Dynamic HTML is discussed in more detail below.) What we need is a way to get all the information we may want to browse at once, and look at it in various ways on the client.
HTML has little or no semantic structure
Most Web applications would benefit from an ability to represent data by meaning rather than by layout. For example, it can be very difficult to find what you're looking for on the Internet, because there's no indication of the meaning of the data in HTML files (aside from META tags, which are usually misleading). Type red into a search engine, and you'll get links to Red Skelton, red herring, red snapper, the red scare, Red Letter Day, and probably a page or two of "Books I've Red." HTML has no way to specify what a particular page item means. A more useful markup language would represent information in terms of its meaning. What we need is a language that tells us not how to display information, but rather, what a given block of information is so we know what to do with it.
SGML has none of these weaknesses, but in order to be general, it's hair-tearingly complex (at least in its complete form). The language used to format SGML (its "style language"), called DSSSL (Document Style Semantics and Specification Language), is extremely pow


 

Read Tutorial at: Click here to view the tutorial

Rate Tutorial:
XML for the absolute beginner - JavaWorld - April 1999

View Tutorial:
XML for the absolute beginner - JavaWorld - April 1999

Related Tutorials:

Sir, what is your preference?
Sir, what is your preference?
 
Transparently cache XSL transformations with JAXP
Transparently cache XSL transformations with JAXP
 
The Java Web Services Tutorial
This tutorial is a beginner\'s guide to developing Web services and Web applications using the Java Web Services Developer Pack (Java WSDP).
 
Smartly load your properties
Smartly load your properties
 
Good ideas
Good ideas
 
PrEd
PrEd is a Java based graphical utility to find and edit Java property files in JAR, WAR, EAR and other kind of ZIP archives. It is the perfect tool for customizing Java applications, which use XML and Property files for their configuration.
 
Getting Groovy with XML
XML sucks. Oh, wait, XML rocks. Well, it actually does a lot of both. It rocks because of all of the editors, validators, and tools written for it. XML has all but replaced any notion of a new custom text-based data language. But it also sucks because it\
 
XML Document Validation with an XML Schema
This tutorial explains the procedure of validating an XML document with an XML schema.
 
JTimepiece
JTimepiece is the advanced library for working with dates and times in Java. Many easy-to-use methods in this API make it easy for any developer, from beginner to expert, to use JTimepiece.
 
The JavaTM Web Services Tutorial
A beginner's guide to developing Web services and Web applications on the Java Web Services Developer Pack
 
Generating an XML Document with JAXB
In this tutorial, JAXB is used to generate Java classes from an XML Schema. An example XML document shall be created from the Java classes.
 
Beginner Guide to Linux Server
Beginner Guide to Linux Server Beginner Guide to Linux Server Introduction Linux is know for its security, performance, reliability. Many business units are looking towards Linux as it provides low costs software for them. Earlier Linux used to
 
J2ME Technology Turns 5!
In 2004 the Java 2 Platform, Micro Edition (J2ME) celebrated its fifth anniversary. This article presents where J2ME is today.
 
Understanding MIDP System Threads
Describes the multi-threaded aspects of the J2ME application environment. Understanding the interactions between systems threads, user-interface and application threads will help in avoiding MIDlet deadlock.
 
Parsing an XML Document with XPath
The getter methods in the org.w3c.dom package API are commonly used to parse an XML document. But J2SE 5.0 also provides the javax.xml.xpath package to parse an XML document with the XML Path Language (XPath) .
 
Beginner Guide to Linux Server
Beginner Guide to Linux Server Beginner Guide to Linux Server Introduction Linux is know for its security, performance, reliability. Many business units are looking towards Linux as it provides low costs software for them. Earlier Linux used to
 
Roseindia.net Place to Get Cheap Linux cds in India!!
Roseindia.net Place to Get Cheap Linux cds in India!! Cheapest latest Linux CDs in India. Free Linux: Linux is free and it can be downloaded from the Internet at free of cost. But it requires a lot of time to download and check the ISO image and
 
The Beginners Linux Guide
The Beginners Linux Guide The Beginner's Guide to Linux Dedicated website to Linux containing Online Resource, Links to Linux Resources and Tutorials. Here you will find a lot of tutorials and information about the Linux. What is Linux? This
 
Beginner to advance guide to the Apache Struts
Beginner to advance guide to the Apache Struts The Complete Apache Struts Tutorial This complete reference of Jakarta Struts shows you how to develop Struts applications using ant and deploy on the JBoss Application Server. Ant script is provided
 
Complete Webhosting Guide, Search Web hosts, Find Plans
Complete Webhosting Guide, Search Web hosts, Find Plans The Complete Web Hosting Guide RoseIndia.net is the complete beginner's guide to finding a web hosting company. Introduction to Web Hosting What is Web Hosting? Linux vs. Windows
 
Site navigation
 

 

Send your comments, Suggestions or Queries regarding this site at roseindia_net@yahoo.com.

Copyright © 2006. All rights reserved.