XML JavaBeans, Part 1 - JavaWorld February 1999
By Mark Johnson
Make JavaBeans mobile and interoperable with XML
HTML (Hypertext Markup Language) is currently the document format of the World Wide Web. Lately, though, there's been a lot of noise about XML (Extensible Markup Language), which, among other things, lets you define new markup tags (the bits between < and >), or even whole new markup languages. Some pundits even claim that XML may supplant HTML as the dominant information format on the Web.
Read the whole "XML JavaBeans" series:
Part 1. Make JavaBeans mobile and interoperable with XML
Part 2. Automatically convert JavaBeans to XML documents
Part 3. Integrate the XMLBeans package with the Java core
For some, XML seems to be one of those ideas that, while exciting at first, isn't entirely usable in practice. How would a developer use XML in a real-life system? What good is the ability to define custom tags if no browsers understand them? In this month's column, we'll look at a possible application of XML -- namely, using it as a serialization format for JavaBeans.
First, you'll read a quick rundown of what XML is and why so many people are so excited about it. Next, you'll hear about the World Wide Web Consortium's (W3C's) Document Object Model, the proposed standard for representing documents as data structures. As an example of processing a document as a data structure, we'll describe a very small custom markup language, and then implement a class that reads an XML file and transforms it into a JavaBean.
Please note that the primary purpose of this article is to provide an example of XML in use. While it is not an introduction to XML for the complete novice, this article should be comprehensible with just a bit of preparatory reading (see the introductory articles listed in the Resources section).
What's wrong with HTML, anyway?
There's a great deal of introductory material on the Web about XML, so we're going to go over XML basics pretty quickly. Let's start by discussing why XML is necessary in the first place.
It's easy to make the argument that HTML enabled the explosion of the Web. Among the many strengths that have made HTML the dominant format for Web documents are the following:
HTML is very easy to learn and use. Practically anyone with a pulse can learn to write HTML. Reading HTML in a Web browser is so simple and intuitive that just about everyone grasps it instantly.
Logical layout makes HTML documents portable. HTML markup tells a browser what roles various pieces of text play in a document (title, list element, and so on), and the browser is free to decide how (or if) to display them. This provides a great deal of device independence.
Hypertext forms webs of knowledge. One of the most useful features of HTML for many applications is the ability to make information "come alive" and refer to other information.
HTML forms a framework for composite documents. The addition of applets and other sorts of "active" page elements provides immense creative control to developers on the Web "platform."
Despite these and the many other strengths that make HTML so useful and, well, cool, it has some serious drawbacks that are rapidly becoming obstacles to using it in serious data applications:
HTML is a rapidly growing monster. It was originally designed for sharing documents between scientists at CERN. (CERN stands for Conseil Européen pour la Recherche Nucléaire, the European Council for Nuclear Research, though its Web site consistently describes it as the European Laboratory for Particle Physics.) They wanted structured text with some simple outline capabilities, simple hyperlinks, primitive font control, and maybe some pretty pictures and colors, and that's what they created. It was simple, elegant, and useful. It's still useful, but simple and elegant have gone out the window as developers have demanded, and browser creators have developed, new features for HTML. The HTML specification has ballooned to enormous size with the addition of such features as scripting, frames, layers, tables, forms, style sheets, objects, applets, and on and on.
HTML is set in stone. Within a particular version of the HTML standard, only certain tags, such as <B> (for boldface), are recognizable HTML tags. If you're working in HTML, you're stuck with the tags recognized by the HTML spec (or your particular browser). If you want to define your own tags for some reason, you're out of luck.
HTML is very browser-centric. HTML documents are, by and large, plain text with markup to provide display organization, some font control, and graphic content. They are documents written for humans to read, not for client-side programs to analyze and present. Because of this, HTML is not a good choice as an information format for automated data processing systems.
HTML mostly addresses presentation, not content. Generally, HTML tags describe how or in what context to display a particular piece of text. The semantics of the text, that is, what that text actually means, is lost in HTML.
What do the data mean?
This last deficiency of HTML is the clincher. As data become more mobile in data processing systems, it's necessary to transfer both the information and meta-information about what the data mean. A number in an HTML table may or may not reliably mean something when the document is read by a program. An XML document can be designed to express not simply how to display the data, but what the data mean.
For example, an HTML table can display statistics for an individual baseball player, as in Figure 1.
HTML Source
NO. | PLAYER | High School | AB | R | ... (and so on) ...
Resulting table
NO. | PLAYER | High School | AB | R | H | HR | RBI | AVG
12 | Jonas Grumby | Eaton | 69 | 31 | 30 | 2 | 15 | .435
Figure 1. Batting averages in an HTML table
A row-column representation of these data is fine if what's needed is simply a static display of data in this particular format, but it's not a great representation if you want to associate meaning with the data in your application. Try writing a program that reads the HTML above, retrieves the information about, say, the hitter's runs-batted-in, and then does something with that quantity. With HTML, that's not easy to do in a general way. Imagine, though, that your data file looked something like what appears in Figure 2:
JonasGrumby
12
Eaton
1997
69
31
30
2
15
Figure 2. The batting information specified in XML
Figure 2 is a sample of XML that represents the same information as in Figure 1. It would be easy to pick out the "runs-batted-in" statistic in this document. The document could change structure radically, and the RunsBattedIn tag would still be relatively easy to find. The XML code in Figure 2 contains the same information as the HTML code in Figure 1, but it's represented in a way that indicates what the data mean, not just how to present the data.
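As a rough illustration of how little code that lookup takes, here is a minimal sketch using the JAXP DOM parser that ships with current JDKs (an API that postdates this article, so it is not the parser used in this series). Only the RunsBattedIn tag name comes from the article; the class name, the method name, and every other element name in the sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class RbiLookup {
    // Hypothetical document: only <RunsBattedIn> is a tag name taken from the
    // article; all other element names are invented for this sketch.
    static final String SAMPLE_XML =
          "<Player>"
        + "<Name><First>Jonas</First><Last>Grumby</Last></Name>"
        + "<Number>12</Number>"
        + "<HighSchool>Eaton</HighSchool>"
        + "<Stats><Year>1997</Year><AtBats>69</AtBats><Runs>31</Runs>"
        + "<Hits>30</Hits><HomeRuns>2</HomeRuns>"
        + "<RunsBattedIn>15</RunsBattedIn></Stats>"
        + "</Player>";

    // Finds the first RunsBattedIn element anywhere in the document,
    // regardless of how deeply it is nested, and returns its value.
    static int runsBattedIn(String xml) {
        Document doc;
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        NodeList matches = doc.getElementsByTagName("RunsBattedIn");
        if (matches.getLength() == 0) {
            throw new IllegalArgumentException("no RunsBattedIn element found");
        }
        return Integer.parseInt(matches.item(0).getTextContent().trim());
    }

    public static void main(String[] args) {
        System.out.println(runsBattedIn(SAMPLE_XML)); // prints 15
    }
}
```

Because getElementsByTagName searches all descendants, the same call keeps working if the element is moved to a different depth or wrapped in new parent elements, which is the point the paragraph above makes about structural change.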
Just as in HTML, a style sheet can be associated with XML, though XML's style language, XSL, is more powerful and cryptic than HTML's Cascading Style Sheets. In fact, XSL can convert XML into HTML for display by a browser! The XML above could be displayed in a browser just as it appears in Figure 1, but client-side programs could also collect and use such statistics, since there's an indication (via the tag names) of what the data mean.
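To make the XML-to-HTML conversion concrete, the sketch below applies a tiny style sheet with the javax.xml.transform API bundled with current JDKs. It assumes XSLT 1.0 syntax, which was finalized after this article appeared, so treat it as an illustration of the idea and not of the XSL draft available at the time. The Player and Name elements, the class name, and the method name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StatsToHtml {
    // Hypothetical input: Player and Name are invented element names.
    static final String SAMPLE_XML =
        "<Player><Name>Jonas Grumby</Name><RunsBattedIn>15</RunsBattedIn></Player>";

    // A minimal XSLT 1.0 style sheet that renders one player as an HTML table.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/Player'>"
        + "<table><tr><th>PLAYER</th><th>RBI</th></tr>"
        + "<tr><td><xsl:value-of select='Name'/></td>"
        + "<td><xsl:value-of select='RunsBattedIn'/></td></tr></table>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the style sheet to the XML and returns the resulting HTML text.
    static String toHtml(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(SAMPLE_XML, STYLESHEET));
    }
}
```

The data file stays purely descriptive; all presentation decisions live in the style sheet, so a different style sheet could render the same statistics in another layout without touching the XML.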
You may be wondering how I knew what tags to use in creating my sample XML file. Where did the tag names (like RunsBattedIn ) come from? The answer is: I made them up. I just invented markup tags for my application out of thin air! Creating a new markup language is just like creating any other kind of custom file format. A developer simply creates a file format that meets the needs of the application. XML files are special in that they conform to the XML definition, and so programs that process them can expect input of a certain structure, and can reasonably reject inputs that don't follow that structure.
In the example above, I've created a new XML sublanguage simply by inventing new tags and using them consistently. XML also provides the option of specifying a Document Type Definition (DTD), which is a specification of what elements form a valid document. A developer has much more control over the format of an XML document with a DTD than without one. We're not going to cover DTDs in this article, but they are a core XML concept.
If you think XML looks like HTML, it's because they're close cousins. Both descend from SGML (Standard Generalized Markup Language), which is a metalanguage -- that is, a language for describing languages. SGML is an extremely powerful, flexible, and complex tool, and its complexity has led to its use primarily in huge organizations, like governments and large corporations. HTML is an application of SGML: a single markup language specified as an SGML DTD. XML, by contrast, is a subset of SGML that retains most of SGML's power while simplifying it for use by common mortals. (Are you burned out on acronyms yet?)
Referring again to Figure 2, notice that the XML indicates what the data mean, not how they are to be displayed. Notice also that the tags certainly are not standard HTML. (Let's hope that tags like these are never made part of the HTML standard!) This example shows one of the strengths of XML: the ability to define custom markup tags to suit a particular application. Finally, notice that the batting average doesn't appear in the XML. That's because the average could be calculated from the other values (hits divided by at-bats).
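The derived value is easy to recompute on the receiving side. A minimal sketch, assuming the usual definition of batting average as hits divided by at-bats and the figures shown in Figure 1 (30 hits in 69 at-bats); the class and method names are hypothetical.

```java
import java.util.Locale;

final class BattingAverage {
    // Batting average is hits divided by at-bats.
    static double average(int hits, int atBats) {
        if (atBats <= 0) {
            throw new IllegalArgumentException("atBats must be positive");
        }
        return (double) hits / atBats;
    }

    // Formats the average baseball-style: three decimals, no leading zero.
    static String format(double average) {
        String text = String.format(Locale.ROOT, "%.3f", average);
        return text.startsWith("0") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        System.out.println(format(average(30, 69))); // prints .435
    }
}
```

Leaving the average out of the document avoids storing a value that could drift out of sync with the hits and at-bats it depends on.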
One of XML's most power