Programming Tutorials Browser Tutorials Articles Struts Tutorials Hibernate Tutorials

  Tutorial: Access the world's biggest database with Web DataBase Connectivity - JavaWorld March 2001

Access the world's biggest database with Web DataBase Connectivity - JavaWorld March 2001

Tutorial Details:

Access the world's biggest database with Web DataBase Connectivity
Access the world's biggest database with Web DataBase Connectivity
By: By Tony Loton
Use SQL to query Web-based information
hat's your hobby? Mine is amateur investing and the Web helps me a lot. I can log on to one of my favorite financial information sites and look up the latest share prices, or I can log on to a different site to see the news items for the stocks I hold. I have a secret recipe for choosing shares, which I can't tell you, but suppose it's something like: "I will buy shares in a company whose price drops to its 52-week low and has appeared in the news this week." For the first part of that formula, how do I find all companies at their 52-week low? In the past, I viewed the information for each company individually and did a mental calculation. Now I simply type something like this:
SELECT ... AS name, ... AS price, ... AS yearLow
FROM http://www.asharesite.com/lookup
WITHPOST symbol=ALL
WHERE price=yearLow
Besides my hobby, I also have a business -- the Java business. Therefore, I like to stay informed of the latest Java news and views, which I can do at this site, which might be familiar to you (Figure 1):
Figure 1. The JavaWorld homepage. Click on thumbnail to view full-size image (69 KB).
You will see from the picture that today I'm interested in Enterprise Java Beans, so I've typed "ejb" into the search box. The JavaWorld search engine does its job and rewards me with Figure 2:
Figure 2. Search Results for "ejb." Click on thumbnail to view full-size image (68 KB).
The results appear in descending date order and -- although you can't see it in the picture -- the result set contains entries for prior months as well as the current month. Now I want to see the results in ascending date order rather than descending date order but with results prior to December omitted. If I had some software capable of running it, I could express exactly what I want to see with this piece of SQL:
SELECT ... AS linkText, ... AS description, ... AS date
FROM http://search.itworld.com:8765/query.html
WITHGET qt=ejb
WHERE date MATCHES '*Dec*'
ORDER BY date
Figure 3. SQL results. Click on
thumbnail to view full size image (13 KB).
Look at how I've placed the Webpage URL in the FROM clause, in the space where a table name would usually go. You will also notice a new keyword, WITHGET , which I've introduced to trigger a form submission. The result of running this SQL statement would be as shown in Figure 3. I could export the data to another application, a RDBMS or a spreadsheet program. If I issued the query from within a Java program -- using something like JDBC -- I could present the data in my own style, manipulate the data, and combine it with results from other sites. Imagine the possibilities.
That's enough for setting the scene. You should have a fairly good idea of what I'm proposing and why I think it's such a great idea, so now I'll tell you a bit more about this new variation on the SQL theme.
From HTML to SQL
Figure 4. Hierarchy of HTML elements. Click on thumbnail to view full-size image (29 KB).
As designed, SQL works with regular two-dimensional tables with data arranged neatly into rows and columns. Web-based information -- in the form of HTML -- just isn't like that, as you can see from the hierarchy of elements comprising the JavaWorld search page (Figure 4).
To move from the HTML hierarchy to the regular SQL table paradigm, you must think of SQL in a slightly different way. As a simple example, look at how I've transformed the following SQL statement that runs against an employee table with two columns:
> SELECT name, salary FROM employee WHERE name='Tony'
> SELECT row[*].column['name'], row[*].column['salary'] FROM employee WHERE row[*].column['name']='Tony'
> SELECT row[*].column[1] AS name, row[*].column[2] AS salary FROM employee WHERE name='Tony'
The final statement means exactly the same, semantically, as the first. It reads, "Select the first column (call it name ) of any row and the second column (call it salary ) of any row in the employees table where the name is equal to 'Tony.'"
Using this syntax I could propose a new query language called 3D-SQL that operates on tables with the traditional two dimensions plus an additional third dimension, called history. Each table cell is now addressable as row[x].column[y].history[z], which allows me to construct statements like:
> SELECT row[*].column[1].history[1] AS name, row[*].column[2].history[*] AS salary FROM employee WHERE salary>10000
The statement means, "Select the first historic element of any row's first column (call it name ) and any historic element of any row's second column (call it salary ) where the salary is greater than 10,000." In plain English it means, "Give me the names of those employees who have ever had a salary greater than 10,000."
Although I've placed the wildcard characters within row specifications ( row[*] ), I could reverse the sense and use column[*] to do something rather more difficult with traditional SQL: search across the table columns rather than down the table rows.
Now when you look at Figure 4, you might see it in a new light. It's not quite so frightening now, I hope, and I can return to my earlier query against this page and fill in the blanks (note the placement of the wildcard characters) to give:
SELECT .table[0001].tr[0000].td[0002].table[0002].tr[0000].td[0001].table[0000]
.tr[0001].td[0000].table[*].tr[0000].td[0000].htxt[0000] AS linkText,
.table[0001].tr[0000].td[0002].table[0002].tr[0000].td[0001].table[0000]
.tr[0001].td[0000].table[*].tr[0000].td[0000].text[0001] AS description,
.table[0001].tr[0000].td[0002].table[0002].tr[0000].td[0001].table[0000]
.tr[0001].td[0000].table[*].tr[0000].td[0001].table[0000].tr[0001].td[0000]
.text[0000] AS date
FROM http://search.itworld.com:8765/query.html
WITHGET qt=ejb
WHERE date MATCHES '*Dec*'
ORDER BY date
The proposed 3D-SQL name will have to go because we're now working in more than three dimensions, and since the lowest common denominator for these queries will be HTML, I'll now settle on the name HTMSQL (HyperText Markup Structured Query Language). I suspect that this SQL has a lot in common with the versions of SQL for object databases, but that's not my specialty.
Of course, coming up with these kinds of queries proves difficult if all you've got to work from is the Webpage as presented in a browser. Nor are we all fluent in raw HTML, so I propose some new tools.
SQL tools for the Web
For starters, we need an engine to run the SQL statements. I started to implement such an engine and became increasingly excited about the idea as I did more work and ran more test queries. I started with a simple engine that allowed queries against a single-source Webpage. At the time of this writing, the engine has been extended to allow selections from multiple Webpages -- as though from separate database tables -- with the results being joined together.
Devising the queries to put through the engine was very hard work, so I came up with the idea of an Assistant that would present a source HTML page in the hierarchical form shown above in Figure 4. Not only would this show how sets of elements repeat (to help you decide where to put the wildcard characters), it would also allow an SQL statement to be constructed with a few mouse clicks.
The peculiarities of HTML would be handled by highlighting certain elements such as links, forms, and frames, and by allowing you to drill down into frames and follow links to find the data you desire. Form submissions would be automated in order to support dynamic -- and static -- pages as data sources.
I have produced a demonstration version of the Assistant -- as a Java 2 applet -- that lets you run some example SQL queries on live Web data using the SQL Engine. See the link in Resources to try it out.
This kind of assistant would be a useful end-user tool, allowing data to be extracted from the Web and exported to a spreadsheet or database application for further processing. But I'd really like to capture -- as would you, I suspect -- the results into a Java program, probably a servlet that could be used as a portal. To accomplish that task, we need a Java API.
The WDBC API
My title for the Java API -- Web DataBase Connectivity (WDBC) -- suggests an API similar to JDBC. Although I might do so in the future, I've not followed the JDBC API to the letter because we require only a small subset of the JDBC functionality for this task. Indeed, support for updates is not relevant, nor is support for stored procedures. However, I do think that the programmatic feel of the API should be similar to JDBC so, if you know what JDBC code looks like, I'm sure you'll spot the similarities in this WDBC code:
int sqlColumns=3;
String sqlStatement="SELECT .table[0001]...htxt[0000] AS linkText, "
+".table[0001]...table[*].tr[0000].td[0000].text[0001] AS description, "
+".table[0001]...table[*].tr[0000].td[0000].href[0000] AS link\n"
+"FROM http://search.itworld.com:8765/query.html\n"
+"WITHGET col=qt=ejb\n"
+"WHERE linkText MATCHES '*December*'";
SqlEngine sqlEngine=new SqlEngine();
try
{
ResultSet resultSet=sqlEngine.executeQuery(sqlStatement);
while (resultSet.next())
{
for (int c=1; c<=sqlColumns; c++)
{
System.out.print(resultSet.getString(c)+"|");
}
System.out.println();
}
}
catch (Exception ex) { ex.printStackTrace(); }
Queries are submitted via an executeQuery() method and the results come back as a ResultSet , the elements of which you can step through one by one using next() . Having extracted the data into your Java program, you can now do whatever you like with it.
Uses for WDBC
I see three main uses for the HTMSQL and the WDBC Java API:
As an end-user tool to extract selected information from any Website for loading into a spreadsheet or relational database for further analysis and repor


 

Read Tutorial at: Click here to view the tutorial

Rate Tutorial:
Access the world's biggest database with Web DataBase Connectivity - JavaWorld March 2001

View Tutorial:
Access the world's biggest database with Web DataBase Connectivity - JavaWorld March 2001

Related Tutorials:

Integrating Databases
Integrating Databases
 
Connect the enterprise with the JCA, Part 1
Connect the enterprise with the JCA, Part 1
 
Cache SOAP services on the client side
Cache SOAP services on the client side
 
Create your own type 3 JDBC driver, Part 2
Create your own type 3 JDBC driver, Part 2
 
SQL Database Access with DBTags
SQL Database Access with DBTags In a J2SE or J2EE application, Java Database Connectivity (JDBC) can be used to create a connection with a SQL database, create database tables, retrieve result sets, and update the database. To use a database from a Jav
 
J2EE application performance optimization
J2EE application performance optimization How to extract maximum performance from your J2EE Web applications In this article, Rahul Kuchhal demonstrates how to identify and resolve bottlenecks in a J2EE application. This article covers all the step
 
Framework for Java Database Connectivity
What is Framework for Java Database Connectivity? The Framework for Java Database Connectivity (JDBC) was implemented to demonstrate the ease with which a JavaTM application may be designed to access a source code repository using a relational query lang
 
Using CachedRowSet to Transfer JDBC Query Results Between Classes
Using CachedRowSet to Transfer JDBC Query Results Between Classes The Java Database Connectivity (JDBC) API provides developers with an interface to a SQL database server, such as MySQL or Oracle. Central to any JDBC application is the java.sql.ResultS
 
Application Integration: Sun Java System Access Manager 2004Q2 and JDBC Authentication Module
Today, most user authentication solutions for Web applications are ad hoc and are based on proprietary schema definitions in relational databases. So, if you have multiple Web applications, you may have a separate user database for each of your applicatio
 
First Step towards JDBC
This article introduce you with JDBC and shows you how to create a database application to access the databases.
 
Accessing the Database from Servlet
This article shows you how to access database from servlets. Here I am assuming that you are using win95/98/2000 and running Java Web Server.
 
What is Persistence Framework?
What is Persistence Framework? What is Persistence Framework? A persistence framework moves the program data in its most natural form (in memory objects) to and from a permanent data store the database. The persistence framework manages the
 
Free JSP, Free EJB and Free Servlets Hosting Servers
Free JSP, Free EJB and Free Servlets Hosting Servers Web roseindia.net Free JSP, Free EJB and Free Servlets Hosting Servers MyCGIserver - Free Hosting server provides the following support with the free account: JSP 1.1, Servlets, Apache Cocoon,
 

Free Web Site Hosting Services Below is the listing of the hosting providers providing free web hosting services. These services helps you building your sites even if you have no experience in HTML writing. Zero
 
Developing Distributed application using Enterprise Java Beans, J2EE Architecture, EJB Tutorial, WebLogic Tutorial.
Developing Distributed application using Enterprise Java Beans, J2EE Architecture, EJB Tutorial, WebLogic Tutorial. Distributed Architecture Two-tier application: In the past two-tier applications were used. Two-tier applications are also know as
 
First Step towards JDBC!
First Step towards JDBC! First Step towards JDBC Introduction T his article introduce you with JDBC and shows you how to create a database application to access the databases. For the shake of simplicity, in very first example Access database and
 
First Step towards JDBC!
First Step towards JDBC! Welcome to our JDBC Section First Step towards JDBC This article introduce you with JDBC and shows you how to create a database application to access the databases. Accessing the Database from Servlet This article shows
 
Introduction to the JDBC
Introduction to the JDBC Introduction to the JDBC Introduction T his article introduce you with JDBC and shows you how to our search engine with database. What is JDBC? J ava Database Connectivity or JDBC for short is set of Java API's that
 
Complete Webhosting Guide, Search Web hosts, Find Plans
Complete Webhosting Guide, Search Web hosts, Find Plans The Complete Web Hosting Guide RoseIndia.net is the complete beginner's guide to finding a web hosting company. Introduction to Web Hosting What is Web Hosting? Linux vs. Windows
 
Web Hosting Guide. What is Web Hosting Plan?
Web Hosting Guide. What is Web Hosting Plan? What is Web Hosting Plan? Web hosting plan is the different plans provided by Hosting Companies for hosting your web site. Web hosting plans include the storage limit, bandwidth, access to server
 
Site navigation
 

 

Send your comments, Suggestions or Queries regarding this site at roseindia_net@yahoo.com.

Copyright © 2006. All rights reserved.