Programming Tutorials Browser Tutorials Articles Struts Tutorials Hibernate Tutorials

  Tutorial: Plant your data in a ternary search tree - JavaWorld February 2001

Plant your data in a ternary search tree - JavaWorld February 2001

Tutorial Details:

Plant your data in a ternary search tree
Plant your data in a ternary search tree
By: By Wally Flint
Create an English dictionary that checks spelling and matches words as you type
he ternary search tree (TST) is the champion of data structure acrobatics -- it finds all keys having a given prefix, suffix, or infix. It even finds those keys that closely match a given pattern. You can easily search the tree for partial matches -- auto matches autograph , automatic , and so on. In addition, you can implement near-match functions, which gives you the ability to suggest alternatives for misspelled words -- e.g., impliment matches implement .
A TST stores key-value pairs, where keys are strings and values are objects. TST keys are stored and retrieved in sorted order, regardless of the order in which they are inserted into the tree. In addition, TSTs use memory efficiently to store large quantities of data. Best of all, all of this magic is performed lightning fast. The tremendous flexibility of TSTs provides ample opportunity for creative programming.
This article demonstrates three creative applications of the TST:
An English dictionary that matches words as you type and checks spelling
A flexible array that can assume any size or dimension on the fly
A database that stores all information in the same place (regardless of which record or column the information belongs to), thereby decreasing access time and reducing storage requirements
Let's begin with a demonstration of the dictionary application.
The dictionary applet
The applet below simulates an English dictionary that uses a TST to store dictionary data. To create this applet, I started with a list of roughly 48,000 words that I found on the Internet. (I can't promise that the list is complete or that all of the words are spelled accurately.) For each word, I created a definition equal to the string "Definition of the word " + currentWord + ".", and stored these key-value pairs (word, definition) in the TST.
Figure 1. An English dictionary built on a ternary search tree
TST structure
From an external perspective, the TST behaves like a HashMap or a HashTable . That is, objects stored in the tree are indexed using string keys. Internally, the TST stores each key character in a separate node.
Figure 2. A ternary search tree
Click on thumbnail to view
full-size image. (19 KB)
Figure 2 illustrates two types of objects: objects that represent tree nodes (no color) and value objects (yellow or gray) encapsulated by the nodes. Arrows represent references -- if a bidirectional arrow connects nodes "s" and "o," then "s" has a reference to "o," and "o" has a reference to "s." Each TST node encapsulates one character from a key (its splitchar ), one value object, a reference to its parent node, and a reference to each of three child nodes -- lo kid, equal kid, and hi kid. A lo kid is below and to the left of its parent, an equal kid is directly below its parent, and a hi kid is below and to the right of its parent. The highest node in the tree is the root node; its parent is null.
The TST in Figure 2 stores five key-value pairs. The keys are so, tab, to, too, and tot. The string too, for example, points to an instance of an object labeled too data in the diagram.
TST algorithms
Let's turn to the manipulation of data stored in the TST. You need a means for using a string key to retrieve data from the tree, using a string key to insert data into the tree, retrieving any data whose key matches a given prefix, and for retrieving any data whose key closely matches a given string.
The following algorithm retrieves the object indexed by a string key.
Object get(String key) {
TSTNode currentNode = rootNode;
int charIndex = 0;
while(true) {
if(currentNode == null) return null;
if (key.charAt(charIndex) == currentNode.splitchar) {
charIndex++;
if(charIndex == key.length()) return currentNode.data;
currentNode = currentNode.relatives[TSTNode.EQKID];
} else if(key.charAt(charIndex) < currentNode.splitchar) {
currentNode = currentNode.relatives[TSTNode.LOKID];
} else {
currentNode = currentNode.relatives[TSTNode.HIKID];
}
}
}
Within a given iteration, this algorithm focuses on a certain node within the tree -- the currentNode -- and on a certain character within the key -- though not explicitly named above, let's call it currentChar .
If currentChar comes before the currentNode 's splitchar in the alphabet, the algorithm shifts its focus to the currentNode 's lo kid, does not change the value of currentChar , and repeats the process. If currentChar comes after the currentNode 's splitchar , the algorithm shifts to the currentNode 's hi kid, does not change currentChar 's value, and repeats the process. If currentChar is equal to the currentNode 's splitchar , the algorithm shifts to the currentNode 's equal kid, sets the value of currentChar to the next char in the key, and repeats the process. When the currentChar 's value changes, the algorithm checks to see if the end of the key has been reached; if it has, it returns the data object -- the value -- stored in the currentNode .
Refer back to Figure 2 as I walk you through the process of retrieving the data indexed by tab. First, set currentChar to the first letter in the key -- tab -- and set currentNode to the root node -- the "t" node, or highest node in Figure 2. Compare currentChar with the rootNode 's splitchar ; since both are equal to "t," shift to the "o" node and set currentChar to the next character in tab . Since "a" comes before "o" in the alphabet, move to the lo kid -- the "a" node.
Since both currentChar and splitchar are now "a," move to the equal kid -- the "b" node -- and set currentChar to the next character in tab . Now, both currentChar and splitchar equal "b." You've reached the end of the key, so the data object stored in the "b" node is the data you're after.
Consider the problem of storing the key-value pair: tab , myObject . To store the pair, you can use the same algorithm with one slight modification -- once it finds the tab node, instead of returning the data object, the algorithm stores myObject in the node (overwriting any data already stored in that node).
Suppose you wish to store a key-value pair and the tree lacks some or all of the nodes required to represent the key. Again, you use nearly the same algorithm. Upon entering a new iteration of the while loop, if the currentNode is null, simply create the missing node and add it to the tree. The new node's position should be the position in which you expected to find it.
Say you wish to store tub , myObject in Figure 2's tree. Set currentChar to the key's ( tub ) first character, and set currentNode to the root node. Compare currentChar to the currentNode 's splitchar ; since both equal "t," set currentChar to the next character in tub and set currentNode to the equal kid -- the "o" node. Compare currentChar to "o." Since "u" comes after "o" in the alphabet, set currentNode to the hi kid, leaving currentChar unchanged. Since currentNode is null, create a new node, setting its splitchar to currentChar , "u." The new node will become the "o" node's hi kid. Now compare currentChar to the currentNode 's splitchar ; since both equal "u," set currentChar to the next character in the key and currentNode to the equal kid. Because currentNode is null, create a new node, setting its splitchar to currentChar , or "b." The new node will become the "u" node's equal kid. Compare currentChar to the currentNode 's splitchar ; both are equal to "b." You've reached the end of the key, so store myObject in currentNode -- the "b" node. Figure 3 depicts the new tree structure.
Figure 3. Figure 2's TST after
adding tub key. Click on thumb-
nail to view full size image.
(23 KB)
The example described above highlights an important characteristic of the TST data structure: Though tub has three characters, adding its key resulted in the addition of only two new tree nodes. That results because the TST reuses the "t" node (root node in Figure 2) in the representation of the new key, tub . Similarly, if you add a new key tool , only one more node ( splitchar equal to 'l' ) would be required. Because the TST reuses nodes and because it creates nodes only when needed to represent new keys, the TST occupies little memory relative to many other data structures.
You've examined the basic storage and retrieval of data. Now explore how to find data having keys that only partially match a target string.
To find all nodes whose keys have a given prefix, simply find the node indexed by that prefix. Regard that node's equal child as a subtree's root node. Every node in the subtree represents a key that begins with the prefix. Since the tree (or subtree) structure maintains the alphabetical order of its keys, a recursive algorithm that retrieves a list of tree data will order the list according to the keys' alphabetical order.
What does it mean to find all nodes whose keys nearly match a target string? In this case, keys can differ from a target string by a specified number of characters. For example, impliment differs from implement by one character. An algorithm that implements that functionality can form the basis for dictionary spell-checking. The recursive algorithm below finds all keys differing from the target key by "d" characters:
matchAlmost(String key, int i, TSTNode currentNode, int d) {
if((currentNode == null) || (d < 0) || (i == key.length())) return;
// low branch
if(key.charAt(i) < currentNode.splitchar) matchAlmost(key, i, currentNode.relatives[TSTNode.LOKID], d);
//equal branch
int nextD = (key.charAt(i) == currentNode.splitchar) ? d : d - 1;
if((key.length() == i + 1) && (nextD == 0) && (currentNode.data != null)) {
// add the key of the current node to the list of nearly matching keys
}
matchAlmost(key, i + 1, currentNode.relatives[TSTNode.EQKID], nextD);
// hi branch
if(key.charAt(i) > currentNode.sp


 

Read Tutorial at: Click here to view the tutorial

Rate Tutorial:
Plant your data in a ternary search tree - JavaWorld February 2001

View Tutorial:
Plant your data in a ternary search tree - JavaWorld February 2001

Related Tutorials:

Sir, what is your preference?
Sir, what is your preference?
 
Connect the enterprise with the JCA, Part 1
Connect the enterprise with the JCA, Part 1
 
Cut down on logging errors with Jylog
Cut down on logging errors with Jylog
 
Big designs for small devices
Big designs for small devices
 
Very interesting article
Very interesting article
 
a-visual-llk-parser-generator VisualLangLab
a-visual-llk-parser-generator: VisualLangLab A Visual IDE-Style LL(k) Parser Generator that uses an editable tree with icons for tokens and non-terminals to represent the grammar symbols and grammar rules.
 
JGraph - The Java Graph Diagram Component
JGraph - The Java Graph Diagram Component JGraphAddons is a drop-in functional module that adds powerful and configurable layout algorithms to your existing JGraphs. They include hierarchical, circular and tree layouts capable of giving your JGraph app
 
JXMLPad 2.3
JXMLPad 2.3 JXMLPad is a pure Swing java component/framework for editing XML/XHTML document.
 
JavaMatch
What is JavaMatch? JavaMatch is an engine that can search inside a runtime Java data structures, and look for objects that best match the criteria that you specify. JavaMatch is a generic match engine, not targeted at a specific domain. It can be applied
 
HeapAnalyzer
What is HeapAnalyzer? HeapAnalyzer allows the finding of a possible JavaTM heap leak area through its heuristic search engine and analysis of the Java heap dump in Java applications. Java heap areas define objects, arrays, and classes.
 
JXMLPad 3.1 FC
JXMLPad is a pure Swing java component/framework for editing XML/XHTML document.
 
Lucene in Action
Lucene in Action Lucene is a gem in the open-source world--a highly scalable, fast search engine. It delivers performance and is disarmingly easy to use. Lucene in Action is the authoritative guide to Lucene. It describes how to index your data, includin
 
Adding search to your applications
The Lucene search engine is an open source, Jakarta project used to build and search indexes. Lucene can index any text-based information you like and then find it later based on various search criteria.
 
Biological Databases Links
Biological Databases Links Biological Databases Biological Databases are like any other databases. Biological Database contains the sequence data of DNA, RNA etc.. These database are organized for optimal retrieval and analysis. Here are the
 
Definition of Bioinformatics
Definition of Bioinformatics Definition of Bioinformatics About Bioinformatics In February 2001, the human genome was finally deciphered! In other words, scientists have succeeded in reading the chain of more than 3 billion base pairs that
 
What is Persistence Framework?
What is Persistence Framework? What is Persistence Framework? A persistence framework moves the program data in its most natural form (in memory objects) to and from a permanent data store the database. The persistence framework manages the
 
Download Search Engine Code its free and Search engine is developed in Servlets
Download Search Engine Code its free and Search engine is developed in Servlets Download Search Engine Code Installation Instruction Download and unzip the file into your favorite directory. Create a database on your MySQL Server. Run all the
 
Building Search Engine Applications Using Servlets !
Building Search Engine Applications Using Servlets ! Building Search Engine Applications Using Servlets Please visit http://www.webappcabaret.com/javadevelopers/search to see running copy of our search engine. Introduction This tutorial takes
 
Welcome to Free search engine secrets: webmaster's guide to search engine registration!
Welcome to Free search engine secrets: webmaster's guide to search engine registration! Welcome To Webmaster's Guide T his site is dedicated to Web related services. Here you can find tools and suggestions to promote your sites to several search
 
Manual Submission to Search Engines. Hand Submit Website URL Submission to Major Search Engines
Manual Submission to Search Engines. Hand Submit Website URL Submission to Major Search Engines ATTENTION: Website owners! Hand Submissions to major search engines for as low as $10.00 per month Let us introduce you to the services provided by
 
Site navigation
 

 

Send your comments, Suggestions or Queries regarding this site at roseindia_net@yahoo.com.

Copyright © 2006. All rights reserved.