Optimize a query on a Map - JavaWorld November 2000
Tutorial Details:
Optimize a query on a Map
Optimize a query on a Map
By: By Jack Shirazi
Comparing techniques for performance tuning a query on a Map class
n "Optimizing a query on a collection" , I considered how to optimize a query on an indexable collection. However, optimizing queries on Map classes turns out to be more complicated. In this article, I'll provide an example of how you can speed up bottlenecks consisting of Map queries.
The query
First I'll start with the problem. I'll use strings as keys for my collection. For their corresponding values, I'll simply use integer objects to indicate some value object. For the query, I'll use a simple test that checks whether any particular string key includes one of a specified substrings set, and the query will simply return the summation of the integer values for those string keys, which include the substrings. For example, the Map might be:
key -> value
"code" -> 1
"rode" -> 2
"load" -> 3
"toad" -> 4
"road" -> 5
The query in that case might be "sum of the values for those string keys in the map that contain the substrings od or lo" (the answer would be 1+2+3=6 for this example list).
For my actual keys, I'll generate multicharacter strings by using lowercase letters (a to z). For example, a collection of all four-character strings would generate a collection of 26 x 26 x 26 x 26 = 456976 four-character strings. The values will simply be an integer counter that increments by one as each string is added to the Map . I'll query that Map for the summation of integer values for those strings that contain any of the substrings ie, xy, or pq. I've elected to use a Hashtable object to hold the collection for the start of the tests.
I've chosen to use an easily generated collection for the data and a straightforward query to avoid any application-specific distractions. I want to focus on the tuning here. The query represents many query types I've seen in applications, though a more representative test would return the value collection for the keys that satisfy the query. I've opted for the summation to avoid generating too many collections in my tests.
The simple straightforward version of the query is:
int count = 0;
Enumeration enumeration = map.keys();
String s;
while(enumeration.hasMoreElements())
{
s = (String) enumeration.nextElement();
if( ( s.indexOf("ie") != -1 )
|| ( s.indexOf("xy") != -1 )
|| ( s.indexOf("pq") != -1 ) )
{
count += ((Integer) this.get(s)).intValue();
}
}
return count;
For the tests, I'll use the Sun virtual machines (VMs) from the Java SDK 1.2 and 1.3. In addition, I'll test with the HotSpot VMs delivered with those two SDKs -- HotSpot 1.0 and 2.0. I'll also test one non-JIT VM, which in this case will be the 1.2 VM with the JIT turned off. You can easily turn off the JIT by setting the java.compiler property to NONE :
java "-Djava.compiler=NONE" ...
Avoid synchronization
As I said earlier, I started by using a Hashtable object to hold the map. In most applications, bottleneck queries tend to be read-only or single-threaded. In either case, you can normally use a nonsynchronized object to hold the collection. To do so here requires a simple change to using a HashMap object instead of the Hashtable object I initially used. In addition, I need to change the code to use an iterator, as HashMap does not support an enumerator. In fact, I really shouldn't have used the enumerator in the original query since the Map interface does not support its use.
The query now reads as the following:
int count = 0;
Iterator iterator = map.keySet().iterator();
String s;
while(iterator.hasNext())
{
s = (String) iterator.next();
if( ( s.indexOf("ie") != -1 )
|| ( s.indexOf("xy") != -1 )
|| ( s.indexOf("pq") != -1 ) )
{
count += ((Integer) this.get(s)).intValue();
}
}
return count;
However, the test (test2 in Table 1 below) does not change the query time very much. A couple of the VMs even register slower times. That looks slightly odd to me and, looking back at what I've done, I noticed that I've been a little naughty. I made two changes to the code and tested those changes together without determining whether either change individually speeds up the query time. I changed the class and changed the enumerator to an iterator. I should really check each change separately.
My first concern is that the iterator is accessed via a Set of Map elements. I would have a real performance problem if the Set were a whole new collection with all the elements copied from Map . But the Set implementations of the JDK Map classes are actually alternate views of the underlying collection, and so they should not cause any adverse performance impact for this query.
Table 1: Percentage of Speed Increase in Query Time for Each Optimization Test, Relative to the 1.2 JIT VM Test
Test
1.2 JIT VM
HotSpot 1.0 2nd run *
1.3 JIT VM
HotSpot 2.0 2nd run *
1.2 non-JIT
Test details
test1
100%
129%
134%
125%
985%
Hashtable with enumerator
test2
98.1%
129%
135%
124%
1104%
HashMap with iterator
test3
108%
131%
138%
125%
1187%
Hashtable with iterator
test4
91.5%
125%
124%
118%
1001%
replaced .hasNext() call in test2
test5
120%
131%
134%
126%
1009%
use Map.Entry Set view
test6
83.0%
107%
102%
99.1%
727%
query in Map class
test7
78.1%
103%
105%
102%
720%
specialized Map class of String keys and int values
test8
77.7%
73.3%
104%
69.7%
697%
specialized Map class of char[] keys and int values
test9
19.4%
18.7%
22%
18.5%
111%
specialized Map class of four- char keys and int values
test10
8.7%
12.0%
11.4%
10%
82.3%
perfect minimum hashing on four- char keys and int values
NOTE: HotSpot VMs run methods in interpreted mode until the HotSpot VM profiler decides that the method is better run natively compiled. The method is then natively compiled with extensive optimizations applied to it. For the test runs here, some methods were optimized by the second test run, but not until the third test run were the HotSpot optimizations fully applied to the test by the VM. Consequently, I show the results of the first and third test runs in the HotSpot VMs.
The process of testing the two changes individually is straightforward: simply use a Hashtable with the iterator. The test result (test3 in Table 1) is very illuminating. Test3 clearly shows that enumerators are faster than iterators for the Map s I've reviewed here, and that is probably true for other JDK classes. The slower iterators combined with the faster nonsynchronized HashMap class balanced each other out to give test2's mixed results. Unfortunately, we don't have the option of using the faster nonsynchronized HashMap together with the faster enumerators.
Eliminate the repeated Iterator.hasNext()
Let's move on with performance tuning. The next obvious optimization is to eliminate the repeated Iterator.hasNext() call in the for loop test. You already know how many elements there are, so you don't need to ask if there are more.
This is a simple change. You can replace:
while(iterator.hasNext())
with:
for (int size = map.size(); size > 0; size--)
The test results are in test4 of Table 1. I've stuck with the HashMap and the iterator for test4, so test2 is the previous directly comparable test. The results show that the optimization is unequivocally better for all VMs. But my overall tuning effort so far shows only a small improvement.
Avoid the repeated Map.get() call
As with the last optimization, you can eliminate another method call, the Map.get() call. That call is used to get the value for the keys that satisfy your query.
You can eliminate the Map.get() call because Map classes support a set view that contains key-value pairs, obtained from the Map.entrySet() method. The elements of that set are java.util.Map.Entry objects, so you can access the value objects directly after you query the keys. The query code now looks like this:
int count = 0;
Iterator iterator = this.entrySet().iterator();
String s;
Map.Entry entry;
for (int size = this.size(); size > 0; size--)
{
entry = (Map.Entry) iterator.next();
s = (String) entry.getKey();
if( ( s.indexOf("ie") != -1 )
|| ( s.indexOf("xy") != -1 )
|| ( s.indexOf("pq") != -1 ) )
{
count += ((Integer) entry.getValue()).intValue();
}
}
return count;
Unfortunately, the test results (test5 in Table 1) show that change to be unambiguously bad for performance. Although the idea of replacing the Map.get() call with a quicker data member access is nominally good, you unfortunately have to replace it with two method accesses and an extra cast that swamp any advantages you may have gained. That optimization should be immediately discarded.
Avoid the method accessors
A standard optimization for queries on collections is to reimplement the query in the collection class so that you can avoid having to repeatedly access elements through accessor methods. Unfortunately, JDK Map classes that define their internal elements as protected do not exist, so you cannot just subclass an existing Map and add the query. Instead, you need to create your own Map class. But that is not a large problem since many examples of hash map classes with source code are available (from the JDK and the Web).
To define the query, you need to know how a hash map holds its elements internally. I'll use a standard implementation similar to HashMap from the JDK for my class. This element storage algorithm is not too difficult:
Obtain an index from the key. The easiest way to do this is to use the key object's hashcode and modulo it by the size of the internal collection.
Use the index to insert the element into the internal collection. Since the index could be the same for multiple objects, that location in the internal table is actually a node of a linked list, and the key and corresponding value is added as the last node of the list.
So the internal structure is an array of linked list nodes, with each node holdi
Read
Tutorial at: Click here to view the tutorial
Rate Tutorial: Optimize a query on a Map - JavaWorld November 2000
View Tutorial: Optimize a query on a Map - JavaWorld November 2000
Related
Tutorials:
Connect the
enterprise with the JCA, Part 1
Connect the
enterprise with the JCA, Part 1 |
Navigate data with the Mapper framework
Navigate data with the Mapper framework |
Business process
automation
made easy with
Java, Part 2
Business process
automation
made easy with
Java, Part 2 |
J2ME devices:
Real-world performance
J2ME devices:
Real-world performance |
Isolate server includes' runtime context
Isolate server includes' runtime context |
Smart Value Object goes
one step further
Smart Value Object goes one step further
The Smart Value Object allows server components to track client-side modification of business objects in a rich client/J2EE server environment, by using the latest features offered by bytecode processing tools.
|
LDAPXML: An LDAP to XML Converter
LDAPXML: An LDAP to XML Converter
LDAPXML is a set of java classes that allow you to access LDAP entries as custom defined XML. It allows you to map the various LDAP objectClasses and attributes to XML namespaces, attributes, elements etc. If you're lo |
Simple classes for JDBC
Simple classes for JDBC |
Framework for Java Database Connectivity
What is Framework for Java Database Connectivity?
The Framework for Java Database Connectivity (JDBC) was implemented to demonstrate the ease with which a JavaTM application may be designed to access a source code repository using a relational query lang |
Using CachedRowSet to Transfer JDBC Query Results Between Classes
Using CachedRowSet to Transfer JDBC Query Results Between Classes
The Java Database Connectivity (JDBC) API provides developers with an interface to a SQL database server, such as MySQL or Oracle. Central to any JDBC application is the java.sql.ResultS |
JavaMatch
What is JavaMatch?
JavaMatch is an engine that can search inside a runtime Java data structures, and look for objects that best match the criteria that you specify. JavaMatch is a generic match engine, not targeted at a specific domain. It can be applied |
Simple Object Persistence with the db4o Object Database
Simple Object Persistence with the db4o Object Database. db4o has been chosen for applications in embedded systems in which zero administration, reliability, and low footprint are critical features. In Germany, BMW Car IT, for example, uses it in an embed |
JTimepiece
JTimepiece is the advanced library for working with dates and times in Java. Many easy-to-use methods in this API make it easy for any developer, from beginner to expert, to use JTimepiece. |
Pool resources using Apache's Commons Pool Framework
Resource pooling is not new and is being widely used to conserve and optimize the usage of resources like threads, sockets, and database connections. Web server implementations routinely use thread pool implementations for performance and scalability reas |
Biological Databases Links
Biological Databases Links
Biological Databases
Biological Databases are like any other databases. Biological Database contains the sequence data of DNA, RNA etc.. These database are organized for optimal retrieval and analysis.
Here are the |
istory of Bioinformatics
istory of Bioinformatics
History of Bioinformatics
The Modern bioinformatics is can be classified into two broad categories, Bi ological Science and computational Science . Here is the data of hi storical events for both biology and computer |
What is Persistence Framework?
What is Persistence Framework?
What is Persistence Framework?
A persistence framework moves the program data in its most natural form (in memory objects) to and from a permanent data store the database. The persistence framework manages the |
Adding slash "\" character before quote "'" in a query
Adding slash "\" character before quote "'" in a query
Adding slash " \ " character before quote " ' " in a query
During the inserting the records in the database if user enters the phrases like "What ' s your name?", database gives the error due |
Welcome! to the News letter archive
Welcome! to the News letter archive
News Letter Archive
Date
Subject
Dec-29-2001
Adding slash " \ " character before quote " ' " in a query |
New Technical Articles: 64-bit Programming on Solaris 10 OS for x86 Platforms
Four technical articles describe the new Sun Studio 10 software's 64-bit programming features on the Solaris 10 OS for x86 and AMD64 platforms. Important issues regarding the AMD64 ABI (Application Binary Interface), debugging, migration to 64-bits, and p |
|
|
|