Accelerate your Java apps! - JavaWorld - September 1998
By Mark Roulo
Where does the time go? Find out with these speed benchmarks
As a Java programmer, you need to know the performance characteristics of different Java environments running on different operating systems. Having this information at hand can prepare you for potential bottlenecks, and it can save you from accidentally building bottlenecks into your apps. This article tests six different Java environments -- some with a just-in-time (JIT) compiler, some without -- running on four OSs, and provides benchmarks that can help you with your Java development efforts.
The testing process
To understand performance characteristics, and therefore where to expect bottlenecks, I ran benchmark tests on the following typical Java language constructs: method call, try/catch setup, object creation, array creation, and array access. I didn't test network I/O, disk I/O, or AWT performance -- the focus was purely on Java language performance. The tests were designed to avoid paging to disk.
Most of the tests required no garbage collection, so general system performance cannot be inferred by simply adding together the results of the various tests. I ran no general computation tests like "Tower of Hanoi" or "Sieve of Eratosthenes." Such tests are designed to show relative speeds on different platforms, and they rarely show where the bottlenecks are.
Target systems and environments
The benchmark tests I ran for this article were performed on a range of hardware systems and Java environments. The Java environments were:
Java Environments
Java environment | JIT
Netscape Navigator 4.05 for Windows NT/95 | Symantec Java! ByteCode Compiler Version 210.065
Netscape Navigator 4.05 for Power Macintosh | Yes
Internet Explorer 4.0 for Windows NT/95 | Yes
Symantec Visual Cafe PDE 2.1a for Windows NT/95, JDK 1.1.4 | Symantec Java! ByteCode Compiler Version i300.009
Netscape Navigator 4.05 for SPARC | No
Netscape Navigator 4.05 for Linux | No
The hardware/OS platforms were:
Hardware/OS Systems
OS | CPU | Clock (MHz) | RAM (MB)
Windows NT SP3 | Pentium Pro | 200 | 128
Macintosh 7.6.1 | PowerPC 604e | 180 | not listed
Solaris 2.5.1 | UltraSPARC-1 | 167 | 128
Red Hat Linux 5.1 | Pentium-II | 266 | 128
Windows NT | Dual Pentium Pro | 180 | 32
To compare the various systems, I converted the time each operation took into clock cycles. Why? This conversion makes it possible to compare machines whose CPUs run at different clock speeds. In general, comparing different CPUs in such a crude way can be dangerous, because the amount of work done in a single clock cycle varies a lot from CPU to CPU. The 80486, for example, averages about 2 clock cycles per instruction, while the Pentium executes closer to 1. Fortunately, the PowerPC 604e, UltraSPARC, Pentium Pro, and Pentium-II are roughly comparable. Cache behavior may differ between the systems, but it does not seem to affect the results much. All the tests ran without paging to disk.
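The conversion itself is simple arithmetic: cycles per operation = (elapsed time / iterations) x clock rate. A minimal sketch, assuming the elapsed time was measured in nanoseconds and the clock rate is given in MHz; the class and method names are hypothetical:

```java
/** Hypothetical helper: converts a measured time into clock cycles per operation. */
final class CycleMath {
    private CycleMath() {}

    /**
     * @param elapsedNanos total time for all iterations, in nanoseconds
     * @param iterations   number of times the operation was performed
     * @param clockMHz     CPU clock rate in MHz (e.g. 200 for a Pentium Pro 200)
     * @return average clock cycles spent per operation
     */
    static double cyclesPerOp(long elapsedNanos, long iterations, double clockMHz) {
        double nanosPerOp = (double) elapsedNanos / iterations;
        // 1 MHz = 10^6 cycles per second = 10^-3 cycles per nanosecond,
        // so cycles = nanoseconds * MHz / 1000
        return nanosPerOp * clockMHz / 1000.0;
    }

    public static void main(String[] args) {
        // 10 ns per operation on a 200 MHz CPU is 2 clock cycles per operation
        System.out.println(cyclesPerOp(1_000_000L, 100_000L, 200.0)); // prints 2.0
    }
}
```

The same 10 ns operation would count as about 2.7 cycles on the 266 MHz Pentium-II, which is why per-cycle figures, rather than raw times, let machines with different clock speeds be compared.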
Special resources
For information relating to the benchmark testing, I've provided the following links:
An HTML page with an applet that runs the tests
The source code for the applet
The source code for the C/C++ tests
An Excel 95 spreadsheet with all the data
Method calls
The ability to write and call methods (or functions) is a critical tool for building and maintaining large systems. Methods allow programs to be broken into smaller, more easily handled chunks. However, if method calls slow down a running program, programmers will design systems with bigger parts and fewer method calls.
Object-oriented programming increases the number of method calls when compared to equivalent procedural programs because it encourages more data encapsulation (among other things). Compare these two lines of code and notice the extra method call in the line showing encapsulation:
Without encapsulation:
int x = someObject.x;
With encapsulation:
int x = someObject.getX();
Encapsulation increases the number of method calls in a program, so it is essential that those calls execute quickly. If they don't, programmers often try to speed up their programs by leaving their data unencapsulated. Examples of this can be seen in some of the standard Java classes. The class java.awt.Dimension, for example, is written with both of its data members (width and height) public. A better design would have hidden the data members by making them private and providing public accessor methods:
private int height;
private int width;

public int getHeight()
{
    return height;
}

public int getWidth()
{
    return width;
}
Because the early Java environments shipped without JIT compilers, method calls in them were much slower than in current Java environments. The encapsulation shown above may have been judged unacceptably slow in those early environments, which would explain why the data in Dimension is public.
Fortunately, today's JIT-enabled Java environments perform method calls much faster than earlier non-JIT-enabled environments. There is less of a need to make speed-versus-encapsulation tradeoffs in these environments. With the best JIT, static methods returning nothing and taking no arguments execute in 2 clock cycles. Non-static method calls returning integer quantities execute in 7 clock cycles. Non-static method calls returning floating-point numbers execute in 8 clock cycles.
By making these accessor methods final, you can expect to reduce these times by one clock cycle. In a Java environment without a JIT, method calls take anywhere between 280 and 500 clock cycles. A good JIT can speed up method calls by a factor of more than 100, so in target environments with a good JIT you can have both encapsulation and speed. In environments without a JIT, or with a poor one, programmers must decide case by case whether speed or encapsulation is more important.
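A minimal sketch of the accessor style described above, with the accessors declared final. The class name EncapsulatedSize is hypothetical, and the timing loop uses System.nanoTime, which arrived in Java 5, long after these tests were run. Whether final saves anything depends on the JIT: a modern HotSpot VM typically inlines small accessors whether or not they are final, so treat the loop as a way to check your own target environment, not as a reproduction of the numbers above.

```java
/**
 * Hypothetical size class in the encapsulated style suggested for java.awt.Dimension,
 * with final accessors.
 */
class EncapsulatedSize {
    private final int width;
    private final int height;

    EncapsulatedSize(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // final: subclasses cannot override these, so the call target is known statically
    public final int getWidth() {
        return width;
    }

    public final int getHeight() {
        return height;
    }

    /** Calls both accessors n times; returning a checksum helps keep the loop from being discarded. */
    static long sumAreas(EncapsulatedSize s, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) s.getWidth() * s.getHeight();
        }
        return total;
    }

    public static void main(String[] args) {
        EncapsulatedSize s = new EncapsulatedSize(3, 4);
        int n = 10_000_000;
        long start = System.nanoTime();
        long checksum = sumAreas(s, n);
        long elapsed = System.nanoTime() - start;
        System.out.println("checksum=" + checksum + ", ns per iteration=" + (double) elapsed / n);
    }
}
```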
The graph below shows the effect that adding parameters has on the time a method call takes under various JIT-enabled Java runtimes. While call times vary considerably from one runtime to another, adding parameters usually increases the time a call takes. Often, though, adding one parameter costs nothing extra, and only rarely does adding a parameter speed up a call. Note also that every method call carries some general setup overhead, regardless of the number of parameters. Once a decision has been made to call a method, adding a few parameters will have little impact on the time the call takes.
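A rough sketch of how such a measurement might be set up: identical loops call static methods taking zero to three int arguments. All names are hypothetical. On a modern JVM these tiny methods will almost certainly be inlined, and a serious measurement would need proper warm-up (or a harness such as JMH), so the numbers it prints are only indicative.

```java
/** Hypothetical micro-benchmark: does adding int parameters change call cost? */
class ParamCostBench {
    static int counter; // side effect that keeps the calls from being removed entirely

    static void call0() { counter++; }
    static void call1(int a) { counter += a; }
    static void call2(int a, int b) { counter += a + b; }
    static void call3(int a, int b, int c) { counter += a + b + c; }

    // Each loop performs n calls and returns its elapsed time in nanoseconds.
    static long loop0(int n) {
        long t = System.nanoTime();
        for (int i = 0; i < n; i++) call0();
        return System.nanoTime() - t;
    }

    static long loop1(int n) {
        long t = System.nanoTime();
        for (int i = 0; i < n; i++) call1(1);
        return System.nanoTime() - t;
    }

    static long loop2(int n) {
        long t = System.nanoTime();
        for (int i = 0; i < n; i++) call2(1, 1);
        return System.nanoTime() - t;
    }

    static long loop3(int n) {
        long t = System.nanoTime();
        for (int i = 0; i < n; i++) call3(1, 1, 1);
        return System.nanoTime() - t;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        for (int round = 0; round < 3; round++) { // early rounds double as JIT warm-up
            System.out.printf("round %d: 0 args %.2f ns, 1 arg %.2f ns, 2 args %.2f ns, 3 args %.2f ns%n",
                    round,
                    (double) loop0(n) / n, (double) loop1(n) / n,
                    (double) loop2(n) / n, (double) loop3(n) / n);
        }
    }
}
```

Feeding the per-call times into a nanoseconds-to-cycles conversion for your CPU's clock rate gives figures comparable in kind to the ones discussed here.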
Notice that the JIT for Netscape Navigator on the Macintosh runs 25 percent as fast as the JITs on Windows. I have no numbers for JIT-enabled runtimes on Solaris. If you expect to support Macintosh and Windows clients, be sure to do your performance benchmarking on Macintoshes as well as on Windows clients.
Several popular environments do not yet come with a JIT. The graph below shows the effects of adding parameters to the time a method call takes under two non-JIT-enabled Java runtimes. You'll see that the cost of adding parameters is still mostly monotonic but that the general overhead of setting up a function call is very high.
Both Linux and Solaris have Java environments with JITs, but not under Navigator. I did not have access to these environments and have no data for them.
Finally, this graph compares the best Java time with C/C++.
Java seems to be only about 1 clock cycle slower than C++.
Recommendations
If you expect to run with a reasonable JIT, method calls are no more expensive in Java than they are in C or C++. If you expect to run on a system without a JIT, or without a very good one, you'll have to pay attention to method-call overhead in the speed-critical portions of your application.
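In the speed-critical parts of a program that may run without a JIT, one common tactic is to hoist loop-invariant method calls out of hot loops. A minimal sketch; the HoistExample class and its methods are hypothetical:

```java
/** Hypothetical container used to show hoisting a loop-invariant accessor call. */
class HoistExample {
    private final int[] data;

    HoistExample(int[] data) {
        this.data = data;
    }

    int getLength() {
        return data.length;
    }

    int get(int i) {
        return data[i];
    }

    // Two accessor calls per iteration: cheap with a good JIT, expensive without one.
    long sumNaive() {
        long sum = 0;
        for (int i = 0; i < getLength(); i++) {
            sum += get(i);
        }
        return sum;
    }

    // The loop-invariant call is made once, and element reads avoid calls entirely.
    // Reading the private field directly is fine here because this code is inside the class.
    long sumHoisted() {
        long sum = 0;
        int n = getLength();
        for (int i = 0; i < n; i++) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        HoistExample h = new HoistExample(new int[] {1, 2, 3, 4});
        System.out.println(h.sumNaive() + " " + h.sumHoisted()); // prints "10 10"
    }
}
```

Outside callers still go through the accessors, so the class keeps its encapsulation while the hot loop avoids per-iteration calls.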
Object creation
Modern microprocessors run at speeds of up to 600 MHz. Unfortunately, modern DRAM runs considerably slower. In burst mode, a modern SDRAM runs at about 100 MHz. If programs accessed memory in a truly random fashion, CPUs would spend most of their time waiting for DRAM. Fortunately, programs don't access memory in a random fashion. If a memory location has been accessed recently, it is quite likely to be accessed again soon. This property is called locality of reference.
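Locality of reference is easy to observe from Java. An int[][] is an array of row arrays, with each row stored contiguously, so walking it row by row touches neighboring memory while walking it column by column jumps from row to row. A rough sketch; the class name and the grid size are hypothetical, and the size of the speed gap depends on the CPU's caches:

```java
/** Hypothetical demonstration of locality of reference with a 2-D int array. */
class LocalityDemo {
    // Visits grid[0][0], grid[0][1], ...: consecutive addresses within each row array.
    static long sumRowMajor(int[][] grid) {
        long sum = 0;
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                sum += grid[r][c];
            }
        }
        return sum;
    }

    // Visits grid[0][0], grid[1][0], ...: each access lands in a different row array.
    static long sumColumnMajor(int[][] grid) {
        long sum = 0;
        int cols = grid[0].length; // assumes a rectangular, non-empty grid
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < grid.length; r++) {
                sum += grid[r][c];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 2_000;
        int[][] grid = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                grid[r][c] = r + c;
            }
        }
        long t0 = System.nanoTime();
        long a = sumRowMajor(grid);
        long t1 = System.nanoTime();
        long b = sumColumnMajor(grid);
        long t2 = System.nanoTime();
        System.out.println("same sum: " + (a == b));
        System.out.println("row-major ms: " + (t1 - t0) / 1_000_000
                + ", column-major ms: " + (t2 - t1) / 1_000_000);
    }
}
```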
Unfortunately, the locality of reference for Java programs can be worse than it is in equivalent C or C++ programs. The cause is object creation, which in a Java program is fundamentally different from object creation in an equivalent C or C++ program: many of the small temporary objects that C or C++ would create on the stack are created on the heap in Java.
In C and C++, when these objects are discarded at the end of a method call, the space is available for more temporary objects, and the stack area is almost always in the on-board Level 1 cache.
In Java, the objects are discarded, but the space typically is not reclaimed until the next garbage collection -- which usually doesn't happen until the heap memory is exhausted. The space for the next temporary object in a Java program always comes off the heap. The space for the new temporary object is rarely in a cache, and so the initial use of the temporary object should run slower than the initial use of a temporary object in a C or C++ program.
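One way to keep fresh heap allocations out of a hot loop is to reuse a single mutable scratch object instead of creating a new temporary on every iteration. A minimal sketch with hypothetical MutablePoint and TempObjects classes; reuse gives up the safety of immutable objects, and modern JVMs, with generational collectors and escape analysis, have narrowed the gap considerably, so measure before adopting it.

```java
/** Hypothetical mutable 2-D point used as a temporary. */
class MutablePoint {
    int x;
    int y;

    MutablePoint set(int x, int y) {
        this.x = x;
        this.y = y;
        return this;
    }
}

class TempObjects {
    // Allocates one new temporary per iteration; each comes from fresh heap space.
    static long sumAllocating(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            MutablePoint p = new MutablePoint().set(i, 2 * i);
            sum += p.x + p.y;
        }
        return sum;
    }

    // Reuses one temporary, so the same (likely cached) memory is touched every time.
    static long sumReusing(int n) {
        long sum = 0;
        MutablePoint p = new MutablePoint();
        for (int i = 0; i < n; i++) {
            p.set(i, 2 * i);
            sum += p.x + p.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long t0 = System.nanoTime();
        long a = sumAllocating(n);
        long t1 = System.nanoTime();
        long b = sumReusing(n);
        long t2 = System.nanoTime();
        System.out.println("same result: " + (a == b));
        System.out.println("allocating ms: " + (t1 - t0) / 1_000_000
                + ", reusing ms: " + (t2 - t1) / 1_000_000);
    }
}
```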
With the current Java runtimes, performance is also affected because creating a new object is roughly as costly as a malloc in C or a new operation in C++. Creating a new object in any of these ways takes hundreds or even thousands of clock cycles. In C and C++, creating an object on the stack takes about 20 clock cycles. C and C++ programs create many temporary objects on the stack, bu