Design for performance, Part 1: Interfaces matter - JavaWorld January 2001
By Brian Goetz
Avoid performance hazards when designing Java classes
Many programmers don't start thinking about performance management until late in the development cycle. Often, they hold off on performance tuning until the end, hoping perhaps to avoid it entirely -- and sometimes this strategy is successful. However, early design decisions can affect both the need for performance tuning and its chances of success. If performance is likely to become an issue in your program, performance management should be integrated into the design and development cycle from day one.
This series explores some of the ways in which early design decisions can significantly affect application performance. In this article, I look at one of the most common performance problems: temporary object creation. A class's object-creational behavior is usually determined at design time, often without deliberate thought, sowing the seeds for performance problems later.
Read the whole "Design for Performance" series:
Part 1: Interfaces matter
Part 2: Reduce object creation
Part 3: Remote interfaces
Performance problems come in many varieties. The easiest to fix are those where you have simply chosen a poor algorithm for performing a computation, such as using a bubble sort to sort a large data set, or where you recompute a frequently used data item every time it is used instead of caching it. You can easily spot these types of bottlenecks using profiling and, once you have found them, you can usually correct them quickly. However, many Java performance problems stem from a deeper and harder-to-fix source -- the interface design of a program's components.
Most programs today are constructed from components that have been either developed internally or acquired from an outside vendor. Even when programs don't rely heavily on pre-existing components, the object-oriented design process encourages applications to be factored into components, as this simplifies the design, development, and testing process. While these advantages are undeniable, you should recognize that the interfaces implemented by components might have a significant effect on the behavior and performance of the programs that use them.
At this point, you may be asking what interfaces have to do with performance. Not only does a class's interface define what functions the class can perform, but it also can define its object-creational behavior and the sequence of method calls required to use it. How a class defines its constructors and methods will dictate whether an object can be reused, whether its methods will create -- or require its client to create -- intermediate objects, and how many method calls a client needs to make in order to use that class. All of these factors affect program performance.
Watch out for object creations
One of the fundamental Java performance management principles is this: Avoid excessive object creation. This doesn't mean that you should give up the benefits of object-oriented programming by not creating any objects, but you should be wary of object creation inside of tight loops when executing performance-critical code. Object creation is expensive enough that you should avoid unnecessarily creating temporary or intermediate objects in situations where performance is an issue.
The String class is a major source of object creation in programs that manipulate text. Because Strings are immutable, a new object must be created each time a String is modified or constructed. As a result, performance-conscious programmers avoid excessive use of String. However, this is often impossible. Even when you eliminate reliance on String from your code, you frequently find yourself using components whose interfaces are defined only in terms of String. Thus, you end up being forced to use String anyway.
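To make the immutability point concrete, here is a minimal sketch. The class and method names are hypothetical, and StringBuilder is used only as a familiar mutable buffer; it postdates this article, where StringBuffer would have played the same role. The sketch shows that "modifying" a String yields a new object, and that building text in a loop by concatenation creates intermediate Strings that a single mutable buffer avoids.

```java
public class StringImmutabilitySketch {

    /** Builds "0,1,2,..." by repeated concatenation; each += produces a new String. */
    public static String joinWithConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                s += ",";
            }
            s += i;
        }
        return s;
    }

    /** Builds the same text in one mutable buffer, creating a String only at the end. */
    public static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Date";
        String modified = original.concat(": 12 Jan 2001");
        System.out.println(original);              // the original object is unchanged
        System.out.println(modified != original);  // concat returned a different object
        System.out.println(joinWithConcat(4).equals(joinWithBuilder(4)));
    }
}
```

Both join methods return the same text; they differ only in how many temporary objects are created along the way.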
Example: Regular expression matching
As an example, suppose you write a mail server called MailBot. MailBot needs to process the MIME header lines -- such as the send date or the sender's email address -- located at the top of each message. It will process the MIME header lines using a component for matching regular expressions to make the procedure easier. MailBot is smart enough not to create a String object for each header line or header element. Instead, it fills up a character buffer with the input text and identifies the headers to be processed by indexing into this buffer. MailBot will call the regular expression matcher to process each header line, so the matcher's performance will be significant.
Let's start with an example of a very poor interface for your regular expression matcher class:
public class AwfulRegExpMatcher {
    /** Create a matcher with the given regular expression and which will
     *  operate on the given input string */
    public AwfulRegExpMatcher(String regExp, String inputText);
    /** Retrieve the next match of the pattern against the input text,
     *  returning the matched text if possible or null if not */
    public String getNextMatch();
}
Even if this class implements an efficient regular expression-matching algorithm, any program that uses it heavily will suffer. Since the matcher object is tied to the input text, every time you want to invoke it, you will have to first construct a new matcher object. Since you aim to reduce unnecessary object creations, having the ability to reuse the matcher would seem an obvious place to start.
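The difference between constructing a matcher per input and reusing one can be sketched with java.util.regex, which is used here purely as a stand-in engine (it is not the component discussed in this article, and the class and method names below are hypothetical). The first method creates a new Matcher for every line, much as AwfulRegExpMatcher would force its caller to do; the second creates one Matcher and rebinds it to each line with reset().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherReuseSketch {

    /** Creates a fresh Matcher object for every input line. */
    public static int countWithNewMatcherPerLine(Pattern pattern, String[] lines) {
        int count = 0;
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);  // one new object per line
            if (matcher.find()) {
                count++;
            }
        }
        return count;
    }

    /** Creates a single Matcher and rebinds it to each input line. */
    public static int countWithReusedMatcher(Pattern pattern, String[] lines) {
        Matcher matcher = pattern.matcher("");        // created once
        int count = 0;
        for (String line : lines) {
            matcher.reset(line);                      // new input, same Matcher
            if (matcher.find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern date = Pattern.compile("\\d{1,2} [A-Z][a-z]{2} \\d{4}");
        String[] headers = {
            "Date: 12 Jan 2001",
            "From: someone@example.com",
            "Date: 3 Feb 2001"
        };
        System.out.println(countWithNewMatcherPerLine(date, headers)); // prints 2
        System.out.println(countWithReusedMatcher(date, headers));     // prints 2
    }
}
```

Both methods give the same answer; the reusable form simply avoids one allocation per input line.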
The class definition below illustrates another possible interface for your matcher, which allows for reuse of the matcher, but is still pretty bad:
public class BadRegExpMatcher {
    public BadRegExpMatcher(String regExp);
    /** Attempts to match the specified regular expression against the input
     *  text, returning the matched text if possible or null if not */
    public String match(String inputText);
    /** Get the next match against the input text, or return null if no match */
    public String getNextMatch();
}
Ignoring the more subtle points of regular expression matching, such as returning matched subexpressions, what's wrong with this seemingly harmless class definition? From a functionality point of view, nothing. But from a performance point of view, a lot. First, the matcher requires its caller to create a String to represent the text to be matched. MailBot tries to avoid generating String objects, but when it finds a header it wants to parse with a regular expression, it has to create a String to satisfy BadRegExpMatcher:
BadRegExpMatcher dateMatcher = new BadRegExpMatcher(...);
while (...) {
    ...
    // Copy the header's characters out of the buffer just to satisfy the matcher
    String headerLine = new String(myBuffer, thisHeaderStart,
                                   thisHeaderEnd - thisHeaderStart);
    String result = dateMatcher.match(headerLine);
    if (result == null) { ... }
}
Second, the matcher creates the result string even if MailBot is interested only in whether the string matched or not, and doesn't require the matched text. This means that in order to simply use BadRegExpMatcher to validate that a date header conforms to a specific format, you must create two String objects -- the input to the matcher, and the resulting matched text. Two objects may not seem like very many, but if you have to create two objects for each header line of each mail message that MailBot processes, this could significantly influence performance. The fault doesn't lie in the design of MailBot but in the design of -- or the choice to use -- the BadRegExpMatcher class.
Note that returning a lighter-weight Match object -- which could expose the getOffset(), getLength(), and getMatchString() methods -- instead of returning a String would not improve performance by much. While creating a Match object is probably cheaper than creating a String, which involves allocating a char[] array and copying the data, you still create an intermediate object that is of little value to your caller.
It's bad enough that BadRegExpMatcher forces you to provide it with input in the form that it wants to see, rather than in the form that you can more efficiently provide. But using BadRegExpMatcher comes with another risk, one that is potentially even more hazardous to MailBot's performance: You began with the noble intention of avoiding the use of Strings when processing the mail headers. But since you are forced to create many String objects anyway to satisfy BadRegExpMatcher, you might be tempted to abandon that goal and use String even more liberally. Now, one component's bad design has infected the program that uses it. Even if you later find a better regular expression component that doesn't require you to provide it with a String, your whole program might be infected by then.
A better interface
How can you define BadRegExpMatcher so as not to cause such a problem? First, BadRegExpMatcher should try not to dictate the format of its input. It should be willing to accept the input in whatever formats its caller can efficiently provide. Second, it should not automatically generate a String for the resulting match; it should return enough information so that the caller can create it if desired. (It can also provide a method to do this, as a convenience, but its use should not be required.) Here is a better interface:
class BetterRegExpMatcher {
    public BetterRegExpMatcher(...);
    /** Provide matchers for multiple formats of input -- String,
     *  character array, and subset of character array. Return -1 if no
     *  match was made; return offset of match start if a match was
     *  made. */
    public int match(String inputText);
    public int match(char[] inputText);
    public int match(char[] inputText, int offset, int length);
    /** Get the next match against the input text, if any */
    public int getNextMatch();
    /** If a match was made, returns the length of the match; the
     *  caller should be able to reconstruct the match text from the
     *  offset and length */
    public int getMatchLength();
/** Convenience routine to get the match string, in th
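As a rough illustration of how an interface along these lines might be backed by a real engine, here is a minimal sketch built on java.util.regex and java.nio.CharBuffer. This is an assumption made for illustration, not the implementation the article has in mind: the class name BetterRegExpMatcherSketch is hypothetical, the constructor signature is a guess, and only the methods whose declarations appear in full above are included. Note that CharBuffer.wrap still allocates a small view object; what the sketch avoids is copying the characters into a new String.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a reusable matcher that accepts char[] input without copying. */
public class BetterRegExpMatcherSketch {
    private final Matcher matcher;
    private int base;  // where the current input region starts in the caller's array

    public BetterRegExpMatcherSketch(String regExp) {
        this.matcher = Pattern.compile(regExp).matcher("");
    }

    /** Returns the offset of the first match, or -1 if there is none. */
    public int match(String inputText) {
        base = 0;
        matcher.reset(inputText);
        return getNextMatch();
    }

    public int match(char[] inputText) {
        return match(inputText, 0, inputText.length);
    }

    /** Matches against a region of the array; offsets are reported relative to the array. */
    public int match(char[] inputText, int offset, int length) {
        base = offset;
        // The buffer is a view over the caller's array; no characters are copied.
        matcher.reset(CharBuffer.wrap(inputText, offset, length));
        return getNextMatch();
    }

    /** Returns the offset of the next match in the current input, or -1 if there is none. */
    public int getNextMatch() {
        return matcher.find() ? base + matcher.start() : -1;
    }

    /** Length of the most recent match; only meaningful after a successful match. */
    public int getMatchLength() {
        return matcher.end() - matcher.start();
    }

    public static void main(String[] args) {
        String text = "Subject: hi\nDate: 12 Jan 2001\n";
        char[] buffer = text.toCharArray();
        int lineStart = text.indexOf("Date:");
        int lineLength = text.indexOf('\n', lineStart) - lineStart;

        BetterRegExpMatcherSketch dateMatcher =
            new BetterRegExpMatcherSketch("\\d{1,2} [A-Z][a-z]{2} \\d{4}");
        int offset = dateMatcher.match(buffer, lineStart, lineLength);
        if (offset >= 0) {
            // The caller creates a String only if it actually needs the matched text.
            System.out.println(new String(buffer, offset, dateMatcher.getMatchLength()));
        }
    }
}
```

A caller that only needs to validate a header can stop at the comparison with -1 and never create a String at all.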