Datastructures and algorithms, Part 1
Tutorial Details:
Datastructures and algorithms, Part 1
Datastructures and algorithms, Part 1
By: By Jeff Friesen
Explore datastructures, algorithms, flowcharts, pseudocode, and arrays
omputer science emphasizes two important topics: datastructures and algorithms. Those topics are important because the choices you make for a program's datastructures and algorithms affect that program's memory usage (for datastructures) and CPU time (for algorithms that interact with those datastructures). When choosing a datastructure or algorithm, you sometimes discover an inverse relationship between memory usage and CPU time: the less memory a datastructure uses, the more CPU time associated algorithms need to process the datastructure's data items, which are primitive type values or objects, via references. Also, the more memory a datastructure uses, the less CPU time associated algorithms need to process the data items?and faster algorithms result. This inverse relationship appears in Figure 1.
Figure 1. The inverse relationship between memory usage and CPU time
An example of the inverse relationship between memory usage and CPU time involves the one-dimensional array and doubly-linked list datastructures, and their insertion/deletion algorithms. For any given list of data items, a one-dimensional array occupies less memory than a doubly linked list: a doubly linked list needs to associate links with data items to find each data item's predecessor and successor, which requires extra memory. In contrast, a one-dimensional array's insertion/deletion algorithms are slower than a doubly linked list's equivalent algorithms: inserting a data item into or deleting a data item from a one-dimensional array requires data item movement to expose an empty element for insertion or close an element made empty by deletion. (I explore one-dimensional arrays later in this article, and doubly linked lists in next month's article.)
This article initiates a two-part series that explores datastructures and algorithms. The article begins with a presentation of basic concepts and continues with a tour of the array datastructure. You learn about one-dimensional, two-dimensional, and ragged arrays, plus linear-search, bubble-sort, binary-search, and matrix-multiplication array-oriented algorithms. The article ends by asserting that Java's arrays are objects.
Note: You can download the source code that accompanies this article from Resources .
Datastructure and algorithm basics
Before we explore specific datastructures and algorithms, we need to examine three basic questions: What is a datastructure? What is an algorithm? How do you represent an algorithm? Knowledge of those concepts helps you understand this series.
What is a datastructure?
Datastructures have been around since the structured programming era. A definition from that era: a datastructure is a set of types, a designated type from that type set, a set of functions, and a set of axioms. That definition implies that a datastructure is a type with implementation. In our object-oriented programming era, type with implementation means class.
The datastructure is a class definition is too broad because it embraces Employee , Vehicle , Account , and many other real-world entity-specific classes as datastructures. Although those classes structure various data items, they do so to describe real-world entities (in the form of objects) instead of describing container objects for other entity (and possibly container) objects. This containment idea leads to a more appropriate datastructure definition: a datastructure is a container class that provides storage for data items, and capabilities for storing and retrieving data items. Examples of container datastructures: arrays, linked lists, stacks, and queues. (I will explore the linked-list, stack, and queue datastructures next month.)
What is an algorithm?
Algorithms often associate with datastructures. An algorithm is a sequence of instructions that accomplishes a task in a finite time period. The algorithm receives zero or more inputs, produces at least one output, consists of clear and unambiguous instructions, terminates after a finite number of steps, and is basic enough that a person can carry out the algorithm using a pencil and paper. In contrast, a program is not necessarily finite: the program, such as a Web server, may never terminate without external intervention. Examples of algorithms associated with datastructures: linear-search, bubble-sort, binary-search, matrix-multiplication, linked-list-concatenation, stack-push-and-pop, and queue-insert-and-delete. (I will explore the linked-list-concatenation, stack-push-and-pop, and queue-insert-and-delete algorithms next month.)
Algorithm representation
How do you represent an algorithm? The most obvious representation: Java source code. However, writing source code before you fully understand an algorithm often leads to difficult-to-find bugs. One technique for overcoming those bugs involves flowcharts.
A flowchart is a visual representation of an algorithm's control flow. That representation illustrates statements that need to execute, decisions that need to be made, logic flow (for iteration and other purposes), and terminals that indicate start and end points. To present that visual representation, a flowchart uses various symbols, which Figure 2 shows.
Figure 2. Flowchart symbols for statements, decisions, logic flow, and terminals
What does a flowchart look like? Suppose you have a simple algorithm that initializes a counter to 0, reads characters until a new-line (\n) character is seen, increments the counter for each read digit character, and prints the counter's value after the new-line character has been read. Figure 3's flowchart illustrates this algorithm's control flow.
Figure 3. A flowchart for counting digit characters. Click on thumbnail to view full-size image.
A flowchart's advantages include its simplicity and its ability to present an algorithm's control flow visually. Flowcharts also have disadvantages:
Highly-detailed flowcharts can introduce errors or inaccuracies.
Extra time is required to position, label, and connect a flowchart's symbols, even though tools speed up this process. This delay might slow your understanding of an algorithm.
Because flowcharts are tools of the structured programming era, they aren't as useful in an object-oriented context. The Unified Modeling Language (UML) is more appropriate for providing object-oriented visual representations.
An alternative to flowcharts is pseudocode: a textual representation of an algorithm that approximates the final source code. Pseudocode is useful for quickly writing down an algorithm's representation. Because syntax is not a concern, no rules define how to write pseudocode. Consider the following example:
DECLARE CHARACTER ch
DECLARE INTEGER count = 0
DO
READ ch
IF ch IS '0' THROUGH '9' THEN
count++
END IF
UNTIL ch IS '\n'
PRINT count
END
The example above presents the pseudocode equivalent of Figure 3's flowchart. Although locating control flow in pseudocode can prove harder than finding it in a flowchart, typically, writing pseudocode takes less time than drawing a flowchart.
Note
This series uses pseudocode to represent algorithms.
The array datastructure
The array is one of the most widely used datastructures because of its flexibility in deriving complex datastructures and its simplicity. Because arrays are so prevalent and because I needed to use the one-dimensional array variant in several of this column's earlier programs, I briefly introduced the array datastructure in " Non-Object-Oriented Language Basics, Part 1 ." This section builds on that earlier article's introduction and begins with a definition: an array is a sequence of elements, where each element (a group of memory bytes that stores a single data item) associates with at least one index (nonnegative integer). That definition raises four interesting points:
Each element occupies the same number of bytes; the exact number depends on the type of the element's data item.
All elements share the same type.
We tend to think of an array's elements as occupying consecutive memory locations. When I discuss two-dimensional arrays, you'll discover that isn't always the case.
The number of indexes associated with any element is the array's dimension.
Note
This section focuses exclusively on arrays of one and two dimensions because higher-dimensional arrays are not used as frequently.
One-dimensional arrays
The simplest kind of array has one dimension: each element associates with a single index. Java provides three techniques for creating a one-dimensional array: use only an initializer, use only keyword new , and use keyword new with an initializer. Because I showed you how to create a one-dimensional array with only an initializer in "Non-Object-Oriented Language Basics, Part 1," I focus on the latter two techniques in this section.
Use only keyword new to create a one-dimensional array. That technique requires either of the following syntaxes:
type variable_name '[' ']' '='
'new' type '[' integer_expression ']' ';'
type '[' ']' variable_name '='
'new' type '[' integer_expression ']' ';'
Either syntax:
Specifies variable_name as the name of the one-dimensional array variable.
Specifies type as each element's type. Because the one-dimensional array variable holds a reference to the one-dimensional array, the variable's type is type [ ] .
Specifies keyword new , followed by type , followed by integer_expression between square brackets ( [ ] ), which identifies the number of elements. new allocates memory for a one-dimensional array's elements and zeroes all bits in each element's bytes, which means that each element contains a default value that you interpret based on type .
Specifies = to assign the one-dimensional array's reference to variable_name .
Tip
Java developers often place square bracket characte
Read
Tutorial at: Click here to view the tutorial
Rate Tutorial: Datastructures and algorithms, Part 1
View Tutorial: Datastructures and algorithms, Part 1
Related
Tutorials:
The battle of the container
frameworks: which should you use? - JavaWorld - January
1999
The battle of the container
frameworks: which should you use? - JavaWorld - January
1999 |
Programming Java threads in the
real world, Part
1 - JavaWorld -
September 1998
Programming Java threads in the
real world, Part
1 - JavaWorld -
September 1998 |
Build an object database - JavaWorld January
2000
Build an object database - JavaWorld January
2000 |
Java security evolution
and concepts, Part 1: Security nuts and bolts - JavaWorld April
2000
Java security evolution
and concepts, Part 1: Security nuts and bolts - JavaWorld April
2000 |
Construct secure
networked applications with certificates, Part 1 - JavaWorld January
2001
Construct secure
networked applications with certificates, Part 1 - JavaWorld January
2001 |
Jato: The new kid on the open source block - JavaWorld March 2001
Jato: The new kid on the open source block - JavaWorld March 2001 |
Performance books put to the test - JavaWorld March 2001
Performance books put to the test - JavaWorld March 2001 |
Jato: The new kid on the open source block, Part 2 - JavaWorld April
2001
Jato: The new kid on the open source block, Part 2 - JavaWorld April
2001 |
Web services hits
the Java scene,
Part 1
Web services hits
the Java scene,
Part 1 |
Good
introduction to JDO
Good
introduction to JDO |
J2SE 1.4
breathes new life into the CORBA community, Part
2
J2SE 1.4
breathes new life into the CORBA community, Part
2 |
Sort it
out
Sort it
out |
J2SE 1.4.1
boosts garbage
collection
J2SE 1.4.1
boosts garbage
collection |
Datastructures and algorithms, Part 1
Datastructures and algorithms, Part 1 |
Let the mobile
games begin, Part 2
Let the mobile
games begin, Part 2 |
Sizeof for
Java
Sizeof for
Java |
Java for Symmetric Cryptography
Java for Symmetric Cryptography
Cryptography—literally, secret writing—is the practice of encrypting and decrypting data. To encrypt or decrypt data, you apply an algorithm, which will be a series of transformations to the input data (the plaintext) to |
Clustering and Load Balancing in Tomcat 5, Part 1
The latest version of the Tomcat servlet container provides clustering and load balancing capabilities that are essential for deploying scalable and robust web applications. |
JDBC scripting, Part 2
JDBC scripting, Part 2
Programming and Java scripting in JudoScript
Summary
JudoScript is a rich functional scripting language, and an easy and powerful general programming and Java scripting language.
JudoScript's power comes from its synergy of |
Definition of Bioinformatics
Definition of Bioinformatics
Definition of Bioinformatics
About Bioinformatics
In February 2001, the human genome was finally deciphered! In other words, scientists have succeeded in reading the chain of more than 3 billion base pairs that |
|
|
|