Programming Tutorials Browser Tutorials Articles Struts Tutorials Hibernate Tutorials

  Tutorial: Build your own languages with JavaCC - JavaWorld December 2000

Build your own languages with JavaCC - JavaWorld December 2000

Tutorial Details:

Build your own languages with JavaCC
Build your own languages with JavaCC
By: By Oliver Enseling
JavaCC makes it a snap to write your own compiler or interpreter for languages of your own design in Java
o you ever wonder how the Java compiler works? Do you need to write parsers for markup documents that do not subscribe to standard formats such as HTML or XML? Or do you want to implement your own little programming language just for the heck of it? JavaCC allows you to do all of that in Java. So whether you're just interested in learning more about how compilers and interpreters work, or you have concrete ambitions of creating the successor to the Java programming language, please join me on this month's quest to explore JavaCC , highlighted by the construction of a handy little command-line calculator.
Compiler construction fundamentals
Programming languages are often divided, somewhat artificially, into compiled and interpreted languages, although the boundaries have become blurred. As such, don't worry about it. The concepts discussed here apply equally well to compiled as well as interpreted languages. We will use the word compiler below, but for the scope of this article, that shall include the meaning of interpreter.
Compilers have to perform three major tasks when presented with a program text (source code):
Lexical analysis
Syntactic analysis
Code generation or execution
The bulk of the compiler's work centers around steps 1 and 2, which involve understanding the program source code and ensuring its syntactical correctness. We call that process parsing , which is the parser' s responsibility.
Lexical analysis (lexing)
Lexical analysis takes a cursory look at the program source code and divides it into proper tokens. A token is a significant piece of a program's source code. Token examples include keywords, punctuation, literals such as numbers, and strings. Nontokens include white space, which is often ignored but used to separate tokens, and comments.
Syntactic analysis (parsing)
During syntactic analysis, a parser extracts meaning from the program source code by ensuring the program's syntactical correctness and by building an internal representation of the program.
Computer language theory speaks of programs, grammar, and languages. In that sense, a program is a sequence of tokens. A literal is a basic computer language element that cannot be further reduced. A grammar defines rules for building syntactically correct programs. Only programs that play by the rules defined in the grammar are correct. The language is simply the set of all programs that satisfy all your grammar rules.
During syntactic analysis, a compiler examines the program source code with respect to the rules defined in the language's grammar. If any grammar rule is violated, the compiler displays an error message. Along the way, while examining the program, the compiler creates an easily processed internal representation of the computer program.
A computer language's grammar rules can be specified unambiguously and in their entirety with the EBNF (Extended Backus-Naur-Form) notation (for more on EBNF, see Resources ). EBNF defines grammars in terms of production rules. A production rule states that a grammar element -- either literals or composed elements -- can be composed of other grammar elements. Literals, which are irreducible, are keywords or fragments of static program text, such as punctuation symbols. Composed elements are derived by applying production rules. Production rules have the following general format:
GRAMMAR_ELEMENT := list of grammar elements
| alternate list of grammar elements
As an example, let's look at the grammar rules for a small language that describes basic arithmetic expressions:
expr := number
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '(' expr ')'
| - expr
number := digit+ ('.' digit+)?
digit := '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Three production rules define the grammar elements:
expr
number
digit
The language defined by that grammar allows us to specify arithmetic expressions. An expr is either a number or one of the four infix operators applied to two expr s, an expr in parenthesis, or a negative expr . A number is a floating-point number with optional decimal fraction. We define a digit to be one of the familiar decimal digits.
Code generation or execution
Once the parser successfully parses the program without error, it exists in an internal representation that is easy to process by the compiler. It is now relatively easy to generate machine code (or Java bytecode for that matter) from the internal representation or to execute the internal representation directly. If we do the former, we are compiling; in the latter case, we talk about interpreting.
JavaCC
JavaCC , available for free, is a parser generator. It provides a Java language extension for specifying a programming language's grammar. JavaCC was developed initially by Sun Microsystems, but it's now maintained by MetaMata. Like any decent programming tool, JavaCC was actually used to specify the grammar of the JavaCC input format.
Moreover, JavaCC allows us to define grammars in a fashion similar to EBNF, making it easy to translate EBNF grammars into the JavaCC format. Further, JavaCC is the most popular parser generator for Java, with a host of predefined JavaCC grammars available to use as a starting point.
Developing a simple calculator
We now revisit our little arithmetic language to build a simple command-line calculator in Java using JavaCC . First, we have to translate the EBNF grammar into JavaCC format and save it in the file Arithmetic.jj :
options
{
LOOKAHEAD=2;
}
PARSER_BEGIN(Arithmetic)
public class Arithmetic
{
}
PARSER_END(Arithmetic)
SKIP :
{
" "
| "\r"
| "\t"
}
TOKEN:
{
< NUMBER: ()+ ( "." ()+ )? >
| < DIGIT: ["0"-"9"] >
}
double expr():
{
}
{
term() ( "+" expr() | "-" expr() )*
}
double term():
{
}
{
unary() ( "*" term() | "/" term() )*
}
double unary():
{
}
{
"-" element() | element()
}
double element():
{
}
{
| "(" expr() ")"
}
The code above should give you an idea on how to specify a grammar for JavaCC . The options section at the top specifies a set of options for that grammar. We specify a lookahead of 2. Additional options control JavaCC 's debugging features and more. Those options can alternatively be specified on the JavaCC command line.
The PARSER_BEGIN clause specifies that the parser class definition follows. JavaCC generates a single Java class for each parser. We call the parser class Arithmetic . For now, we require only an empty class definition; JavaCC will add parsing-related declarations to it later. We end the class definition with the PARSER_END clause.
The SKIP section identifies the characters we want to skip. In our case, those are the white-space characters. Next, we define the tokens of our language in the TOKEN section. We define numbers and digits as tokens. Note that JavaCC differentiates between definitions for tokens and definitions for other production rules, which differs from EBNF. The SKIP and TOKEN sections specify this grammar's lexical analysis.
Next, we define the production rule for expr , the top-level grammar element. Notice how that definition markedly differs from the definition of expr in EBNF. What's happening? Well, it turns out that the EBNF definition above is ambiguous, as it allows multiple representations of the same program. For example, let us examine the expression 1+2*3 . We can match 1+2 into an expr yielding expr*3 , as in Figure 1.
Figure 1. EBNF parse tree of 1+2*3
Or, alternatively, we could first match 2*3 into an expr resulting in 1+expr , as shown in Figure 2.
Figure 2. Alternative EBNF parse tree of 1+2*3
With JavaCC , we have to specify the grammar rules unambiguously. As a result, we break out the definition of expr into three production rules, defining the grammar elements expr , term , unary , and element . Now, the expression 1+2*3 is parsed as shown in Figure 3.
Figure 3. Parse tree of 1+2*3
From the command line we can run JavaCC to check our grammar:
javacc Arithmetic.jj
Java Compiler Compiler Version 1.1 (Parser Generator)
Copyright (c) 1996-1999 Sun Microsystems, Inc.
Copyright (c) 1997-1999 Metamata, Inc.
(type "javacc" with no arguments for help)
Reading from file Arithmetic.jj . . .
Warning: Lookahead adequacy checking not being performed since option LOOKAHEAD
is more than 1. Set option FORCE_LA_CHECK to true to force checking.
Parser generated with 0 errors and 1 warnings.
The following checks our grammar definition for problems and generates a set of Java source files:
TokenMgrError.java
ParseException.java
Token.java
ASCII_CharStream.java
Arithmetic.java
ArithmeticConstants.java
ArithmeticTokenManager.java
Together these files implement the parser in Java. You can invoke this parser by instantiating an instance of the Arithmetic class:
public class Arithmetic implements ArithmeticConstants
{
public Arithmetic(java.io.InputStream stream) { ... }
public Arithmetic(java.io.Reader stream) { ... }
public Arithmetic(ArithmeticTokenManager tm) { ... }
static final public double expr() throws ParseException { ... }
static final public double term() throws ParseException { ... }
static final public double unary() throws ParseException { ... }
static final public double element() throws ParseException { ... }
static public void ReInit(java.io.InputStream stream) { ... }
static public void ReInit(java.io.Reader stream) { ... }
public void ReInit(ArithmeticTokenManager tm) { ... }
static final public Token getNextToken() { ... }
static final public Token getToken(int index) { ... }
static final public ParseException generateParseException() { ... }
static final public void enable_tracing() { ... }
static final public void disable_tracing() { ... }
}


 

Read Tutorial at: Click here to view the tutorial

Rate Tutorial:
Build your own languages with JavaCC - JavaWorld December 2000

View Tutorial:
Build your own languages with JavaCC - JavaWorld December 2000

Related Tutorials:

Container support for objects
Container support for objects
 
Java security evolution and concepts, Part 5
Java security evolution and concepts, Part 5
 
Will Big Blue eclipse the Java tools market?
Will Big Blue eclipse the Java tools market?
 
Navigate data with the Mapper framework
Navigate data with the Mapper framework
 
Jabber away with instant messaging
Jabber away with instant messaging
 
J2SE 1.4 breathes new life into the CORBA community, Part 1
J2SE 1.4 breathes new life into the CORBA community, Part 1
 
Maven ties together tools for better code management
Maven ties together tools for better code management
 
Sun boosts
Sun boosts enterprise Java
 
JHome
JHome Automation Light Interface Control Environment aka A.L.I.C.E. is written as a 100% Java application using both Swing and Comm API packages, all of which are extensions to the Java core libraries. Alice will allow you to control your X10 enabled li
 
a-visual-llk-parser-generator VisualLangLab
a-visual-llk-parser-generator: VisualLangLab A Visual IDE-Style LL(k) Parser Generator that uses an editable tree with icons for tokens and non-terminals to represent the grammar symbols and grammar rules.
 
Writing Ant Tasks
Writing Ant Tasks A nice feature of Ant is that it is designed to allow you to add your own tasks and use them in an build. This article shows you the basics of writing an Ant task and how to get a task to work.
 
JEP - Java Mathematical Expression Parser
JEP - Java Mathematical Expression Parser JEP is a Java API for parsing and evaluating mathematical expressions. With this library you can allow your users to enter an arbitrary formula as a string, and instantly evaluate it. JEP supports user defined
 
JLAN Server v3.3
JLAN Server v3.3 JLAN Server is a high performance JavaTM based file server supporting Windows file sharing (SMB/CIFS), NFS and FTP protocols. Write your own virtual filesystems with the core server handling all protocol exchanges with the client. Incl
 
Groovy, Java\'s New Scripting Language
Groovy, Java\'s New Scripting Language When some Java developers hear about Groovy, their first reaction often is, as mine was, "Oh, no, not another scripting language for Java." We already have, after all, JavaScript and Rhino, Jython, Jelly, BeanShell,
 
Practically Groovy: Unit test your Java code faster with Groovy
Why unit test with Groovy? What makes Groovy particularly appealing with respect to other scripting platforms is its seamless integration with the Java platform. Because it's based on the Java language (unlike other alternate languages for the JRE, which
 
Encapsulate reusable functionality in JSP tags
JavaServer Pages (JSP) are a great mechanism for delivering dynamic Web-based content. JSP provides a set of predefined tags, but you can also define your own tag extensions that encapsulate common functionality.
 
Practically Groovy: JDBC programming with Groovy
Take your practical knowledge of Groovy one step further this month, as Andrew Glover shows you how to use GroovySql to build a simple data-reporting application. GroovySql combines closures and iterators to ease Java Database Connectivity (JDBC) programm
 

Free Web Site Hosting Services Below is the listing of the hosting providers providing free web hosting services. These services helps you building your sites even if you have no experience in HTML writing. Zero
 
Web Hosting Guide. What is Web Hosting Plan?
Web Hosting Guide. What is Web Hosting Plan? What is Web Hosting Plan? Web hosting plan is the different plans provided by Hosting Companies for hosting your web site. Web hosting plans include the storage limit, bandwidth, access to server
 
New Technical Articles: 64-bit Programming on Solaris 10 OS for x86 Platforms
Four technical articles describe the new Sun Studio 10 software's 64-bit programming features on the Solaris 10 OS for x86 and AMD64 platforms. Important issues regarding the AMD64 ABI (Application Binary Interface), debugging, migration to 64-bits, and p
 
Site navigation
 

 

Send your comments, Suggestions or Queries regarding this site at roseindia_net@yahoo.com.

Copyright © 2006. All rights reserved.