Programming Tutorials Browser Tutorials Articles Struts Tutorials Hibernate Tutorials

  Tutorial: JavaWorld article about JavaCC

JavaWorld article about JavaCC

Tutorial Details:

Looking for lex and yacc for Java? You don't know Jack
Looking for lex and yacc for Java? You don't know Jack
By: By Chuck McManis
How to get and use Sun's free automatic Java parser generator -- a unique new tool that's a must for Java compiler developers
un has released Jack, a new tool written in Java that automatically generates parsers by compiling a high-level grammar specification stored in a text file. This article will serve as an introduction to this new tool. The first part of the article covers a brief introduction to automatic parser generation, and my first experiences with them. Then the article will focus on Jack and how you can use it to generate parsers and applications built with those parsers, based on your high-level grammar.
Automatic compiler parser generation
A parser is one of the most common components of a computer application. It converts text that can be read by humans into data structures known as parse trees, which are understood by the computer. I distinctly remember my introduction to automatic parser generation: In college I had completed a class on compiler construction. With the help of my wife to be, I had written a simple compiler that could turn programs written in a language made up for the class into executable programs. I remember feeling very accomplished at that point.
In my first "real" job after college, I got an assigment to create a new graphics processing language to compile into commands for a graphics coprocessor. I started with a freshly composed grammar and prepared to launch into the multiweek project of putting together a compiler. Then a friend showed me the Unix utilities lex and yacc . Lex built lexical analyzers from regular expressions, and yacc reduced a grammar specification into a table-driven compiler that could produce code when it had successfully parsed productions from that grammar. I used lex and yacc , and in less than a week my compiler was up and running! Later, the Free Software Foundation's GNU project produced "improved" versions of lex and yacc -- named flex and bison -- for use on platforms that did not run a derivative of the Unix operating system.
The world of automatic parser generation advanced again when Terrence Parr, then a student at Purdue University, created the Purdue Compiler Construction Tool Set or PCCTS. Two components of PCCTS -- DFA and ANTLR -- provide the same functions as lex and yacc ; however the grammars that ANTLR accepts are LL(k) grammars as opposed to the LALR grammars used by yacc . Furthermore, the code that PCCTS generates is much more readable than the code generated by yacc . By generating code that is easier to read, PCCTS makes it easier for a human reading the code to understand what the various pieces are doing. This understanding can be essential when trying to diagnose errors in the grammar specification. PCCTS quickly developed a following of folks who found its files easier to use than yacc.
The power of automatic parser generation is that it allows users to concentrate on the grammar and not worry about the correctness of the implementation. This can be a tremendous time-saver in both simple and complex projects.
Jack steps up to the plate
I rate tools by the generality of the problem they solve. As the requirement to parse text input comes up again and again, automatic parser generation rates pretty highly in my toolbox. Combined with the rapid development cycle of Java, automatic parser generation provides a tool for compiler design that is hard to beat.
Jack (rhymes with yacc) is a parser generator, in the spirit of PCCTS, that Sun has released for free to the Java programming community. Jack is an exceptionally easy tool to describe: Simply put, you give it a set of combined grammatical and lexing rules in the form of a .jack file and run the tool, and it gives you back a Java class that will parse that grammar. What could be easier?
Getting ahold of Jack is also quite easy. First you download a copy from the Jack home page. This comes to you in the form of a self-unpacking Java class called install . To install Jack you need to invoke this install class, which, on a Windows 95 machine is done using the command: C:>java install .
The command shown above assumes that the java command is in your command path and that the class path has been set up appropriately. If the above command did not work, or if you aren't sure whether or not you have things set up properly, open an MS-DOS window by traversing the Start->Programs->MS-DOS Prompt menu items. If you have the Sun JDK installed, you can type these commands:
C:> path C:\java\bin;%path%
C:> set CLASSPATH=.;c:\java\lib\classes.zip
If Symantec Cafe version 1.2 or later is installed, you can type these commands:
C:> path C:\cafe\java\bin;%path%
The class path should already be set up in a file called sc.ini in the bin directory of Cafe.
Next, type the java install command from above. The install program will ask you what directory you wish to install into, and the Jack subdirectory will be created below that.
Using Jack
Jack is written entirely in Java, so having the Jack classes means that this tool is instantly available on every platform that supports the Java virtual machine. However, it also means that on Windows boxes you have to run Jack from the command line. Let's say you chose the directory name JavaTools when you installed Jack on your system. To use Jack you will need to add Jack's classes to your class path. You can do this in your autoexec.bat file or in your .cshrc file if you are a Unix user. The critical command is something like the line shown below:
C:> set CLASSPATH=.;C:\JavaTools\Jack\java;C:\java\lib\classes.zip
Note that Symantec Cafe users can edit the sc.ini file and include the Jack classes there, or they can set CLASSPATH explicitly as shown above.
Setting the environment variable as shown above puts the Jack classes in the CLASSPATH between "." (the current directory) and the base system classes for Java. The main class for Jack is COM.sun.labs.jack.Main . Capitalization is important! There are exactly four capital letters in the command ('C', 'O', 'M', and another 'M'). To run Jack manually, type the command:
C:> java COM.sun.labs.jack.Main parser-input.jack
If you don't have the Jack files in your class path, you can use this command:
C:> java -classpath .;C:\JavaTools\Jack\java;c:\java\lib\classes.zip COM.sun.labs.jack.Main parser-input.jack
As you can see, this gets a bit long. To minimize typing, I put the invocation into a .bat file named Jack.bat . At some point in the future, a simple C wrapper program will become available, perhaps even as you read this. Check out the Jack home page for the availability of this and other programs.
When Jack is run, it creates several files in the current directory that you will later compile into your parser. Most are either prefixed with the name of your parser or are common to all parsers. One of these, however, ASCII_CharStream.java , may collide with other parsers, so it is probably a good idea to start off in a directory containing only the .jack file you are going to use to generate the parser.
Once you run Jack, if the generation has gone smoothly you will have a bunch of .java files in the current directory with various interesting names. These are your parsers. I encourage you to open them up with an editor and look them over. When you are ready you can compile them with the command
C:> javac -d .
ParserName.java
where ParserName is the name you gave your parser in the input file. More on that in a bit. If all of the files for your parser don't compile, you can use the brute force method of typing:
C:> javac *.java
This will compile everything in the directory. At this point your new parser is ready to use.
Jack parser descriptions
Jack parser description files have the extension .jack and are divided into three basic parts: options and base class; lexical tokens; and non-terminals. Let's look at a simple parser description (this is included in the examples directory that comes with Jack).
options {
LOOKAHEAD = 1;
}
PARSER_BEGIN( Simple1 )
public class Simple1 {
public static void main(String args[]) throws ParseError {
Simple1 parser = new Simple1 (System.in);
parser.Input();
}
}
PARSER_END( Simple1 )
The first few lines above describe options for the parser; in this case LOOKAHEAD is set to 1. There are other options, such as diagnostics, Java Unicode handling, and so on, that also can be set here. Following the options comes the base class of the parser. The two tags PARSER_BEGIN and PARSER_END bracket the class that becomes the base Java code for the resulting parser. Note that the class name used in the parser specification must be the same in the beginning, middle, and ending part of this section. In the example above, I've put the class name in bold face to make this clear. As you can see in the code above, this class defines a static main method so that the class can be invoked by the Java interpreter on the command line. The main method simply instantiates a new parser with an input stream (in this case System.in ) and then invokes the Input method. The Input method is a non-terminal in our grammar, and it is defined in the form of an EBNF element. EBNF stands for Extended Backus-Naur Form. The Backus-Naur form is a method for specifying context-free grammars. The specification consists of a terminal on the left-hand side, a production symbol, which is typically "::=", and one or more productions on the right-hand side. The notation used is typically something like this:
Keyword ::= " if " |
" then " | " else "
This would be read as, "The Keyword terminal is one of the string literals 'if', 'then', or 'else.'" In Jack, this form is extended to allow the left-hand part to be represented by a method, and the alternate expansions may be represented by regular expressions or other non-terminals. Continuing wi


 

Read Tutorial at: Click here to view the tutorial

Rate Tutorial:
JavaWorld article about JavaCC

View Tutorial:
JavaWorld article about JavaCC

Related Tutorials:

JavaWorld Developer Tools Table
JavaWorld Developer Tools Table
 
Build servlet-based enterprise Web applications - JavaWorld - December 1998
Build servlet-based enterprise Web applications - JavaWorld - December 1998
 
JavaWorld Editors' Choice Awards
JavaWorld Editors' Choice Awards
 
The Volano Report: Which Java platform is fastest, most scalable? A JavaWorld exclusive! - JavaWorld - Mar
The Volano Report: Which Java platform is fastest, most scalable? A JavaWorld exclusive! - JavaWorld - March 1999
 
The state of Java middleware, Part II: Enterprise JavaBeans - JavaWorld - April 1999
The state of Java middleware, Part II: Enterprise JavaBeans - JavaWorld - April 1999
 
XML JavaBeans, Part 1 - JavaWorld February 1999
XML JavaBeans, Part 1 - JavaWorld February 1999
 
XML JavaBeans, Part 2 - JavaWorld March 1999
XML JavaBeans, Part 2 - JavaWorld March 1999
 
Build an object database - JavaWorld January 2000
Build an object database - JavaWorld January 2000
 
JavaWorld Developer Tools Guide: IDE
JavaWorld Developer Tools Guide: IDE
 
JavaWorld Developer Tools Guide
JavaWorld Developer Tools Guide
 
JavaWorld article about JavaCC
JavaWorld article about JavaCC
 
XML for the absolute beginner - JavaWorld - April 1999
XML for the absolute beginner - JavaWorld - April 1999
 
Cleaning up after Jini services - JavaWorld March 2000
Cleaning up after Jini services - JavaWorld March 2000
 
Untangle your servlet code with reflection - JavaWorld December 2000
Untangle your servlet code with reflection - JavaWorld December 2000
 
Build your own languages with JavaCC - JavaWorld December 2000
Build your own languages with JavaCC - JavaWorld December 2000
 
Double-checked locking is still broken
Double-checked locking is still broken
 
a-visual-llk-parser-generator VisualLangLab
a-visual-llk-parser-generator: VisualLangLab A Visual IDE-Style LL(k) Parser Generator that uses an editable tree with icons for tokens and non-terminals to represent the grammar symbols and grammar rules.
 
JEP - Java Mathematical Expression Parser
JEP - Java Mathematical Expression Parser JEP is a Java API for parsing and evaluating mathematical expressions. With this library you can allow your users to enter an arbitrary formula as a string, and instantly evaluate it. JEP supports user defined
 
Enhance looping in Java 5.0 with for/in
The for/in loop -- often called either enhanced for or foreach is largely a convenience feature in Java 5.0. It doesn\\'t really offer any new functionality, but certainly makes several routine coding tasks simpler.
 
Advanced Features of JSP Custom Tag Libraries
In this article, the second in the JSP custom tag libraries series, we will cover advanced JSP features and how to use them.
 
Site navigation
 

 

Send your comments, Suggestions or Queries regarding this site at roseindia_net@yahoo.com.

Copyright © 2006. All rights reserved.