next up previous contents
Next: 5 Semantic Actions Up: 4 Abstract Syntax Previous: 4.2 Building Abstract Syntax   Contents

4.3 Gen: A Java Package for Constructing and Manipulating Abstract Syntax Trees

The code for the calculator example is written in Gen. Gen is a Java preprocessor that adds syntactic constructs to the Java language to make the task of handling Abstract Syntax Trees (ASTs) easier. The class project will be developed using Gen.

Two classes are used by Gen: the class Tree that captures an AST (with subclasses VariableLeaf, LongLeaf, DoubleLeaf, StringLeaf, and Node), and the class Trees that captures a list of ASTs. An Tree is a tree-like data structure, which is used for representing various tree-like data structures used in compiler construction, including ASTs and Intermediate Representation (IR) code.

abstract class Tree {
    public boolean is_node () { return (this instanceof Node); }
    public boolean is_variable () { return (this instanceof VariableLeaf); }
    public boolean is_long () { return (this instanceof LongLeaf); }
    public boolean is_string () { return (this instanceof StringLeaf); }
    public boolean is_double () { return (this instanceof DoubleLeaf); }
    public String variableValue () { return ((VariableLeaf)this).value; }
    public long longValue () { return ((LongLeaf)this).value; }
    public String stringValue () { return ((StringLeaf)this).value; }
    public double doubleValue () { return ((DoubleLeaf)this).value; }
class VariableLeaf extends Tree {
    public String value;
    public VariableLeaf ( String s ) { value = s; }
class LongLeaf extends Tree {
    public long value;
    public LongLeaf ( long n ) { value = n; }
class DoubleLeaf extends Tree {
    public double value;
    public DoubleLeaf ( double n ) { value = n; }
class StringLeaf extends Tree {
    public String value;
    public StringLeaf ( String s ) { value = s; }
class Node extends Tree {
    public String name;
    public Trees children;
    public Node ( String n, Trees a ) { name = n; children = a; }
    public String name () { return name; }
    public Trees children () { return children; }
where the Trees class represents a list of ASTs:
class Trees {
    public Tree  head;
    public Trees tail;
    public Trees ( Tree e, Trees t );
    public final static Trees nil;     // an empty list
    public Trees cons ( Tree e );      // put the element e before the list
    public Trees append ( Tree e );    // append the element e at the end of list
    public Trees append ( Trees s );   // append the list s at the end of list
    public boolean is_empty ();        // is this an empty list?
    public int length ();
    public Trees reverse ();           // return the reverse of the list
    public boolean member ( Tree e );  // is e a member of the list?
    public Tree nth ( int n );         // return the nth list element
To iterate over Trees, one can use Java iterators:
Trees ts = ...;
for( Tree e: ts )
For example, the Java expression
new Node("Binop",
         Trees.nil.append(new VariableLeaf("Plus"))
                  .append(new VariableLeaf("x"))
                  .append(new Node("Binop",
                                   Trees.nil.append(new VariableLeaf("Minus"))
                                            .append(new VariableLeaf("y"))
                                            .append(new VariableLeaf("z")))))
constructs the AST Binop(Plus,x,Binop(Minus,y,z)).

The nice thing about this approach is that we do not need to add a new class to define another tree-like data structure. The disadvantage is that it's now easier to make mistakes when writing programs to manipulate these tree structures. For example, we may pass a statement tree in a function that handles expression trees and this will not be detected by the Java compiler.

To make the task of writing these tree constructions less tedious, Gen extends Java with the syntactic form #< >. For example, #<Binop(Plus,x,Binop(Minus,y,z))> is equivalent to the above Java expression. That is, the text within the brackets #< > is used by Gen to generate Java code, which creates the tree-like form (an instance of the class Tree) that represents this text. Values of the class Tree can be included into the form generated by the #< > brackets by ``escaping" them with a backquote character (`). The operand of the escape operator (the backquote operator) is expected to be an expression of type Tree that provides the value to ``fill in" the hole in the bracketed text at that point (actually, an escaped string/int/double is also lifted to a Tree). For example, in

  Tree x = #<join(a,b,p)>;
  Tree y = #<select(`x,q)>;
  Tree z = #<project(`y,A)>;
y is set to #<select(join(a,b,p),q)> and z to #<project(select(join(a,b,p),q),A)>. There is also bracketed syntax, #[ ], for constructing instances of Trees.

The bracketed syntax has the following BNF:

brack ::=  "#<" expr ">"                   constructs a Tree
         | "#[" arg "," ... "," arg "]"    constructs a Trees list
expr ::=   name                            the representation of a variable name
         | integer                         the repr. of an integer
         | real                            the repr. of a real number
         | string                          the repr. of a string
         | "`" name                        escaping to the value of name
         | "`(" code ")"                   escaping to the value of code
         | name "(" el "," ... "," el ")"  the repr. of a Node with zero or more children
         | "`" name "(" el "," ... "," el ")" the repr. of a Node with escaped name
         | expr opr expr                   a Node that represents an infix operation
el  ::=   expr                             the repr. of an expression
         | "..." name                      escaping to a list bound to name
         | "...(" code ")"                 escaping to a list returned by code

where code is any Java code. The #`(code) embeds the value returned by the Java code code of type Tree to the term representation inside the brackets. For example, #<`f(6,...r,g("ab",`(k(x))),`y)> is equivalent to the following Java code:

new Node(f,
    Trees.nil.append(new LongLeaf(6))
             .append(new Node("g",Trees.nil.append(new StringLeaf("ab"))
If f="h", y=#<m(1,"a")>, and k(x) returns the value #<8>, then the above term is equivalent to #<h(6,g("ab",8),m(1,"a"))>. The three dots (...) is used to indicate a list of children in an AST node. Since this list is an instance of the class Trees, the type of name in is also Trees. For example, in
  Trees r = #[join(a,b,p),select(c,q)];
  Tree z = #<project(...r)>;
z will be bound to #<project(join(a,b,p),select(c,q))>.

Gen provides a case statement syntax with patterns. Patterns match the Tree representations with similar shape. Escape operators applied to variables inside these patterns represent variable patterns, which ``bind" to corresponding subterms upon a successful match. This capability makes it particularly easy to write functions that perform source-to-source transformations. A function that simplifies arithmetic expressions can be expressed easily as:

  Tree simplify ( Tree e ) {
    match e {
    case plus(`x,0): return x;
    case times(`x,1): return x;
    case times(`x,0): return #<0>;
    case _: return e;
where the _ pattern matches any value. For example, simplify(#<times(z,1)>) returns #<z>, since times(z,1) matches the second case pattern. The BNF of the case statement is:

case_stmt ::= "match" code "{" case ... case "}"
case    ::= "case" pat ":" code
pat ::=   name                               exact match with a variable name
        | integer                            exact match with an integer
        | real                               exact match with a real number
        | string                             exact match with a string
        | "`" name                           match with the value of name
        | name "(" pl "," ... "," pl ")"     match with a Node with zero or more children
        | "`" name "(" pl "," ... "," pl ")" match with a Node with escaped name
        | pat opr pat                        a Node that represents an infix operation
        | "_"                                match any Tree
pl  ::=   pat                                match with a Tree
        | "..." name                         match with a list bound to name
        | "...(" code ")"                    match with a list returned by code
        | "..."                              ignore the rest of the arguments

For example, the pattern `f(...r) matches any Tree Node: when it is matched with #<join(a,b,c)>, it binds f to the string "join" and r to the Trees #[a,b,c]. Another example is the following function that adds the terms #<8> and #<9> as children to any Node e:

  Tree add_arg ( Tree e ) {
    match e {
    case `f(...r): return #<`f(8,9,...r)>;
    case `x: return x;
The special keyword fail is used in Gen to abort the current matching and skip to the next case. For example
    match e {
    case `f(...r,join(`x,`y),...s):
         if (x.equals(y))
             return #<`f(...r,`x,...s)>;
         else fail;
    case `x: return x;
will transform join(`x,`y) to x if x is equal to y (it is not permitted to use a variable twice in a pattern, ie., join(`x,`x) is an illegal pattern).

next up previous contents
Next: 5 Semantic Actions Up: 4 Abstract Syntax Previous: 4.2 Building Abstract Syntax   Contents
fegaras 2012-01-10