next up previous contents
Next: 4.2 Case Study: Abstract Up: 4 Abstract Syntax Previous: 4 Abstract Syntax

4.1 The Hard Way of Building Abstract Syntax Trees

When building ASTs, it's a good idea to define multiple data structures to capture various families of constructs. For example, we can have an exp_AST type to represent expressions, stmt_AST to represent statements, and type_AST to represent types. These AST types can be coded using mutually recursive C structs or C++ classes. Here is an example of exp_AST in C:

typedef struct AST {
  enum { int_exp, true_exp, false_exp, variable_exp,
         binary_op, unary_op, function_call,
         record_construction, projection } tag;
  union { int                                      integer;
          string                                   variable;
          struct { string           oper;
                   struct AST*      left;
                   struct AST*      right; }       binary;
          struct { string           oper;
                   struct AST*      uexp; }        unary;
          struct { string           name;
                   struct AST_list* arguments; }   call;
          struct rec { string       attribute;
                       struct AST*  value;
                       struct rec*  next; }        record;
          struct { struct AST*  value;
                   string attribute; }             project;
      } op;
} ast;
where AST_list is a list of ASTs:
typedef struct AST_list { 
  ast*             elem;
  struct AST_list* next;
} ast_list;
It's a good idea to define a constructor for every kind of expression to simplify the task of constructing ASTs:
ast* make_binary_op ( string oper, ast* left, ast* right ) {
  ast* e = (ast*) malloc(sizeof(ast));
  e->tag = binary_op;
  e->op.binary.oper = make_string(oper);
  e->op.binary.left = left;
  e->op.binary.right = right;
  return e;
};
For example,
make_binary_op("+",make_binary_op("-",make_variable("x"),make_integer(2)),
               make_integer(3))
constructs the AST for the input (x-2)+3.

Unfortunately, when constructing a compiler, we need to define many tree-like data structures to capture ASTs for many different constructs, such as expressions, statements, declarations, programs etc, as well as type structures, intermediate representation (IR) trees, etc. This would require hundreds of recursive structs in C or classes in C++. An alternative method is to use just one generic tree structure to capture all possible tree structures. This is exactly what we did for our calculator example and is described in detail below.


next up previous contents
Next: 4.2 Case Study: Abstract Up: 4 Abstract Syntax Previous: 4 Abstract Syntax
Leonidas Fegaras
2002-08-26