When building ASTs, it's a good idea to define multiple data structures to capture various families of constructs. For example, we can have an exp_AST type to represent expressions, stmt_AST to represent statements, and type_AST to represent types. These AST types can be coded using mutually recursive C structs or C++ classes. Here is an example of exp_AST in C:
typedef struct AST {
enum { int_exp, true_exp, false_exp, variable_exp,
binary_op, unary_op, function_call,
record_construction, projection } tag;
union { int integer;
string variable;
struct { string oper;
struct AST* left;
struct AST* right; } binary;
struct { string oper;
struct AST* uexp; } unary;
struct { string name;
struct AST_list* arguments; } call;
struct rec { string attribute;
struct AST* value;
struct rec* next; } record;
struct { struct AST* value;
string attribute; } project;
} op;
} ast;
where AST_list is a list of ASTs:
typedef struct AST_list {
ast* elem;
struct AST_list* next;
} ast_list;
It's a good idea to define a constructor for every kind of expression
to simplify the task of constructing ASTs:
ast* make_binary_op ( string oper, ast* left, ast* right ) {
ast* e = (ast*) malloc(sizeof(ast));
e->tag = binary_op;
e->op.binary.oper = make_string(oper);
e->op.binary.left = left;
e->op.binary.right = right;
return e;
};
For example,
make_binary_op("+",make_binary_op("-",make_variable("x"),make_integer(2)),
make_integer(3))
constructs the AST for the input (x-2)+3.
Unfortunately, when constructing a compiler, we need to define many tree-like data structures to capture ASTs for many different constructs, such as expressions, statements, declarations, programs etc, as well as type structures, intermediate representation (IR) trees, etc. This would require hundreds of recursive structs in C or classes in C++. An alternative method is to use just one generic tree structure to capture all possible tree structures. This is exactly what we did for our calculator example and is described in detail below.