A typechecker is a function that maps an AST that represents an
expression into its type. For example, if variable x is an
integer, it will map the AST that represents the expression x+1
into the data structure that represents the type int. If there
is a type error in the expression, such as in x>"a", then it
displays an error message (a type error). So before we describe the
typechecker, we need to define the data structures for types. Suppose
that we have 5 kinds of types in the language: integers, booleans,
records, arrays, and named types (ie. type names that have been
defined earlier by some typedef). Then one possible data structure for
types is:
typedef struct type {
enum { int_type, bool_type, record_type, array_type, named_type } tag;
union { struct rectype { string attribute;
struct type* atype;
struct rectype* next; } record;
struct type* array;
string named_type;
} body;
} type;
Here the tag determines the kind of the type. Depending on the
value of tag, we have different components in the type. If the
type is integer or boolean, there are no extra components. If it is a
record, we have a list of attribute/types (the rectype) to
capture the record components. If it is an array, we have the type of
the array elements (assuming that the size of an array is unbound,
otherwise we must include the array bounds). If it is a named type, we
have the name of the type.
The symbol table must contain type declarations (ie. typedefs), variable declarations, constant declarations, and function signatures. That is, it should map strings (names) into symbols, where a symbol is:
typedef struct symbol {
enum { type_decl, var_decl, const_decl, func_decl } tag;
union { type* type_declaration;
type* variable_declaration;
struct { type* const_type;
ast* const_value; } constant_declaration;
struct { type* result_type;
type_list* parameters; } function_signature;
} body;
} symbol;
typedef struct type_list { type* elem; struct type_list* next; } type_list;
If we use the hash table with chaining implementation, the symbol
table symbol_table would look like this:
typedef struct element { string key;
symbol* binding;
struct element* next; } element;
element* symbol_table [SIZE];
Recall that the symbol table should support the following operations:
insert ( string key, symbol* binding ) symbol* lookup ( string key ) begin_scope () end_scope ()
The typechecking function may have the following signature:
type* typecheck ( ast* exp );
That is, it returns the type of the expression exp. The
function typecheck must be recursive since the AST structure is
recursive. In fact, this function is a tree traversals that checks
each node of the AST tree recursively. The body of the typechecker
may look like this:
type* typecheck ( ast* exp ) {
switch ( exp->tag ) {
case int_exp: return make_int_type();
case true_exp: return make_bool_type();
case false_exp: return make_bool_type();
case variable_exp: {
symbol* vdecl = lookup(exp->variable);
if (vdecl == NULL)
error("undefined variable");
else if (vdecl->tag != var_decl)
error("this name is not a variable name");
else return vdecl->body.variable_declaration;
};
case binary_op: {
type* left = typecheck(exp->op.binary.left);
type* right = typecheck(exp->op.binary.right);
switch ( exp->op.binary.oper ) {
case "+": if (left->tag == int_type && right->tag == int_type)
return make_int_type();
else error("expected integers in addition");
...
}
};
case function_call: {
symbol* fdecl = lookup(exp->op.call.name,venv);
if (fdecl == NULL)
error("undefined function");
else if (fdecl->tag != func_decl)
error("this name is not a function name");
type_list* s = fdecl->body.function_signature.parameters;
for(ast_list* r = exp->op.call.arguments; r != NULL && s != NULL;
r=r->next, s=s->next)
if (!equal_types(s->elem,typecheck(r->elem)))
error("wrong type of the argument in function call")
if (r != NULL || s != NULL)
error("wrong number of parameters");
return fdecl->body.function_signature.result_type;
};
...
}
}
where equal_types(x,y) checks the types x and y
for equality. We have two types of type equality: type equality based
on type name equivalence, or based on structural equivalence. For
example, if we have defined T to be a synonym for the type
int and have declared the variable x to be of type
T, then using the first type of equality, x+1 will cause a
type error (since T and int are different names), while
using the second equality, it will be correct.
Note also that since most realistic languages support many binary and unary operators, it will be very tedious to hardwire their typechecking into the typechecker using code. Instead, we can use another symbol table to hold all operators (as well as all the system functions) along with their signatures. This also makes the typechecker easy to change.