next up previous contents
Next: 6.3 Case Study: The Up: 6 Semantic Analysis Previous: 6.1 Symbol Tables

6.2 Types and Type Checking

A typechecker is a function that maps an AST that represents an expression into its type. For example, if variable x is an integer, it will map the AST that represents the expression x+1 into the data structure that represents the type int. If there is a type error in the expression, such as in x>"a", then it displays an error message (a type error). So before we describe the typechecker, we need to define the data structures for types. Suppose that we have 5 kinds of types in the language: integers, booleans, records, arrays, and named types (ie. type names that have been defined earlier by some typedef). Then one possible data structure for types is:

typedef struct type {
  enum { int_type, bool_type, record_type, array_type, named_type } tag;
  union { struct rectype { string          attribute;
                           struct type*    atype;
                           struct rectype* next; }      record;
          struct type*                                  array;
          string                                        named_type;
        } body;
} type;

Here the tag determines the kind of the type. Depending on the value of tag, we have different components in the type. If the type is integer or boolean, there are no extra components. If it is a record, we have a list of attribute/types (the rectype) to capture the record components. If it is an array, we have the type of the array elements (assuming that the size of an array is unbound, otherwise we must include the array bounds). If it is a named type, we have the name of the type.

The symbol table must contain type declarations (ie. typedefs), variable declarations, constant declarations, and function signatures. That is, it should map strings (names) into symbols, where a symbol is:

typedef struct symbol {
  enum { type_decl, var_decl, const_decl, func_decl } tag;
  union { type*                              type_declaration;
          type*                              variable_declaration;
          struct { type* const_type;
                   ast*  const_value; }      constant_declaration;
          struct { type*      result_type;
                   type_list* parameters; }  function_signature;
        } body;
} symbol;
typedef struct type_list { type* elem; struct type_list* next; } type_list;

If we use the hash table with chaining implementation, the symbol table symbol_table would look like this:

typedef struct element { string          key;
                         symbol*         binding;
                         struct element* next; } element;
element* symbol_table [SIZE];

Recall that the symbol table should support the following operations:

insert ( string key, symbol* binding )
symbol* lookup ( string key )
begin_scope ()
end_scope ()

The typechecking function may have the following signature:

type* typecheck ( ast* exp );

That is, it returns the type of the expression exp. The function typecheck must be recursive since the AST structure is recursive. In fact, this function is a tree traversals that checks each node of the AST tree recursively. The body of the typechecker may look like this:

type* typecheck ( ast* exp ) {
   switch ( exp->tag ) {
   case int_exp: return make_int_type();
   case true_exp: return make_bool_type();
   case false_exp: return make_bool_type();
   case variable_exp: {
        symbol* vdecl = lookup(exp->variable);
        if (vdecl == NULL)
           error("undefined variable");
        else if (vdecl->tag != var_decl)
           error("this name is not a variable name");
        else return vdecl->body.variable_declaration;
   case binary_op: {
        type* left = typecheck(exp->op.binary.left);
        type* right = typecheck(exp->op.binary.right);
        switch ( exp->op.binary.oper ) {
        case "+": if (left->tag == int_type && right->tag == int_type)
                     return make_int_type();
                  else error("expected integers in addition");
   case function_call: {
        symbol* fdecl = lookup(exp->,venv);
        if (fdecl == NULL)
           error("undefined function");
        else if (fdecl->tag != func_decl)
           error("this name is not a function name");
        type_list* s = fdecl->body.function_signature.parameters;
        for(ast_list* r = exp->; r != NULL && s != NULL;
                                                  r=r->next, s=s->next)
           if (!equal_types(s->elem,typecheck(r->elem)))
              error("wrong type of the argument in function call")
        if (r != NULL || s != NULL)
           error("wrong number of parameters");
        return fdecl->body.function_signature.result_type;

where equal_types(x,y) checks the types x and y for equality. We have two types of type equality: type equality based on type name equivalence, or based on structural equivalence. For example, if we have defined T to be a synonym for the type int and have declared the variable x to be of type T, then using the first type of equality, x+1 will cause a type error (since T and int are different names), while using the second equality, it will be correct.

Note also that since most realistic languages support many binary and unary operators, it will be very tedious to hardwire their typechecking into the typechecker using code. Instead, we can use another symbol table to hold all operators (as well as all the system functions) along with their signatures. This also makes the typechecker easy to change.

next up previous contents
Next: 6.3 Case Study: The Up: 6 Semantic Analysis Previous: 6.1 Symbol Tables
Leonidas Fegaras