6.2 Types and Type Checking

Next: 6.3 Case Study: The Up: 6 Semantic Analysis Previous: 6.1 Symbol Tables Contents

6.2 Types and Type Checking

A typechecker is a function that maps an AST that represents an expression into its type. For example, if variable x is an integer, it will map the AST that represents the expression x+1 into the data structure that represents the type int. If there is a type error in the expression, such as in x>"a", then it displays an error message (a type error). So before we describe the typechecker, we need to define the data structures for types. Suppose that we have five kinds of types in the language: integers, booleans, records, arrays, and named types (ie. type names that have been defined earlier by some typedef). Then one possible data structure for types is:

abstract class Type {
}
class IntegerType extends Type {
   public IntegerType () {}
}
class StringType extends Type {
   public StringType () {}
}
class NamedType extends Type {
   public String name;
   public NamedType ( String n ) { value=n; }
}
class ArrayType extends Type {
   public Type element;
   public ArrayType ( Type et ) { element=et; }
}
class RecordComponent {
   public String attribute;
   public Type type;
   public RecordComponent ( String a, Type t ) { attribute=a; type=t; }
}
class RecordType extends Type {
    public List<RecordComponent> elements;
    public RecordType ( List<RecordComponent> el ) { elements=el; }
}

that is, if the type is an integer or a boolean, there are no extra components. If it is a named type, we have the name of the type. If it is an array, we have the type of the array elements (assuming that the size of an array is unbound, otherwise we must include the array bounds). If it is a record, we have a list of attribute/types (the RecordComponents class) to capture the record components.

The symbol table must contain type declarations (ie. typedefs), variable declarations, constant declarations, and function signatures. That is, it should map strings (names) into Declaration objects:

abstract class Declaration {
}
class TypeDeclaration extends Declaration {
   public Type declaration;
   public TypeDeclaration ( Type t ) { declaration=t; }
}
class VariableDeclaration extends Declaration {
   public Type declaration;
   public VariableDeclaration ( Type t ) { declaration=t; }
}
class ConstantDeclaration extends Declaration {
   public Type declaration;
   public Exp value;
   public ConstantDeclaration ( Type t, Exp v ) { declaration=t; value=v; }
}
class FunctionDeclaration extends Declaration {
   public Type result;
   public List<Type> parameters;
   public FunctionDeclaration ( Type t, List<Type> tl ) {
      result = t;
      parameters = tl;
   }
}

If we use the hash table with chaining implementation, the symbol table symbol_table would look like this:

class Symbol {
    public String key;
    public Declaration binding;
    public Symbol next;
    public Symbol ( String k, Declaration v, Symbol r ) {
       key = k;
       binding = v;
       next = r;
    }
}
Symbol[] symbol_table = new Symbol[SIZE];

Recall that the symbol table should support the following operations:

insert ( String key, Declaration binding )
Declaration lookup ( String key )
begin_scope ()
end_scope ()

The typechecking function may have the following signature:

static Type typecheck ( Exp e );

The function typecheck must be recursive since the AST structure is recursive. In fact, this function is a tree traversals that checks each node of the AST tree recursively. The body of the typechecker may look like this:

static Type typecheck ( Exp e ) {
   if (e instanceof IntegerExp)
      return new IntegerType();
   else if (e instanceof StringExp)
      return new StringType();
   else if (e instanceof VariableExp) {
      VariableExp v = (VariableExp) e;
      Declaration decl = lookup(v.name);
      if (decl == null)
         error("undefined variable: "+v.name);
      else if (decl instanceof VariableDeclaration)
         return ((VariableDeclaration) decl).declaration;
      else error(v.name+" is not a variable name");
   } else if (e instanceof BinaryExp) {
      BinaryExp b = (BinaryExp) e;
      Type left = typecheck(b.left);
      Type right = typecheck(b.right);
      switch ( b.operator ) {
         case "+": if (left instanceof IntegerType
                       && right instanceof IntegerType)
                      return new IntegerType();
                   else error("expected integers in addition");
         ...
      }
   } else if (e instanceof CallExp) {
      CallExp c = (CallExp) e;
      Declaration decl = lookup(c.name);
      if (decl == null)
         error("undefined function");
      else if (!(decl instanceof FunctionDeclaration))
         error("this name is not a function name");
      FunctionDeclaration f = (FunctionDeclaration) decl;
      if (f.parameters.size() != c.arguments.size())
         error("wrong number of arguments");
      for ( int i = 0; i < f.parameters.size(); i++ )
          if (!equal_types(f.parameters.get(i),typecheck(c.arguments.get(i))))
             error("wrong type of the argument in function call");
      return f.result;
   }
   else ...
}

where equal_types(x,y) checks the types x and y for equality. We have two types of type equality: type equality based on type name equivalence, or based on structural equivalence. For example, if we have defined T to be a synonym for the type int and have declared the variable x to be of type T, then using the first type of equality, x+1 will cause a type error (since T and int are different names), while using the second equality, it will be correct.

Note also that since most realistic languages support many binary and unary operators, it will be very tedious to hardwire their typechecking into the typechecker using code. Instead, we can use another symbol table to hold all operators (as well as all the system functions) along with their signatures. This also makes the typechecker easy to change.

Next: 6.3 Case Study: The Up: 6 Semantic Analysis Previous: 6.1 Symbol Tables Contents

2015-01-20