Meta-Programming in OPTL


An OPTL source program is a C++ program enhanced with a number of syntactic constructs to make the task of the optimizer specification more tractable. The main data structure used by these language constructs is Expr, defined in basic.h. It is a tree-like data structure which is used for representing both algebraic forms and execution plans. For example, the C++ expression

function(variable("join"),
         Cons(variable("R"),
              Cons(variable("S"),
                   Cons(function(variable("eq"),
                                 Cons(variable("A"),Cons(variable("B"),Nil))),
                        Nil))))
constructs the algebraic term join(R,S,eq(A,B)), where Cons and Nil are the list constructors (e.g., Cons(1,Cons(2,Cons(3,Nil))) constructs the list [1,2,3] of type list<int>). To make the task of writing these tree constructions less tedious, OPTL extends C++ with the syntactic form #< >. For example, #<join(R,S,eq(A,B))> is equivalent to the above C++ expression. That is, the text within the brackets #< > is used by OPTL to generate C++ code, which creates the tree-like form (of type Expr) that represents this text. Values of type Expr can be included into the form generated by the #< > brackets by "escaping" them with a backquote character (`). The operand of the escape operator (the backquote operator) is expected to be an expression of type Expr that provides the value to "fill in" the hole in the bracketed text at that point. For example, in

  Expr x = #<join(a,b,p)>;
  Expr y = #<select(`x,q)>;
  Expr z = #<project(`y,A)>;
y is set to #<select(join(a,b,p),q)> and z to #<project(select(join(a,b,p),q),A)>. The BNF of the bracketed syntax is the following:
bracketed ::= "#<" expr ">"
expr      ::=   name                                 the representation of a name
              | integer                              the repr. of an integer
              | string                               the repr. of a string
              | "`" name                             escaping to the value of name
              | "`(" code ")"                        escaping to the value of code
              | expr "(" arg { "," arg } ")"         the repr. of a function
              | expr "(" ")"                         function with no arguments
              | "`" name "[" expr "]"                variable substitution (explained later)
arg       ::=   expr                                 the repr. of an expression
              | "..." name                           escaping to a list of values bound to name
              | "..." "(" code ")"                   escaping to a list of values returned by code
where the curly brackets { } in BNF indicate zero or more repetitions of the symbol sequence inside the curly brackets, and code is any C++ code. The #< `(code) > embeds the value returned by the C++ code code of type Expr to the term representation inside the brackets. For example, #<`f(6,g("ab",`(k(1))),`y)> is equivalent to the following C++ code:
function(f,Cons(function(variable("g"),Cons(string("ab"),Cons(k(1),Nil))),
         Cons(y,Nil)))
If f=#<h>, y=#<m(1,"a")>, and k(1) returns the value #<8>, then the above term is equivalent to #<h(6,g("ab",8),m(1,"a"))>. The three dots (...) construct is used to indicate a list arguments in a function call. Since the arguments of a function call form a value of type list<Expr>, the type of name in ...name is also list<Expr>. For example, in

  list<Expr>* r = Cons(#<join(a,b,p)>,Cons(#<select(c,q)>,Nil));
  Expr z = #<project(...r)>;
z will be bound to #<project(join(a,b,p),select(c,q))>.

OPTL provides a case statement syntax with patterns. Patterns match the Expr representations with similar shape. Escape operators applied to variables inside these patterns represent variable patterns, which "bind" to corresponding subterms upon a successful match. This capability makes it particularly easy to write functions that perform source-to-source transformations. A function that simplifies arithmetic expressions can be expressed easily as:


  Expr simplify ( Expr e ) {
    #case e
    |  plus(`x,0) => return x;
    |  times(`x,1) => return x;
    |  times(`x,0) => return #<0>;
    |  _ => return e;
    #end;
  };
where the _ pattern matches any value. For example, simplify(#<times(z,1)>) returns #<z>, since times(z,1) matches the second case pattern. The BNF for the case statement is:
case    ::= "#case" code { "|" pattern [ guard ] "=>" code } "#end"
pattern ::=   name
            | integer
            | string
            | "_"                                          match everything
            | "`" name                                     pattern variable
            | pattern "(" arg { "," arg } ")"              the repr. of a function
            | pattern "(" ")"                              function with no arguments
            | "`" name "[" pattern "]"                     second-order pattern matching
guard   ::=   ":" code                                     an optional condition
arg     ::=   pattern                                      the repr. of an expression
            | "..." name                                   match the rest of the arguments
where the [ ] brackets indicate an optional sequence of symbols. For example, the pattern `f(...r) matches any function call: when it is matched with #<join(a,b,c)>, it binds f to #<join> and r to the list Cons(#<a>,Cons(#<b>,Cons(#<c>,Nil))). For example, the following function adds the terms #<8> and #<9> as the first parameters to any function e:
  Expr add_arg ( Expr e ) {
    #case e
    |  `f(...r) => return #<`f(8,9,...r)>;
    |  `x => return x;
    #end;
  };
As another example of a case statement in OPTL, consider the following function, which switches the inputs of a binary join found as a parameter to a function e:
  Expr switch_join_args ( Expr e ) {
    #case e
    |  `f(...r,join(`x,`y),...s) => return #<`f(...r,join(`y,`x),...s)>;
    |  `x => return x;
    #end;
  };

The most powerful construct in OPTL is second-order pattern matching, denoted by the special syntax `f[pattern]. When `f[pattern] is matched against an Expr, e, it traverses the entire tree representation of e (in root-left-right order) until it finds a tree node that matches the pattern; when it finds one, say node n, it replaces this node with a fresh variable (i.e. a variable that has not been used before), say v, binding the pattern variables in pattern. Then it continues traversing the rest of the tree, replacing every node equal to n with v. At the end, if the matching succeeds, it binds f to the term bind(v,u), where u is e in which all nodes equal to n have been replaced by v. That way, second-order pattern matching can be used for factorizing common subexpressions. Similarly, a term construction `f[x] expects f to be of the form bind(v,u). It replaces all occurrences of v in u with the Expr x. For example, the OPTL program

  Expr n = new_variable();
  #case e
  |  `f[join(`x,`y)] => return #<let(`n,join(`x,`y),`f[`n])>;
  #end;
extracts the first term that matches a join from the term e and it pulls the term out in a let-binding (n is a user-defined fresh variable).

Next page: Optimizer Specification in OPTL


Last modified: 2/19/97 by Leonidas Fegaras