OPTL Rule Syntax

A program in OPTL has the following structure in BNF:

[ "%{"
  "%inherited" name { "," name }
  "%synthesized" "{" { name "=" C++type ";" } "}" "=" C++code
  { "%logical" { lrule } }
  "%physical" { prule }
C++code ]
where the curly brackets { } in BNF indicate zero or more repetitions of the symbol sequence inside the curly brackets, the [ ] brackets indicate an optional sequence of symbols, and C++code is any C++ code, which may include OPTL constructs such as #< > brackets. That is, an OPTL program is some C++ code, optionally followed by an optimizer specification. The optimizer specification is between %{ and %}. It consists of the following parts:
  1. The specification of the inherited attributes: this is the line starting with %inherited and followed by the list of synthesized attributes separated by comma. In the SQL optimizer we have only one inherited attribute, namely the expected order:
    %inherited order
  2. The specification of the synthesized attributes: this is the line starting with %synthesized and followed by a C++ tuple type that defines the synthesized attributes along with an initialization value for the attributes. For the SQL optimizer we have three synthesized attributes: the size in blocks of the output relation, the plan cost, and the plan order:
    %synthesized { int size; float cost; Expr order; }
               = { 0, 10000.0, #<order()> }
  3. We may have zero or more modules with logical rules (started with the keyword %logical) and only one module for physical rules (started with the keyword %physical). The optimizer tries the logical rules first in the order they are specified, and then it tries the physical rules.
A logical rule lrule has the following BNF form:
lrule ::=   [ inherited ] pattern [ guard ] "=>" body
          | [ inherited ] pattern [ guard ] "=" body
guard ::= ":" C++code
inherited ::= "{" { name "=" pattern ";" } "}"
body ::=    expr ";"
          | "let" name "=" C++code { "," name "=" C++code } "in" body
          | "#case" C++code { "|" pattern "=>" body } "#end" ";"
          | "#forall" name "in" C++code "do" body "#end" ";"
#case is like the OPTL case statement which pattern-matches Expr terms; let binds variables of type Expr to Expr values; and #forall defines a loop where a variable of type Expr iterates over a list of type list<Expr>. If there is no #foreach loop, then a logical rule generates just one algebraic term; otherwise it generates as many as the number of iterations in the loop. Each rule has an optional guard expressed in C++ code. An attribute list { attr=pattern, ... } before the head of a rule indicates that each inherited attribute attr, passed along with the term being transformed, must match the pattern in the attribute list. The names of these attributes must be inherited attribute names, but they can be in any order and some of them may even be left unspecified. In the latter case the inherited attribute can be of any value. (It is ignored and passed as is to the next rewriting). If these patterns are complex patterns, they serve as guards to the rule. The children of the node matching the head of the rule are the subterms bound to the free variables of the head pattern. After we provide the inherited attributes for the term being transformed (that matches the head of the rule), we may need to pass them to its children (to the subterms), possibly with modified values. For example, if the head is f(`x,`y) then x and y will be bound to the left and right subterm of the matching term and they will be both transformed before the rule is applied. This recursive transformation of the subterms is a bottom-up rewriting, since subterms are transformed before the matching term. New inherited attributes are propagated to the subterm x by specifying `x<-{ attr=expr,... }, where attr is an inherited attribute name and expr is an OPTL expression that depends on the variables in the patterns before the rule head. Recall that inherited attributes can be in any order and can be selectively omitted. In the latter case the inherited values passed to the children of a term are exactly the same as the values provided before the rewriting of this term.

There are two types of rules: reductions (indicated by => in the rule) and rewrites (indicated by = in the rule). If there are many rules applicable to a term, then the first reduction rule is applied and the rest rules are ignored; if there is no applicable reduction rule, then all applicable rewrite rules are applied to this term and the results of all rules are accumulated. This process repeats for a term and its subterms until no rule is applicable. The rewrite rule type must be selected with caution, as it may yield an inefficient search. In addition, there is a danger of falling into an infinite loop, even though no rule can be applied twice to the same term. One example of such a case is the rule `x = f(`x), which generates the infinite sequence f(a), f(f(a)), f(f(f(a))), ... for some term a. For example, the following reduction combines two selects into one select. The predicate of the new select is the conjunction of the two selection predicates (derived by the support function combine_predicates):

  => let pred = combine_predicates(p1,p2)
     in select(`e,`pred);
A physical rule prule has the following BNF form:
prule ::=   [ inherited ] pattern [ guard ] "=" body
body ::=    expr [ synthesized ] ";"
          | "let" name "=" C++code { "," name "=" C++code } "in" body
          | "#case" C++code { "|" pattern "=>" body } "#end" ";"
          | "#forall" name "in" C++code "do" body "#end" ";"
synthesized ::= ":"  "{" { name "=" C++code ";" } "}"
The only difference from the logical rules is the computation of synthesized attributes after the rule body. Once a physical rule is applied, the derived term is a final plan and it is not processed any more. (In contrast, logical rules are applied until no rule is applicable.)

There is also support for memoization. When a term x, associated with inherited attributes IA, is transformed into a list of expressions [x1,...,xn] (these are physical plans because the rewrite is complete) with their synthesized attributes [SA1,...,SAn], then the tuple (x,IA,[(x1,SA1),...,(xn,SAn)]) is stored in a system-provided memoization table (a hash table). Therefore, rewriting a term x with IA involves searching this table to test if x and IA are already memoized. If they are, the resulting rewrite is retrieved from the table; if not, the result is calculated by applying the rewrite rules.

As an example of a physical rule, consider the following:

{ order=order(); }
  = #forall r in all_table_indexes(table,name) do
       #case r
       |  index(`idx,order(...io))
	     => INDEX_SCAN(`table,`name,`pred,`idx,order(...io))
	        : { size = int(table_cardinality(table)*selectivity(pred));
	            cost = index_access_cost(table,pred,io);
		    order = #<order(...io)>; };
This rule retrieves all indexes (applicable or not) using the support function all_table_indexes and iterates over them. The expected order must be empty in order for this rule to apply (first line of the rule). For each index r, this rule generates an INDEX_SCAN plan. The case statement in the rule body checks whether r is a real index. Attached to each plan, there is the computation of the new synthesized attributes (e.g., the order of the index scan is the index key).

For each OPTL specification file, OPTGEN generates an optimizer. The interface of this optimizer is the function rewrite:

list<plan*>* rewrite ( Expr e, inherited ia, plan* (best_plan)(list<plan*>*), bool generate_plans );
where e is the query to be optimized, ia is the initial inherited attributes (the type inherited is generated by OPTGEN from the inherited attributes specification), generate_plans is a flag: if it is true both logical and physical rewrites are performed, if it is false only logical rewrites are performed. best_plan is a function that selects the best plan from a list of plans (to be specified by the user). The result is a list of plans. A plan is defined as follows:
class plan {
  Expr expression;
  synthesized attributes;
that is, it is a pair of a term and an inherited attribute.

Last modified: 2/19/97 by Leonidas Fegaras