Optimizer Specification in OPTL

Query optimizers in OPTL are specified as term-rewriting systems expressed by a set of rewrite rules. Each rewrite rule describes a transformation from a logical algebra expression into either a logical algebra expression or a physical plan. Both logical expressions and physical plans are manipulated as term representations. For example, the logical expression:

select address from employees where dept="CSE" and salary > 50000

can be represented in OPTL as:

#<sql_select(address,employees,and(eq(dept,"CSE"),gt(salary,50000)))#>

that is, as an instance of the OPTL type Expr. In this way, program representations that represent logical expressions or physical plans take a form that reflects the meaning of these expressions. For example, it is more convenient to write join(`x,`y,`p) to represent a join than to use explicit value constructors to build the data structure that represents this expression.

The OPTL rule-based term-rewriting system is based on attribute grammars. An attribute list is a binding from a fixed set of attribute names to values. Each optimizer expressed in our specification language will have two attribute lists: inherited and synthesized. The inherited attributes provide context during term-rewriting. Suppose that we are currently rewriting a term t. When a subterm s of t is chosen to be transformed, new inherited attribute values for s need to be computed. These new attributes values typically are derived from the inherited attribute values passed to term t before it was transformed. Therefore, inherited attribute values are propagated from one rewriting to the next. This scheme results in a purely functional term-rewriting system, because the rewriting context is passed only in the form of attributes, just as functions use parameters instead of global variables to pass information. By passing the rewriting context in a form of inherited attributes, rule-based optimizer specifications are easier to understand. Changes to context can be made locally to each rule, by specifying how the attribute values are propagated from their values before the application of the rule (that is, before this current rewriting) into the subterms chosen to be transformed next (that is, to the subsequent rewritings). In addition, memoization of the completed rewrites becomes possible, since each rewrite depends on the term to be transformed and on the inherited attribute values only.

Examples of inherited attributes include logical properties that need to be propagated and physical properties that need to be enforced. This scheme is consistent with other rule-based systems in the literature. One example of logical properties is schema information, such as relation names associated with their column names. Examples of physical properties include ordering of tuples in a relational expression, the site where objects reside in distributed databases, presence or absence of objects in memory, and the set of available access paths. Other use of inherited attributes is to control which rules can be considered during a rewrite. They can also be used for passing heuristic information to the main rewrite engine so that it can select more effectively which rules to apply next.

The synthesized attributes are attributes whose values are computed when a rewrite is complete. They form the physical properties of the produced plans. They are passed bottom-up from the leaves of a completely transformed term (that is, the leaves of a physical plan that evaluates a query) to the root of this term. Examples include the order of a stream of tuples produced by a physical plan, the selectivity estimation of join predicates, the cardinality of a stream of tuples, and the cost of a plan.

As an example of how inherited attributes are propagated and how synthesized attributes are accumulated consider the following rule taken from the domain of relational query optimization:

{ order=`expected_order; }
join(`x<-{ order=`(required_order(pred,all_tables(x),all_tables(y))); },
     `y<-{ order=`(required_order(pred,all_tables(y),all_tables(x))); },
     `pred)
   : equijoinp(pred,all_tables(x),all_tables(y))
     && subsumes(expected_order,^x.order)
   = MERGE_JOIN(`x,`y,`pred,
		`(required_order(pred,all_tables(x),all_tables(y))),
		`(required_order(pred,all_tables(y),all_tables(x))))
        : { size = int((^x.size+^y.size)*selectivity(pred));
	    cost = ^x.cost+^y.cost+(^x.size+^y.size);
	    order = ^x.order; };

This rule translates a join of the relational algebra into a MERGE_JOIN evaluation plan. The identifier order is an inherited attribute that represents the required order of the stream of tuples generated by a term. The pattern variable expected_order is bound to the current value of the attribute order before the transformation (first line of the rule). The expected order propagated to the inner and outer relations is computed by the support C++ function required_order, which returns the columns of the outer and inner tables that participate in the join predicate. These columns are also incorporated in the code of the MERGE_JOIN to help the execution of this plan. Function call all_tables(x) returns the names of all relations referred by the logical expression x. Note that when a logical expression matches the head of the rule, the pattern variables x, y, and p are bound to the subterms of this logical expression. These subterms are then optimized by the same rewriting process. The rule body constructs a new term by replacing the variables in the rule body with the new optimized terms associated with these variables. The C++ code between : and = (after the rule head and before the rule result) is the guard of this rule: the rule is executed only when this guard evaluates to true. The notation ^x.order in the rule body returns the value of the synthesized attribute order computed after the completion of the transformation of x (that is, this is the real order of the physical plan assigned to x). If the join is an equijoin (tested by the support function equijoinp) and the expected order of join subsumes the real order of the outer relation, then the rule returns a MERGE_JOIN. Otherwise it returns no plan (the rule is rejected). That is, this rule forces a sort-order to both join inputs before it applies the merge-sort join. The values enclosed by curly brackets after the second colon are the synthesized values for this plan. For example, the plan cost is the sum of the input costs plus the sizes of the inputs.

The expected order of a plan (the inherited attribute order) must subsume the real order of the plan (the value of the synthesized attribute order). This is the case when the plan is an index scan that delivers tuples in the index order. If this is not the case, then the expected order must be enforced by using the SORT plan:

{ order=order(`o...os); }
(`x<-{ order=order(); })
  = SORT(`x,order(`o...os));

This rule says that if the expected order is not empty (empty order is order()), then any plan x should be sorted over the expected order. At the same time when we rewrite x we pass the empty order order() as an expected order for x so that this rule applies only once. For example, if we optimize the term #<join(R,S,eq(project(R,A),project(S,B)))> using the above rules and we don't have any available indexes for R and S, we get:

#<MERGE_JOIN(SORT(R,order(project(R,A))),
             SORT(S,order(project(S,B))),
             eq(project(R,A),project(S,B)),
             order(project(R,A)),
             order(project(S,B)))>

where all order(...) were computed by the required_order function.

Next page: optl rule syntax

last modified: 2/19/97 by leonidas fegaras