next up previous contents
Next: 6 Semantic Analysis Up: CSE 5317/4305: Design and Previous: 4.2 Case Study: Abstract

5 Semantic Actions

Let's consider now how actions are evaluated by different parsers. In recursive descent parsers, actions are pieces of code embedded in the recursive procedures. For the following grammar:

E ::= T E'
E' ::= + T E'
     | - T E'
T ::= num

we have the following recursive descent parser:

int E () { return E'(T()); };
int E' ( int left ) {
  if (current_token=='+') {
     return E'(left + T());
  } else if (current_token=='-') {
     return E'(left - T());
  } else return left;
int T () {
  if (current_token=='num') {
     return num_value;
  } else error();

By passing T() as input to E', we pass the left operand to E'.

Table-driven predictive parsers use the parse stack to push/pop actions (along with symbols) but they use a separate semantic stack to execute the actions. In that case, the parsing algorithm becomes:

  X = pop();
  if (X is a terminal or '$')
     if (X == current_token)
     else error();
  else if (X is an action)
     perform the action;
  else if (M[X,current_token] == "X ::= Y1 Y2 ... Yk")   /* some Yi may be actions */
     {  push(Yk);
  else error();
until X == '$';

For example, suppose that pushV and popV are the functions to manipulate the semantic stack. The following is the grammar of an interpreter that uses the semantic stack to perform additions and subtractions:

E ::= T E' $ { print(popV()); }
E' ::= + T { pushV(popV() + popV()); } E'
     | - T { pushV(-popV() + popV()); } E'
T ::= num { pushV(num); }
For example, for 1+5-2, we have the following sequence of actions: pushV(1); pushV(5); pushV(popV()+popV()); pushV(3); pushV(-popV()+popV()); print(popV());. Question: what would happen if we put the action of the second rule at the end of rhs?

In contrast to top-down parsers, bottom-up parsers can only perform an action after a reduction (ie, after the entire rhs of a rule has been processed). Why? because at each point of time we may have a potential for multiple rules for reduction (this is capured by the itemsets), which means that we may be in the middle of many rules at a time, but later only one rule will actually be used; so, we can't execute an action in the middle of a rule because we may have to undo it later if the rule is not used. This means that we can only have rules of the form

E ::= E + E { $$ = $1 + $3; }
where the action is always at the end of the rhs. This action is executed after the rule E ::= E + E is reduced. The parser does not push actions into the parse stack but it uses the parse stack to push/pop values instead. For example, here when an E is found we push the value of E (a number) into the stack. Now all the nonterminals in the rhs of a rule have a value in a stack at the time this rule is reduced. For example, when E ::= E + E is reduced, both the first E and the second E in the rhs of this rule have values in the stack. To access the value of the nth symbol from the stack, we use $n. The result that will be pushed implicitly in the stack is $$. So in this example, we retrieve the value of the first E as $1, the value of the second E as $3, and we add them together to form the result. Note that the stack is not popped to retrieve values; values are accessed directly from the stack by using offsets; eg, $3 is translated into an offset 3 from the top of the stack so the retrieved value is STACK[size-3].

What if we want to put an action in the middle of the rhs of a rule in a bottom-up parser? In that case we use a dummy nonterminal, called a marker. For example,

X ::= a { action } b
is equivalent to
X ::= M b
M ::= a { action }
This is done automatically in Bison and Yacc (ie, we can actually put actions in the middle of a rhs of a rule and Bison will use the above trick to put it at the end of a rule). There is a danger though that the resulting rules may introduce new shift- or reduce-reduce conflicts.

If we want build an AST in yacc or bison, we need to associate each non-terminal symbol with an AST type. For example, if we use the non-terminals E and EL for expressions and list of expressions respectively, we can define their types as follows:

%union { ast*      Tast;
         ast_list* Tast_list;
%type <Tast> E
%type <Tast_list> EL
Then the production rules should have actions to build ASTs:
E : E '+' E      { $$ = make_binary_op("+",$1,$3); }
  | E '-' E      { $$ = make_binary_op("-",$1,$3); }
  | id '(' EL ') { $$ = make_call($1,$3); }
  | num          { $$ = make_integer(atoi(yytext)); }
EL : EL ',' E    { $$ = make_exp_list($3,$1); }
   | E           { $$ = make_exp_list($1,NULL); }
That is, for integer addition, we build an AST node that corresponds to a binary operation (see the AST in Section 4), whose left and right subtrees come from the ASTs of the two Es at the rhs of the rule E: E'+' E.

next up previous contents
Next: 6 Semantic Analysis Up: CSE 5317/4305: Design and Previous: 4.2 Case Study: Abstract
Leonidas Fegaras