    Next: 6 Semantic Analysis Up: CSE 5317/4305: Design and Previous: 4.2 Case Study: Abstract

# 5 Semantic Actions

Let's consider now how actions are evaluated by different parsers. In recursive descent parsers, actions are pieces of code embedded in the recursive procedures. For the following grammar:

```E ::= T E'
E' ::= + T E'
| - T E'
|
T ::= num
```

we have the following recursive descent parser:

```int E () { return E'(T()); };
int E' ( int left ) {
if (current_token=='+') {
return E'(left + T());
} else if (current_token=='-') {
return E'(left - T());
} else return left;
};
int T () {
if (current_token=='num') {
return num_value;
} else error();
};
```

By passing `T()` as input to `E'`, we pass the left operand to `E'`.

Table-driven predictive parsers use the parse stack to push/pop actions (along with symbols) but they use a separate semantic stack to execute the actions. In that case, the parsing algorithm becomes:

```push(S);
repeat
X = pop();
if (X is a terminal or '\$')
if (X == current_token)
else error();
else if (X is an action)
perform the action;
else if (M[X,current_token] == "X ::= Y1 Y2 ... Yk")   /* some Yi may be actions */
{  push(Yk);
...
push(Y1);
}
else error();
until X == '\$';
```

For example, suppose that `pushV` and `popV` are the functions to manipulate the semantic stack. The following is the grammar of an interpreter that uses the semantic stack to perform additions and subtractions:

```E ::= T E' \$ { print(popV()); }
E' ::= + T { pushV(popV() + popV()); } E'
| - T { pushV(-popV() + popV()); } E'
|
T ::= num { pushV(num); }
```
For example, for `1+5-2`, we have the following sequence of actions: `pushV(1); pushV(5); pushV(popV()+popV()); pushV(3); pushV(-popV()+popV()); print(popV());`. Question: what would happen if we put the action of the second rule at the end of rhs?

In contrast to top-down parsers, bottom-up parsers can only perform an action after a reduction (ie, after the entire rhs of a rule has been processed). Why? because at each point of time we may have a potential for multiple rules for reduction (this is capured by the itemsets), which means that we may be in the middle of many rules at a time, but later only one rule will actually be used; so, we can't execute an action in the middle of a rule because we may have to undo it later if the rule is not used. This means that we can only have rules of the form

```E ::= E + E { \$\$ = \$1 + \$3; }
```
where the action is always at the end of the rhs. This action is executed after the rule `E ::= E + E` is reduced. The parser does not push actions into the parse stack but it uses the parse stack to push/pop values instead. For example, here when an `E` is found we push the value of `E` (a number) into the stack. Now all the nonterminals in the rhs of a rule have a value in a stack at the time this rule is reduced. For example, when `E ::= E + E` is reduced, both the first `E` and the second `E` in the rhs of this rule have values in the stack. To access the value of the nth symbol from the stack, we use `\$n`. The result that will be pushed implicitly in the stack is `\$\$`. So in this example, we retrieve the value of the first `E` as `\$1`, the value of the second `E` as `\$3`, and we add them together to form the result. Note that the stack is not popped to retrieve values; values are accessed directly from the stack by using offsets; eg, `\$3` is translated into an offset `3` from the top of the stack so the retrieved value is `STACK[size-3]`.

What if we want to put an action in the middle of the rhs of a rule in a bottom-up parser? In that case we use a dummy nonterminal, called a marker. For example,

```X ::= a { action } b
```
is equivalent to
```X ::= M b
M ::= a { action }
```
This is done automatically in Bison and Yacc (ie, we can actually put actions in the middle of a rhs of a rule and Bison will use the above trick to put it at the end of a rule). There is a danger though that the resulting rules may introduce new shift- or reduce-reduce conflicts.

If we want build an AST in yacc or bison, we need to associate each non-terminal symbol with an AST type. For example, if we use the non-terminals `E` and `EL` for expressions and list of expressions respectively, we can define their types as follows:

```%union { ast*      Tast;
ast_list* Tast_list;
}
%type <Tast> E
%type <Tast_list> EL
```
Then the production rules should have actions to build ASTs:
```E : E '+' E      { \$\$ = make_binary_op("+",\$1,\$3); }
| E '-' E      { \$\$ = make_binary_op("-",\$1,\$3); }
| id '(' EL ') { \$\$ = make_call(\$1,\$3); }
| num          { \$\$ = make_integer(atoi(yytext)); }
;
EL : EL ',' E    { \$\$ = make_exp_list(\$3,\$1); }
| E           { \$\$ = make_exp_list(\$1,NULL); }
;
```
That is, for integer addition, we build an AST node that corresponds to a binary operation (see the AST in Section 4), whose left and right subtrees come from the ASTs of the two Es at the rhs of the rule `E: E'+' E`.    Next: 6 Semantic Analysis Up: CSE 5317/4305: Design and Previous: 4.2 Case Study: Abstract
Leonidas Fegaras
2002-08-26