Let's consider now how actions are evaluated by different parsers. In recursive descent parsers, actions are pieces of code embedded in the recursive procedures. Table-driven predictive parsers use the parse stack to push/pop actions (along with symbols) but they use a separate semantic stack to execute the actions. In that case, the parsing algorithm becomes:
push(S);
read_next_token();
repeat
X = pop();
if (X is a terminal or '$')
if (X == current_token)
read_next_token();
else error();
else if (X is an action)
perform the action;
else if (M[X,current_token] == "X ::= Y1 Y2 ... Yk") /* some Yi may be actions */
{ push(Yk);
...
push(Y1);
}
else error();
until X == '$';
For example, suppose that pushV and popV are the
functions to manipulate the semantic stack. The following is the grammar of
an interpreter that uses the semantic stack to perform additions and
subtractions:
E ::= T E' $ { print(popV()); }
E' ::= + T { pushV(popV() + popV()); } E'
| - T { pushV(popV() - popV()); } E'
|
T ::= num { pushV(num); }
For example, for 1+5-2, we have the following sequence of actions:
pushV(1); pushV(5); pushV(popV()+popV()); pushV(3); pushV(popV()-popV()); print(popV());.
Question: what would happen if we put the action of the second rule at the end of rhs?
In contrast to top-down parsers, bottom-up parsers can only perform an action after a reduction (ie, after the entire rhs of a rule has been processed). Why? because at each point of time we may have a potential for multiple rules for reduction (this is capured by the itemsets), which means that we may be in the middle of many rules at a time, but later only one rule will actually be used; so, we can't execute an action in the middle of a rule because we may have to undo it later if the rule is not used. This means that we can only have rules of the form
E ::= E + E { $$ = $1 + $3; }
where the action is always at the end of the rhs. This action is
executed after the rule E ::= E + E is reduced. The parser does
not push actions into the parse stack but it uses the parse stack to
push/pop values instead. For example, here when an E is found we push
the value of E (a number) into the stack. Now all the
nonterminals in the rhs of a rule have a value in a stack at the time
this rule is reduced. For example, when E ::= E + E is reduced,
both the first E and the second E in the rhs of this
rule have values in the stack. To access the value of the nth
symbol from the stack, we use $n. The result that will be
pushed implicitly in the stack is $$. So in this example, we
retrieve the value of the first E as $1, the value of
the second E as $3, and we add them together to form the
result. Note that the stack is not popped to retrieve values; values
are accessed directly from the stack by using offsets; eg, $3
is translated into an offset 3 from the top of the stack so the
retrieved value is STACK[size-3].
What if we want to put an action in the middle of the rhs of a rule in a bottom-up parser? In that case we use a dummy nonterminal, called a marker. For example,
X ::= a { action } b
is equivalent to
X ::= M b
M ::= a { action }
This is done automatically in Bison and Yacc (ie, we can actually
put actions in the middle of a rhs of a rule and Bison will use
the above trick to put it at the end of a rule). There is a danger
though that the resulting rules may introduce new shift- or
reduce-reduce conflicts.
If we want build an AST in yacc or bison, we need to associate each
non-terminal symbol with an AST type. For example, if we use the non-terminals
E and EL for expressions and list of expressions respectively, we can
define their types as follows:
%union { ast* Tast;
ast_list* Tast_list;
}
%type <Tast> E
%type <Tast_list> EL
Then the production rules should have actions to build ASTs:
E : E '+' E { $$ = make_binary_op("+",$1,$3); }
| E '-' E { $$ = make_binary_op("-",$1,$3); }
| id '(' EL ') { $$ = make_call($1,$3); }
| num { $$ = make_integer(atoi(yytext)); }
;
EL : EL ',' E { $$ = make_exp_list($3,$1); }
| E { $$ = make_exp_list($1,NULL); }
;
That is, for integer addition, we build an AST node that corresponds to
a binary operation (see the AST in Section 4),
whose left and right subtrees come from the ASTs of the two Es
at the rhs of the rule E: E'+' E.