next up previous contents
Next: 3.3.2 SLR(1), LR(1), and Up: 3.3 Bottom-up Parsing Previous: 3.3 Bottom-up Parsing

3.3.1 LR(0) Grammar

Recall the original CFG G1:

0)  S :: = E $
1)  E ::= E + T
2)      | E - T
3)      | T
4)  T ::= T * F
5)      | T / F
6)      | F
7)  F ::= num
8)      | id
Here is a trace of the bottom-up parsing of the input x-2*y$:
      Stack          rest-of-the-input          Action
---------------------------------------------------------------
 1)                    id - num * id $      shift
 2)   id               - num * id $         reduce by rule 8
 3)   F                - num * id $         reduce by rule 6
 4)   T                - num * id $         reduce by rule 3
 5)   E                - num * id $         shift
 6)   E -              num * id $           shift
 7)   E - num          * id $               reduce by rule 7
 8)   E - F            * id $               reduce by rule 6
 9)   E - T            * id $               shift
10)   E - T *          id $                 shift
11)   E - T * id       $                    reduce by rule 8
12)   E - T * F        $                    reduce by rule 4
13)   E - T            $                    reduce by rule 2
14)   E $                                   shift
15)   S                                     accept (reduce by rule 0)
We start with an empty stack and we finish when the stack contains the non-terminal S only. Every time we shift, we push the current token into the stack and read the next token from the input sequence. When we reduce by a rule, the top symbols of the stack (the handle) must match the rhs of the rule exactly. Then we replace these top symbols with the lhs of the rule (a non-terminal). For example, in the line 12 above, we reduce by rule 4 (T ::= T * F). Thus we pop the symbols F, *, and T (in that order) and we push T. At each instance, the stack (from bottom to top) concatenated with the rest of the input (the unread input) forms a valid bottom-up derivation from the input sequence. That is, the derivation sequence here is:
 1)       id - num * id $
 3)  =>   F - num * id $
 4)  =>   T - num * id $
 5)  =>   E - num * id $
 8)  =>   E - F * id $
 9)  =>   E - T * id $
12)  =>   E - T * F $
13)  =>   E - T $
14)  =>   E $
15)  =>   S
The purpose of the above example is to understand how bottom-up parsing works. It involves lots of guessing. We will see that we don't actually push symbols in the stack. Rather we push states that represent possibly more than one potentials for rule reductions.

As we learned in earlier sections, a DFA can be represented by transition tables. In this case we use two tables: ACTION and GOTO. ACTION is used for recognizing which action to perform and, if it is shifting, which state to go next. GOTO indicates which state to go next for a nonterminal symbol. We will later learn how to construct these tables.

Consider the following grammar G4:

1)  S ::= E $
2)  E ::= E + T
3)      | T
4)  T ::= id
5)     | ( E )

The DFA that recognizes the handles of this grammar is: (this looks like magic now, but will be explained later how it is constructed)

lr1.gif

where r2 for example means reduce by rule 2 (E ::= E + T). The ACTION and GOTO tables that correspond to this DFA are:

state action goto
  id ( ) + $ S E T
0 s5 s6         1 9
1       s3 s2      
2 a a a a a      
3 s5 s6           4
4 r2 r2 r2 r2 r2      
5 r4 r4 r4 r4 r4      
6 s5 s6         7 9
7     s8 s3        
8 r5 r5 r5 r5 r5      
9 r3 r3 r3 r3 r3      

where for example s5 means shift a token into the stack and go to state 5.

Here is the shift-reduce parser:

push(0);
read_next_token();
for(;;)
{  s = top();
   if (ACTION[s,current_token] == 'si')   /* shift and go to state i */
   {  push(i);
      read_next_token();
   }
   else if (ACTION[s,current_token] == 'ri')   /* reduce by rule i: X ::= A1...An */
   {  perform pop() n times;
      s = top();
      push(GOTO[s,X]);
   }
   else if (ACTION[s,current_token] == 'a')
      success!!
   else error();
}
Note that the stack contains state numbers only, not symbols.

Now the only thing that remains to do is, given a CFG, to construct the finite automaton (DFA) that recognizes handles. After that, constructing the ACTION and GOTO tables will be straightforward.

The states of the finite state machine correspond to item sets. An item (or configuration) is a production with a dot (.) at the rhs that indicates how far we have progressed using this rule to parse the input. For example, the item E ::= E + . E indicates that we are using the rule E ::= E + E and that, using this rule, we have parsed E, we have seen a token +, and we are ready to parse another E. Now, why do we need a set (an item set) for each state in the state machine? Because many production rules may be applicable at this point; later when we will scan more input tokens we will be able tell exactly which production to use. This is the time when we are ready to reduce by the chosen production.

For example, say that we are in a state that corresponds to the item set with the following items:

S ::= id . := E
S ::= id . : S
This state indicates that we have already parsed an id from the input but we have 2 possibilities: if the next token is := we will use the first rule and if it is : we will use the second.

Now we are ready to construct our automaton. Since we do not want to work with NFAs, we build a DFA directly. So it is important to consider closures (like we did when we transformed NFAs to DFAs). The closure of an item X ::= a . t b (ie, the dot appears before a terminal t) is the singleton set that contains the item X ::= a . t b only. The closure of an item X ::= a . Y b (ie, the dot appears before a nonterminal Y) is the set consisting of the item itself, plus all productions for Y with the dot at the left of the rhs, plus the closures of these items. For example, the closure of the item E ::= E + . T is the set:

E ::= E + . T
T ::= . T * F
T ::= . T / F
T ::= . F
F ::= . num
F ::= . id

The initial state of the DFA (state 0) is the closure of the item S ::= . E $, where E is the start symbol. In few words, if there is an item X ::= a . s b in an item set, where s is a symbol (terminal or nonterminal), we have a transition labelled by s to an item set that contains X ::= a s . b. But it's a little bit more complex than that:

  1. If we have more than one items with a dot before the same symbol s, say X ::= a . s b and Y ::= c . s d, then the new item set contains both X ::= a s . b and Y ::= c s . d.
  2. We need to get the closure of the new item set.
  3. We have to check if this item set has been appeared before so that we don't create it again.
For example, the CFG:

1)  S ::= E $
2)  E ::= E + T
3)      | T
4)  T ::= id
5)     | ( E )

has the following item sets:

I0:  S ::= . E $                  I4:  E ::= E + T .
     E ::= . E + T
     E ::= . T                    I5:  T ::= id .
     T ::= . id
     T ::= . ( E )                I6:  T ::= ( . E )
                                       E ::= . E + T
I1:  S ::= E . $                       E ::= . T
     E ::= E . + T                     T ::= . id
                                       T ::= . ( E )
I2:  S ::= E $ .
                                  I7:  T ::= ( E . )
I3:  E ::= E + . T                     E ::= E . + T
     T ::= . id
     T ::= . ( E )                I8:  T ::= ( E ) .

                                  I9:  E ::= T .

that correspond to the DFA:

lr1.gif

Explanation: The initial state I0 is the closure of S ::= . E $. The two items S ::= . E $ and E ::= . E + T in I0 must have a transition by E since the dot appears before E. So we have a new item set I1 and a transition from I0 to I1 by E. The I1 item set must contain the closure of the items S ::= . E $ and E ::= . E + T with the dot moved one place right. Since the dot in I1 appears before terminals ($ and +), the closure is the items themselves. Similarly, we have a transition from I0 to I9 by T, to I5 by id, to I6 by '(', etc, and we do the same thing for all item sets.

Now if we have an item set with only one item S ::= E ., where S is the start symbol, then this state is the accepting state (state I2 in the example). If we have an item set with only one item X ::= a . (where the dot appears at the end of the rhs), then this state corresponds to a reduction by the production X ::= a. If we have a transition by a terminal symbol (such as from I0 to I5 by id), then this corresponds to a shifting.

As another example, consider the following augmented grammar:

0) S' ::= S $
1) S  ::= B B
2) B  ::= a B
3) B  ::= c

The state diagram is:

exam13.gif

The ACTION and GOTO parsing tables are:

state action goto
  a c $ S' S B
0 s5 s7     1 3
1     s2      
2 a a a      
3 s5 s7       4
4 r1 r1 r1      
5 s5 s7       6
6 r2 r2 r2      
7 r3 r3 r3      

As yet another example, the grammar:

S ::= E $
E ::= ( L )
E ::= ( )
E ::= id
L ::= L , E
L ::= E
has the following state diagram:

lr0.gif

If we have an item set with more than one items with a dot at the end of their rhs, then this is an error, called a reduce-reduce conflict, since we can not choose which rule to use to reduce. This is a very important error and it should never happen. Another conflict is a shift-reduce conflict where we have one reduction and one or more shifts at the same item set. For example, when we find b in a+b+c, do we reduce a+b or do we shift the plus token and reduce b+c later? This type of ambiguities is usually resolved by assigning the proper precedences to operators (described later). The shift-reduce conflict is not as severe as the reduce-reduce conflict: a parser generator selects reduction against shifting but we may get a wrong behavior. If the grammar has no reduce-reduce and no shift-reduce conflicts, it is LR(0) (ie. we read left-to-right, we use right-most derivations, and we don't look ahead any tokens).


next up previous contents
Next: 3.3.2 SLR(1), LR(1), and Up: 3.3 Bottom-up Parsing Previous: 3.3 Bottom-up Parsing
Leonidas Fegaras
2002-08-26