next up previous contents
Next: 3.3.4 SLR(1), LR(1), and Up: 3.3 Bottom-up Parsing Previous: 3.3.2 Shift-Reduce Parsing Using   Contents

3.3.3 Table Construction

Now the only thing that remains to do is, given a CFG, to construct the finite automaton (DFA) that recognizes handles. After that, constructing the ACTION and GOTO tables will be straightforward.

The states of the finite state machine correspond to item sets. An item (or configuration) is a production with a dot (.) at the rhs that indicates how far we have progressed using this rule to parse the input. For example, the item E ::= E + . E indicates that we are using the rule E ::= E + E and that, using this rule, we have parsed E, we have seen a token +, and we are ready to parse another E. Now, why do we need a set (an item set) for each state in the state machine? Because many production rules may be applicable at this point; later when we will scan more input tokens we will be able tell exactly which production to use. This is the time when we are ready to reduce by the chosen production.

For example, say that we are in a state that corresponds to the item set with the following items:

S ::= id . := E
S ::= id . : S
This state indicates that we have already parsed an id from the input but we have 2 possibilities: if the next token is := we will use the first rule and if it is : we will use the second.

Now we are ready to construct our automaton. Since we do not want to work with NFAs, we build a DFA directly. So it is important to consider closures (like we did when we transformed NFAs to DFAs). The closure of an item X ::= a . t b (ie, the dot appears before a terminal t) is the singleton set that contains the item X ::= a . t b only. The closure of an item X ::= a . Y b (ie, the dot appears before a nonterminal Y) is the set consisting of the item itself, plus all productions for Y with the dot at the left of the rhs, plus the closures of these items. For example, the closure of the item E ::= E + . T is the set:

E ::= E + . T
T ::= . T * F
T ::= . T / F
T ::= . F
F ::= . num
F ::= . id

The initial state of the DFA (state 0) is the closure of the item S ::= . a $, where S ::= a $ is the first rule. In simple words, if there is an item X ::= a . s b in an item set, where s is a symbol (terminal or nonterminal), we have a transition labelled by s to an item set that contains X ::= a s . b. But it's a little bit more complex than that:

  1. If we have more than one item with a dot before the same symbol s, say X ::= a . s b and Y ::= c . s d, then the new item set contains both X ::= a s . b and Y ::= c s . d.
  2. We need to get the closure of the new item set.
  3. We have to check if this item set has been appeared before so that we don't create it again.

For example, our previous grammar which parses the R.E. ab*$:

0) S ::= R $
1) R ::= R b
2) R ::= a
has the following item sets:

lr4.gif

which forms the following DFA:

lr3.gif

We can understand now the R transition: If we read an entire R term (that is, after we reduce by a rule that parses R), and the previous state before the reduction was 0, we jump to state 1.

As another example, the CFG:

1)  S ::= E $
2)  E ::= E + T
3)      | T
4)  T ::= id
5)     | ( E )
has the following item sets:

I0:  S ::= . E $                  I4:  E ::= E + T .
     E ::= . E + T
     E ::= . T                    I5:  T ::= id .
     T ::= . id
     T ::= . ( E )                I6:  T ::= ( . E )
                                       E ::= . E + T
I1:  S ::= E . $                       E ::= . T
     E ::= E . + T                     T ::= . id
                                       T ::= . ( E )
I2:  S ::= E $ .
                                  I7:  T ::= ( E . )
I3:  E ::= E + . T                     E ::= E . + T
     T ::= . id
     T ::= . ( E )                I8:  T ::= ( E ) .

                                  I9:  E ::= T .
that correspond to the DFA:

lr1.gif

Explanation: The initial state I0 is the closure of S ::= . E $. The two items S ::= . E $ and E ::= . E + T in I0 must have a transition by E since the dot appears before E. So we have a new item set I1 and a transition from I0 to I1 by E. The I1 item set must contain the closure of the items S ::= . E $ and E ::= . E + T with the dot moved one place right. Since the dot in I1 appears before terminals ($ and +), the closure is the items themselves. Similarly, we have a transition from I0 to I9 by T, to I5 by id, to I6 by '(', etc, and we do the same thing for all item sets.

Now if we have an item set with only one item S ::= E ., where S is the start symbol, then this state is the accepting state (state I2 in the example). If we have an item set with only one item X ::= a . (where the dot appears at the end of the rhs), then this state corresponds to a reduction by the production X ::= a. If we have a transition by a terminal symbol (such as from I0 to I5 by id), then this corresponds to a shifting.

The ACTION and GOTO tables that correspond to this DFA are:

state action goto
  id ( ) + $ S E T
0 s5 s6         1 9
1       s3 s2      
2 a a a a a      
3 s5 s6           4
4 r2 r2 r2 r2 r2      
5 r4 r4 r4 r4 r4      
6 s5 s6         7 9
7     s8 s3        
8 r5 r5 r5 r5 r5      
9 r3 r3 r3 r3 r3      

As another example, consider the following augmented grammar:

0) S' ::= S $
1) S  ::= B B
2) B  ::= a B
3) B  ::= c

The state diagram is:

exam13.gif

The ACTION and GOTO parsing tables are:

state action goto
  a c $ S' S B
0 s5 s7     1 3
1     s2      
2 a a a      
3 s5 s7       4
4 r1 r1 r1      
5 s5 s7       6
6 r2 r2 r2      
7 r3 r3 r3      
As yet another example, the grammar:
S ::= E $
E ::= ( L )
E ::= ( )
E ::= id
L ::= L , E
L ::= E
has the following state diagram:

lr0.gif

If we have an item set with more than one items with a dot at the end of their rhs, then this is an error, called a reduce-reduce conflict, since we can not choose which rule to use to reduce. This is a very important error and it should never happen. Another conflict is a shift-reduce conflict where we have one reduction and one or more shifts at the same item set. For example, when we find b in a+b+c, do we reduce a+b or do we shift the plus token and reduce b+c later? This type of ambiguities is usually resolved by assigning the proper precedences to operators (described later). The shift-reduce conflict is not as severe as the reduce-reduce conflict: a parser generator selects reduction against shifting but we may get a wrong behavior. If the grammar has no reduce-reduce and no shift-reduce conflicts, it is LR(0) (ie. we read left-to-right, we use right-most derivations, and we don't look ahead any tokens).


next up previous contents
Next: 3.3.4 SLR(1), LR(1), and Up: 3.3 Bottom-up Parsing Previous: 3.3.2 Shift-Reduce Parsing Using   Contents
2015-01-20