Recall the original CFG G1:

0) S :: = E $ 1) E ::= E + T 2) | E - T 3) | T 4) T ::= T * F 5) | T / F 6) | F 7) F ::= num 8) | idHere is a trace of the bottom-up parsing of the input

`x-2*y$`

:
Stack rest-of-the-input Action --------------------------------------------------------------- 1) id - num * id $ shift 2) id - num * id $ reduce by rule 8 3) F - num * id $ reduce by rule 6 4) T - num * id $ reduce by rule 3 5) E - num * id $ shift 6) E - num * id $ shift 7) E - num * id $ reduce by rule 7 8) E - F * id $ reduce by rule 6 9) E - T * id $ shift 10) E - T * id $ shift 11) E - T * id $ reduce by rule 8 12) E - T * F $ reduce by rule 4 13) E - T $ reduce by rule 2 14) E $ shift 15) S accept (reduce by rule 0)We start with an empty stack and we finish when the stack contains the non-terminal

`S`

only.
Every time we shift, we push the current token into the stack and read the
next token from the input sequence. When we reduce by a rule,
the top symbols of the stack (the handle) must match the rhs of the rule `T ::= T * F`

).
Thus we pop the symbols `F`

, `*`

, and `T`

(in that order)
and we push `T`

.
At each instance, the stack (from bottom to top)
concatenated with the rest of the input (the unread input) forms a valid
bottom-up derivation from the input sequence. That is, the derivation sequence here is:
1) id - num * id $ 3) => F - num * id $ 4) => T - num * id $ 5) => E - num * id $ 8) => E - F * id $ 9) => E - T * id $ 12) => E - T * F $ 13) => E - T $ 14) => E $ 15) => SThe purpose of the above example is to understand how bottom-up parsing works. It involves lots of guessing. We will see that we don't actually push symbols in the stack. Rather we push states that represent possibly more than one potentials for rule reductions.

As we learned in earlier sections, a DFA can be represented by transition tables. In this case we use two tables: ACTION and GOTO. ACTION is used for recognizing which action to perform and, if it is shifting, which state to go next. GOTO indicates which state to go next for a nonterminal symbol. We will later learn how to construct these tables.

Consider the following grammar G4:

1) S ::= E $ 2) E ::= E + T 3) | T 4) T ::= id 5) | ( E )

The DFA that recognizes the handles of this grammar is: (this looks like magic now, but will be explained later how it is constructed)

where r2 for example means reduce by rule 2 (`E ::= E + T`

).
The ACTION and GOTO tables that correspond to this DFA are:

state | action | goto | ||||||

id | ( | ) | + | `$` |
S | E | T | |

0 | s5 | s6 | 1 | 9 | ||||

1 | s3 | s2 | ||||||

2 | a | a | a | a | a | |||

3 | s5 | s6 | 4 | |||||

4 | r2 | r2 | r2 | r2 | r2 | |||

5 | r4 | r4 | r4 | r4 | r4 | |||

6 | s5 | s6 | 7 | 9 | ||||

7 | s8 | s3 | ||||||

8 | r5 | r5 | r5 | r5 | r5 | |||

9 | r3 | r3 | r3 | r3 | r3 |

where for example s5 means shift a token into the stack and go to state 5.

Here is the shift-reduce parser:

push(0); read_next_token(); for(;;) { s = top(); if (ACTION[s,current_token] == 'si') /* shift and go to state i */ { push(i); read_next_token(); } else if (ACTION[s,current_token] == 'ri') /* reduce by rule i: X ::= A1...An */ { perform pop() n times; s = top(); push(GOTO[s,X]); } else if (ACTION[s,current_token] == 'a') success!! else error(); }Note that the stack contains state numbers only, not symbols.

Now the only thing that remains to do is, given a CFG, to construct the finite automaton (DFA) that recognizes handles. After that, constructing the ACTION and GOTO tables will be straightforward.

The states of the finite state machine correspond to item sets. An
*item* (or *configuration*) is a production with a dot (.) at
the rhs that indicates how far we have progressed using this rule to
parse the input. For example, the item `E ::= E + . E`

indicates
that we are using the rule `E ::= E + E`

and that, using this
rule, we have parsed `E`

, we have seen a token `+`

, and we
are ready to parse another `E`

. Now, why do we need a set (an item
set) for each state in the state machine? Because many production
rules may be applicable at this point; later when we will scan more
input tokens we will be able tell exactly which production to use.
This is the time when we are ready to reduce by the chosen production.

For example, say that we are in a state that corresponds to the item set with the following items:

S ::= id . := E S ::= id . : SThis state indicates that we have already parsed an

`id`

from the input
but we have 2 possibilities: if the next token is `:=`

we will use
the first rule and if it is `:`

we will use the second.
Now we are ready to construct our automaton. Since we do not want to
work with NFAs, we build a DFA directly. So it is important to
consider closures (like we did when we transformed NFAs to DFAs). The
closure of an item `X ::= a . t b`

(ie, the dot appears before a
terminal `t`

) is the singleton set that contains the item `X ::= a . t b`

only.
The closure of an item `X ::= a . Y b`

(ie, the dot appears before a
nonterminal `Y`

) is the set consisting of the item itself, plus
all productions for `Y`

with the dot at the left of the rhs, plus
the closures of these items. For example, the closure of the item
`E ::= E + . T`

is the set:

E ::= E + . T T ::= . T * F T ::= . T / F T ::= . F F ::= . num F ::= . id

The initial state of the DFA (state 0) is the closure of the item
`S ::= . E $`

, where `E`

is the start symbol. In few words,
if there is an item `X ::= a . s b`

in an item set, where
`s`

is a symbol (terminal or nonterminal), we have a transition
labelled by `s`

to an item set that contains
`X ::= a s . b`

. But it's a little bit more complex than that:

- If we have more than one items with a dot before the same symbol
`s`

, say`X ::= a . s b`

and`Y ::= c . s d`

, then the new item set contains both`X ::= a s . b`

and`Y ::= c s . d`

. - We need to get the closure of the new item set.
- We have to check if this item set has been appeared before so that we don't create it again.

1) S ::= E $ 2) E ::= E + T 3) | T 4) T ::= id 5) | ( E )

has the following item sets:

I0: S ::= . E $ I4: E ::= E + T . E ::= . E + T E ::= . T I5: T ::= id . T ::= . id T ::= . ( E ) I6: T ::= ( . E ) E ::= . E + T I1: S ::= E . $ E ::= . T E ::= E . + T T ::= . id T ::= . ( E ) I2: S ::= E $ . I7: T ::= ( E . ) I3: E ::= E + . T E ::= E . + T T ::= . id T ::= . ( E ) I8: T ::= ( E ) . I9: E ::= T .

that correspond to the DFA:

Explanation: The initial state `I0`

is the closure of
`S ::= . E $`

. The two items `S ::= . E $`

and
`E ::= . E + T`

in `I0`

must have a transition by
`E`

since the dot appears before `E`

. So we have a new item
set `I1`

and a transition from `I0`

to `I1`

by
`E`

. The `I1`

item set must contain the closure of the items
`S ::= . E $`

and `E ::= . E + T`

with the dot moved one
place right. Since the dot in `I1`

appears before terminals
(`$`

and `+`

), the closure is the items themselves.
Similarly, we have a transition from `I0`

to `I9`

by
`T`

, to `I5`

by `id`

, to `I6`

by '`(`

', etc,
and we do the same thing for all item sets.

Now if we have an item set with only one item `S ::= E .`

, where
`S`

is the start symbol, then this state is the *accepting state*
(state `I2`

in the example). If we have an item set with only
one item `X ::= a .`

(where the dot appears at the end of the rhs),
then this state corresponds to a reduction by the production
`X ::= a`

. If we have a transition by a terminal symbol (such as
from `I0`

to `I5`

by `id`

), then this corresponds to a
shifting.

As another example, consider the following augmented grammar:

0) S' ::= S $ 1) S ::= B B 2) B ::= a B 3) B ::= c

The state diagram is:

The ACTION and GOTO parsing tables are:

state | action | goto | ||||

a | c | $ | S' | S | B | |

0 | s5 | s7 | 1 | 3 | ||

1 | s2 | |||||

2 | a | a | a | |||

3 | s5 | s7 | 4 | |||

4 | r1 | r1 | r1 | |||

5 | s5 | s7 | 6 | |||

6 | r2 | r2 | r2 | |||

7 | r3 | r3 | r3 |

As yet another example, the grammar:

S ::= E $ E ::= ( L ) E ::= ( ) E ::= id L ::= L , E L ::= Ehas the following state diagram:

If we have an item set with more than one items with a dot at the end
of their rhs, then this is an error, called a *reduce-reduce
conflict*, since we can not choose which rule to use to reduce. This
is a very important error and it should never happen. Another conflict
is a *shift-reduce conflict* where we have one reduction and one or
more shifts at the same item set. For example, when we find `b`

in `a+b+c`

, do we reduce `a+b`

or do we shift the plus token
and reduce `b+c`

later? This type of ambiguities is usually
resolved by assigning the proper precedences to operators (described
later). The shift-reduce conflict is not as severe as the reduce-reduce
conflict: a parser generator selects reduction against shifting but we may
get a wrong behavior. If the grammar has no reduce-reduce and no
shift-reduce conflicts, it is LR(0) (ie. we read left-to-right, we use right-most
derivations, and we don't look ahead any tokens).

2002-08-26