Now the only thing that remains to do is, given a CFG, to construct the finite automaton (DFA) that recognizes handles. After that, constructing the ACTION and GOTO tables will be straightforward.
The states of the finite state machine correspond to item sets. An
item (or configuration) is a production with a dot (.) at
the rhs that indicates how far we have progressed using this rule to
parse the input. For example, the item E ::= E + . E
indicates
that we are using the rule E ::= E + E
and that, using this
rule, we have parsed E
, we have seen a token +
, and we
are ready to parse another E
. Now, why do we need a set (an item
set) for each state in the state machine? Because many production
rules may be applicable at this point; later when we will scan more
input tokens we will be able tell exactly which production to use.
This is the time when we are ready to reduce by the chosen production.
For example, say that we are in a state that corresponds to the item set with the following items:
S ::= id . := E S ::= id . : SThis state indicates that we have already parsed an
id
from the input
but we have 2 possibilities: if the next token is :=
we will use
the first rule and if it is :
we will use the second.
Now we are ready to construct our automaton. Since we do not want to
work with NFAs, we build a DFA directly. So it is important to
consider closures (like we did when we transformed NFAs to DFAs). The
closure of an item X ::= a . t b
(ie, the dot appears before a
terminal t
) is the singleton set that contains the item X ::= a . t b
only.
The closure of an item X ::= a . Y b
(ie, the dot appears before a
nonterminal Y
) is the set consisting of the item itself, plus
all productions for Y
with the dot at the left of the rhs, plus
the closures of these items. For example, the closure of the item
E ::= E + . T
is the set:
E ::= E + . T T ::= . T * F T ::= . T / F T ::= . F F ::= . num F ::= . id
The initial state of the DFA (state 0) is the closure of the item
S ::= . a $
, where S ::= a $
is the first rule. In simple words,
if there is an item X ::= a . s b
in an item set, where
s
is a symbol (terminal or nonterminal), we have a transition
labelled by s
to an item set that contains
X ::= a s . b
. But it's a little bit more complex than that:
s
, say X ::= a . s b
and Y ::= c . s d
, then the
new item set contains both X ::= a s . b
and
Y ::= c s . d
.
For example, our previous grammar which parses the R.E. ab*$:
0) S ::= R $ 1) R ::= R b 2) R ::= ahas the following item sets:
which forms the following DFA:
We can understand now the R transition: If we read an entire R term (that is, after we reduce by a rule that parses R), and the previous state before the reduction was 0, we jump to state 1.
As another example, the CFG:
1) S ::= E $ 2) E ::= E + T 3) | T 4) T ::= id 5) | ( E )has the following item sets:
I0: S ::= . E $ I4: E ::= E + T . E ::= . E + T E ::= . T I5: T ::= id . T ::= . id T ::= . ( E ) I6: T ::= ( . E ) E ::= . E + T I1: S ::= E . $ E ::= . T E ::= E . + T T ::= . id T ::= . ( E ) I2: S ::= E $ . I7: T ::= ( E . ) I3: E ::= E + . T E ::= E . + T T ::= . id T ::= . ( E ) I8: T ::= ( E ) . I9: E ::= T .that correspond to the DFA:
Explanation: The initial state I0
is the closure of
S ::= . E $
. The two items S ::= . E $
and
E ::= . E + T
in I0
must have a transition by
E
since the dot appears before E
. So we have a new item
set I1
and a transition from I0
to I1
by
E
. The I1
item set must contain the closure of the items
S ::= . E $
and E ::= . E + T
with the dot moved one
place right. Since the dot in I1
appears before terminals
($
and +
), the closure is the items themselves.
Similarly, we have a transition from I0
to I9
by
T
, to I5
by id
, to I6
by '(
', etc,
and we do the same thing for all item sets.
Now if we have an item set with only one item S ::= E .
, where
S
is the start symbol, then this state is the accepting state
(state I2
in the example). If we have an item set with only
one item X ::= a .
(where the dot appears at the end of the rhs),
then this state corresponds to a reduction by the production
X ::= a
. If we have a transition by a terminal symbol (such as
from I0
to I5
by id
), then this corresponds to a
shifting.
The ACTION and GOTO tables that correspond to this DFA are:
state | action | goto | ||||||
id | ( | ) | + | $ |
S | E | T | |
0 | s5 | s6 | 1 | 9 | ||||
1 | s3 | s2 | ||||||
2 | a | a | a | a | a | |||
3 | s5 | s6 | 4 | |||||
4 | r2 | r2 | r2 | r2 | r2 | |||
5 | r4 | r4 | r4 | r4 | r4 | |||
6 | s5 | s6 | 7 | 9 | ||||
7 | s8 | s3 | ||||||
8 | r5 | r5 | r5 | r5 | r5 | |||
9 | r3 | r3 | r3 | r3 | r3 |
As another example, consider the following augmented grammar:
0) S' ::= S $ 1) S ::= B B 2) B ::= a B 3) B ::= c
The state diagram is:
The ACTION and GOTO parsing tables are:
state | action | goto | ||||
a | c | $ | S' | S | B | |
0 | s5 | s7 | 1 | 3 | ||
1 | s2 | |||||
2 | a | a | a | |||
3 | s5 | s7 | 4 | |||
4 | r1 | r1 | r1 | |||
5 | s5 | s7 | 6 | |||
6 | r2 | r2 | r2 | |||
7 | r3 | r3 | r3 |
S ::= E $ E ::= ( L ) E ::= ( ) E ::= id L ::= L , E L ::= Ehas the following state diagram:
If we have an item set with more than one items with a dot at the end
of their rhs, then this is an error, called a reduce-reduce
conflict, since we can not choose which rule to use to reduce. This
is a very important error and it should never happen. Another conflict
is a shift-reduce conflict where we have one reduction and one or
more shifts at the same item set. For example, when we find b
in a+b+c
, do we reduce a+b
or do we shift the plus token
and reduce b+c
later? This type of ambiguities is usually
resolved by assigning the proper precedences to operators (described
later). The shift-reduce conflict is not as severe as the reduce-reduce
conflict: a parser generator selects reduction against shifting but we may
get a wrong behavior. If the grammar has no reduce-reduce and no
shift-reduce conflicts, it is LR(0) (ie. we read left-to-right, we use right-most
derivations, and we don't look ahead any tokens).