Predictive parsing can also be accomplished using a predictive
parsing table and a stack. It is sometimes called non-recursive
predictive parsing. The idea is that we construct a table
M[X, token] which indicates which production to use if the top of the
stack is a nonterminal X and the current token is equal to token;
in that case we pop X from the stack and we push all the rhs symbols
of the production
M[X, token] in reverse order.
We use a special symbol $
to denote the end of file.
Let S be the start symbol.
Here is the parsing algorithm:
push(S); read_next_token(); repeat X = pop(); if (X is a terminal or '$') if (X == current_token) read_next_token(); else error(); else if (M[X,current_token] == "X ::= Y1 Y2 ... Yk") { push(Yk); ... push(Y1); } else error(); until X == '$';
Now the question is how to construct the parsing table. We need first to derive the FIRST[a] for some symbol sequence a and the FOLLOW[X] for some nonterminal X. In few words, FIRST[a] is the set of terminals t that result after a number of derivations on a (ie, a = > ... = > tb for some b). For example, FIRST[3 + E] = {3} since 3 is the first terminal. To find FIRST[E + T], we need to apply one or more derivations to E until we get a terminal at the beginning. If E can be reduced to the empty sequence, then the FIRST[E + T] must also contain the FIRST[+ T] = { + }. The FOLLOW[X] is the set of all terminals that follow X in any legal derivation. To find the FOLLOW[X], we need find all productions Z : : = aXb in which X appears at the rhs. Then the FIRST[b] must be part of the FOLLOW[X]. If b is empty, then the FOLLOW[Z] must be part of the FOLLOW[X].
Consider our CFG G1:
1) E ::= T E' $ 2) E' ::= + T E' 3) | - T E' 4) | 5) T ::= F T' 6) T' ::= * F T' 7) | / F T' 8) | 9) F ::= num 10) | id
FIRST[F]
is of course {num,id}
. This means that
FIRST[T]=FIRST[F]={num,id}
. In addition,
FIRST[E]=FIRST[T]={num,id}
. Similarly, FIRST[T']
is
{*,/}
and FIRST[E']
is {+,-}
.
The FOLLOW of E
is {$}
since there is no production that
has E
at the rhs. For E'
, rules 1, 2, and 3 have
E'
at the rhs. This means that the FOLLOW[E']
must
contain both the FOLLOW[E]
and the FOLLOW[E']
. The first one is
{$}
while the latter is ignored since we are trying to find
E'
. Similarly, to find FOLLOW[T]
, we find the rules that
have T
at the rhs: rules 1, 2, and 3. Then FOLLOW[T]
must include FIRST[E']
and, since E' can be reduced to the empty sequence,
it must include FOLLOW[E']
too (ie, {$}
). That is,
FOLLOW[T]={+,-,$}
. Similarly,
FOLLOW[T']=FOLLOW[T]={+,-,$}
from rule 5. The FOLLOW[F]
is equal to FIRST[T']={*,/}
plus FOLLOW[T']
and plus
FOLLOW[T]
from rules 5, 6, and 7, since T'
can be
reduced to the empty sequence.
To summarize, we have:
FIRST | FOLLOW | |
E | {num,id} | {$ } |
E' | {+,-} | {$ } |
T | {num,id} | {+,-,$ } |
T' | {*,/} | {+,-,$ } |
F | {num,id} | {+,-,*,/,$ } |
For example, the parsing table of the grammar G1 is:
num | id | + | - | * | / | $ |
|
E | 1 | 1 | |||||
E' | 2 | 3 | 4 | ||||
T | 5 | 5 | |||||
T' | 8 | 8 | 6 | 7 | 8 | ||
F | 9 | 10 |
FOLLOW[T']={+,-,$}
,
we add 8 to M[T',+], M[T',-], M[T',$]
.
A grammar is called LL(1) if each element of the parsing table of the grammar has at most one production element. (The first L in LL(1) means that we read the input from left to right, the second L means that it uses left-most derivations only, and the number 1 means that we need to look one token only ahead from the input.) Thus G1 is LL(1). If we have multiple entries in M, the grammar is not LL(1).
We will parse now the string x-2*y$
using the above parse table:
Stack current_token Rule --------------------------------------------------------------- E x M[E,id] = 1 (using E ::= T E' $) $ E' T x M[T,id] = 5 (using T ::= F T') $ E' T' F x M[F,id] = 10 (using F ::= id) $ E' T' id x read_next_token $ E' T' - M[T',-] = 8 (using T' ::= ) $ E' - M[E',-] = 3 (using E' ::= - T E') $ E' T - - read_next_token $ E' T 2 M[T,num] = 5 (using T ::= F T') $ E' T' F 2 M[F,num] = 9 (using F ::= num) $ E' T' num 2 read_next_token $ E' T' * M[T',*] = 6 (using T' ::= * F T') $ E' T' F * * read_next_token $ E' T' F y M[F,id] = 10 (using F ::= id) $ E' T' id y read_next_token $ E' T' $ M[T',$] = 8 (using T' ::= ) $ E' $ M[E',$] = 4 (using E' ::= ) $ $ stop (accept)
As another example, consider the following grammar:
G ::= S $ S ::= ( L ) | a L ::= L , S | S
After left recursion elimination, it becomes
0) G := S $ 1) S ::= ( L ) 2) S ::= a 3) L ::= S L' 4) L' ::= , S L' 5) L' ::=
The first/follow tables are:
FIRST | FOLLOW | |
G | ( a | |
S | ( a | , ) $ |
L | ( a | ) |
L' | , | ) |
( | ) | a | , | $ | |
G | 0 | 0 | |||
S | 1 | 2 | |||
L | 3 | 3 | |||
L' | 5 | 4 |