3.2.2 Predictive Parsing Using Tables

Next: 3.3 Bottom-up Parsing Up: 3.2 Predictive Parsing Previous: 3.2.1 Recursive Descent Parsing Contents

3.2.2 Predictive Parsing Using Tables

Predictive parsing can also be accomplished using a predictive parsing table and a stack. It is sometimes called non-recursive predictive parsing. The idea is that we construct a table M[X, token] which indicates which production to use if the top of the stack is a nonterminal X and the current token is equal to token; in that case we pop X from the stack and we push all the rhs symbols of the production M[X, token] in reverse order. We use a special symbol $ to denote the end of file. Let S be the start symbol. Here is the parsing algorithm:

push(S);
read_next_token();
repeat
  X = pop();
  if (X is a terminal or '$')
     if (X == current_token)
        read_next_token();
     else error();
  else if (M[X,current_token] == "X ::= Y1 Y2 ... Yk")
     {  push(Yk);
        ...
        push(Y1);
     }
  else error();
until X == '$';

Now the question is how to construct the parsing table. We need first to derive the FIRST[a] for some symbol sequence a and the FOLLOW[X] for some nonterminal X. In few words, FIRST[a] is the set of terminals t that result after a number of derivations on a (ie, a = > ... = > tb for some b). For example, FIRST[3 + E] = {3} since 3 is the first terminal. To find FIRST[E + T], we need to apply one or more derivations to E until we get a terminal at the beginning. If E can be reduced to the empty sequence, then the FIRST[E + T] must also contain the FIRST[+ T] = { + }. The FOLLOW[X] is the set of all terminals that follow X in any legal derivation. To find the FOLLOW[X], we need find all productions Z : : = aXb in which X appears at the rhs. Then the FIRST[b] must be part of the FOLLOW[X]. If b is empty, then the FOLLOW[Z] must be part of the FOLLOW[X].

Consider our CFG G1:

1)   E ::= T E' $
2)   E' ::= + T E'
3)        | - T E'
4)        |
5)   T ::= F T'
6)   T' ::= * F T'
7)        | / F T'
8)        |
9)   F ::= num
10)      | id

FIRST[F] is of course {num,id}. This means that FIRST[T]=FIRST[F]={num,id}. In addition, FIRST[E]=FIRST[T]={num,id}. Similarly, FIRST[T'] is {*,/} and FIRST[E'] is {+,-}.

The FOLLOW of E is {$} since there is no production that has E at the rhs. For E', rules 1, 2, and 3 have E' at the rhs. This means that the FOLLOW[E'] must contain both the FOLLOW[E] and the FOLLOW[E']. The first one is {$} while the latter is ignored since we are trying to find E'. Similarly, to find FOLLOW[T], we find the rules that have T at the rhs: rules 1, 2, and 3. Then FOLLOW[T] must include FIRST[E'] and, since E' can be reduced to the empty sequence, it must include FOLLOW[E'] too (ie, {$}). That is, FOLLOW[T]={+,-,$}. Similarly, FOLLOW[T']=FOLLOW[T]={+,-,$} from rule 5. The FOLLOW[F] is equal to FIRST[T']={*,/} plus FOLLOW[T'] and plus FOLLOW[T] from rules 5, 6, and 7, since T' can be reduced to the empty sequence.

To summarize, we have:

	FIRST	FOLLOW
E	{num,id}	{`$`}
E'	{+,-}	{`$`}
T	{num,id}	{+,-,`$`}
T'	{*,/}	{+,-,`$`}
F	{num,id}	{+,-,*,/,`$`}

Now, given the above table, we can easily construct the parsing table. For each t $\in$ FIRST[a], add X : : = a to M[X, t]. If a can be reduced to the empty sequence, then for each t $\in$ FOLLOW[X], add X : : = a to M[X, t].

For example, the parsing table of the grammar G1 is:

	num	id	+	-	*	/	`$`
E	1	1
E'			2	3			4
T	5	5
T'			8	8	6	7	8
F	9	10

where the numbers are production numbers. For example, consider the eighth production T' : : = . Since FOLLOW[T']={+,-,$}, we add 8 to M[T',+], M[T',-], M[T',$].

A grammar is called LL(1) if each element of the parsing table of the grammar has at most one production element. (The first L in LL(1) means that we read the input from left to right, the second L means that it uses left-most derivations only, and the number 1 means that we need to look one token only ahead from the input.) Thus G1 is LL(1). If we have multiple entries in M, the grammar is not LL(1).

We will parse now the string x-2*y$ using the above parse table:

Stack         current_token     Rule
---------------------------------------------------------------
E                 x           M[E,id] = 1   (using E ::= T E' $)
$ E' T            x           M[T,id] = 5   (using T ::= F T')
$ E' T' F         x           M[F,id] = 10  (using F ::= id)
$ E' T' id        x           read_next_token
$ E' T'           -           M[T',-] = 8   (using T' ::= )
$ E'              -           M[E',-] = 3   (using E' ::= - T E')
$ E' T -          -           read_next_token
$ E' T            2           M[T,num] = 5  (using T ::= F T')
$ E' T' F         2           M[F,num] = 9  (using F ::= num)
$ E' T' num       2           read_next_token
$ E' T'           *           M[T',*] = 6   (using T' ::= * F T')
$ E' T' F *       *           read_next_token
$ E' T' F         y           M[F,id] = 10  (using F ::= id)
$ E' T' id        y           read_next_token
$ E' T'           $           M[T',$] = 8   (using T' ::= )
$ E'              $           M[E',$] = 4   (using E' ::= )
$                 $           stop (accept)

As another example, consider the following grammar:

G ::= S $
S ::= ( L )
    | a
L ::= L , S
    | S

After left recursion elimination, it becomes

0)  G := S $
1)  S ::= ( L )
2)  S ::= a
3)  L ::= S L'
4)  L' ::= , S L'
5)  L' ::=

The first/follow tables are:

	FIRST	FOLLOW
G	( a
S	( a	, ) $
L	( a	)
L'	,	)

which are used to create the parsing table:

	(	)	a	,	$
G	0		0
S	1		2
L	3		3
L'		5		4

Next: 3.3 Bottom-up Parsing Up: 3.2 Predictive Parsing Previous: 3.2.1 Recursive Descent Parsing Contents

2015-01-20