The goal of predictive parsing is to construct a top-down parser that never backtracks. To do so, we must transform a grammar in two ways:
Consider this grammar:
A ::= Aa
| b
It recognizes the regular expression ba*. The problem is that if we use the first
production for top-down derivation, we will fall into an infinite
derivation chain. This is called left recursion. But how else
can you express ba*? Here is an alternative way:
A ::= bA'
A' ::= aA'
|
where the third production is an empty production (ie, it is
A' ::= ). That is, A' parses the RE a*. Even though this
CFG is recursive, it is not left recursive. In general, for each
nonterminal X, we partition the productions for X into two groups:
one that contains the left recursive productions, and the other with
the rest. Suppose that the first group is:
X ::= X a1 ... X ::= X anwhile the second group is:
X ::= b1 ... X ::= bmwhere
a, b are symbol sequences.
Then we eliminate the left recursion by rewriting these rules into:
X ::= b1 X' ... X ::= bm X' X' ::= a1 X' ... X' ::= an X' X' ::=For example, the CFG G1 is transformed into:
E ::= T E'
E' ::= + T E'
| - T E'
|
T ::= F T'
T' ::= * F T'
| / F T'
|
F ::= num
| id
Suppose now that we have a number of productions for X that have a common prefix in their rhs (but without any left recursion):
X ::= a b1 ... X ::= a bnWe factor out the common prefix as follows:
X ::= a X' X' ::= b1 ... X' ::= bnThis is called left factoring and it helps predict which rule to use without backtracking. For example, the rule from our right associative grammar G2:
E ::= T + E
| T - E
| T
is translated into:
E ::= T E'
E' ::= + E
| - E
|
As another example,
let L be the language of all regular expressions over the alphabet
.
That is,
.
For example, the string ``a(a|b)*|b*" is a member of L.
There is no RE that captures the syntax of all REs.
Consider for example the RE
,
which is equivalent to the language
(na)n for all n. This represents a valid RE but there is no RE that can capture its
syntax.
A context-free grammar that recognizes L is: