The goal of predictive parsing is to construct a top-down parser that never backtracks. To do so, we must transform a grammar in two ways:
Consider this grammar:
A ::= A a | bIt recognizes the regular expression ba*. The problem is that if we use the first production for top-down derivation, we will fall into an infinite derivation chain. This is called left recursion. But how else can you express ba*? Here is an alternative way:
A ::= b A' A' ::= a A' |where the third production is an empty production (ie, it is
A' ::=
). That is, A' parses the RE a*. Even though this
CFG is recursive, it is not left recursive. In general, for each
nonterminal X, we partition the productions for X into two groups:
one that contains the left recursive productions, and the other with
the rest. Suppose that the first group is:
X ::= X a1 ... X ::= X anwhile the second group is:
X ::= b1 ... X ::= bmwhere
a, b
are symbol sequences.
Then we eliminate the left recursion by rewriting these rules into:
X ::= b1 X' ... X ::= bm X' X' ::= a1 X' ... X' ::= an X' X' ::=For example, the CFG G1 is transformed into:
E ::= T E' E' ::= + T E' | - T E' | T ::= F T' T' ::= * F T' | / F T' | F ::= num | id
Suppose now that we have a number of productions for X that have a common prefix in their rhs (but without any left recursion):
X ::= a b1 ... X ::= a bnWe factor out the common prefix as follows:
X ::= a X' X' ::= b1 ... X' ::= bnThis is called left factoring and it helps predict which rule to use without backtracking. For example, the rule from our right associative grammar G2:
E ::= T + E | T - E | Tis translated into:
E ::= T E' E' ::= + E | - E |
As another example, let L be the language of all regular expressions over the alphabet = {a, b}. That is, L = {``",``a",``b",``a*",``b*",``a| b",``(a| b)",...}. For example, the string ``a(a| b)*| b*" is a member of L. There is no RE that captures the syntax of all REs. Consider for example the RE (( ... (a) ... )), which is equivalent to the language (na)n for all n. This represents a valid RE but there is no RE that can capture its syntax. A context-free grammar that recognizes L is:
R | : : = | R R |
| | R ``|" R | |
| | R * | |
| | ( R ) | |
| | a | |
| | b | |
| | ``" |
R | : : = | ( R ) R' |
| | a R' | |
| | b R' | |
| | ``" R' | |
R' | : : = | R R' |
| | ``|" R R' | |
| | * R' | |
| |