next up previous contents
Next: 3.2.1 Recursive Descent Parsing Up: 3. Parsing Previous: 3.1 Context-free Grammars   Contents

3.2 Predictive Parsing

The goal of predictive parsing is to construct a top-down parser that never backtracks. To do so, we must transform a grammar in two ways:

1.
eliminate left recursion, and
2.
perform left factoring.
These rules eliminate most common causes for backtracking although they do not guarantee a completely backtrack-free parsing (called LL(1) as we will see later).

Consider this grammar:

A ::= Aa
    | b
It recognizes the regular expression ba*. The problem is that if we use the first production for top-down derivation, we will fall into an infinite derivation chain. This is called left recursion. But how else can you express ba*? Here is an alternative way:
A ::= bA'
A' ::= aA'
     |
where the third production is an empty production (ie, it is A' ::= ). That is, A' parses the RE a*. Even though this CFG is recursive, it is not left recursive. In general, for each nonterminal X, we partition the productions for X into two groups: one that contains the left recursive productions, and the other with the rest. Suppose that the first group is:
X ::= X a1
...
X ::= X an
while the second group is:
X ::= b1
...
X ::= bm
where a, b are symbol sequences. Then we eliminate the left recursion by rewriting these rules into:
X ::= b1 X'
...
X ::= bm X'
X' ::= a1 X'
...
X' ::= an X'
X' ::=
For example, the CFG G1 is transformed into:
E ::= T E'
E' ::= + T E'
     | - T E'
     |
T ::= F T'
T' ::= * F T'
     | / F T'
     |
F ::= num
    | id

Suppose now that we have a number of productions for X that have a common prefix in their rhs (but without any left recursion):

X ::= a b1
...
X ::= a bn
We factor out the common prefix as follows:
X ::= a X'
X' ::= b1
...
X' ::= bn
This is called left factoring and it helps predict which rule to use without backtracking. For example, the rule from our right associative grammar G2:
E ::= T + E
    | T - E
    | T
is translated into:
E ::= T E'
E' ::= + E
     | - E
     |

As another example, let L be the language of all regular expressions over the alphabet $\Sigma=\{a,b\}$. That is, $L=\{\lq\lq \varepsilon'',\lq\lq a'',\lq\lq b'',\lq\lq a*'',\lq\lq b*'',\lq\lq a\vert b'',\lq\lq (a\vert b)'',...\}$. For example, the string ``a(a|b)*|b*" is a member of L. There is no RE that captures the syntax of all REs. Consider for example the RE $((\cdots(a)\cdots))$, which is equivalent to the language (na)n for all n. This represents a valid RE but there is no RE that can capture its syntax. A context-free grammar that recognizes L is:

\begin{displaymath}\begin{array}{lrl}
R & ::= & R\, R\\
& \vert & R\,\lq\lq \vert''\...
...ert & a\\
& \vert & b\\
& \vert & \lq\lq \varepsilon''
\end{array}\end{displaymath}

after elimination of left recursion, this grammar becomes:

\begin{displaymath}\begin{array}{lrl}
R & ::= & (\, R\,)\,R'\\
& \vert & a\,R'\...
...& \lq\lq \vert''\,R\, R'\\
& \vert & *\,R'\\
& \vert &
\end{array}\end{displaymath}




next up previous contents
Next: 3.2.1 Recursive Descent Parsing Up: 3. Parsing Previous: 3.1 Context-free Grammars   Contents
Leonidas Fegaras
2000-12-27