Next: 3.4 Case Study: The Up: 3.3 Bottom-up Parsing Previous: 3.3.2 SLR(1), LR(1), and

3.3.3 Practical Considerations for LALR(1) Grammars

Most parsers nowdays are specified using LALR(1) grammars fed to a parser generator, such as Bison. There are few tricks that we need to know to avoid reduce-reduce and shift-reduce conflicts.

When we learned about top-down predictive parsing, we were told that left recursion is bad. For LALR(1) the opposite is true: left recursion is good, right recursion is bad. Consider this:

```L ::= id , L
| id
```

which captures a list of ids separated by commas. If the FOLLOW of `L` (or to be more precise, the expected lookaheads for `L ::= id`) contains the comma token, we are in big trouble. We can use instead:

```L ::= L , id
| id
```

Left recursion uses less stack than right recursion and it also produces left associative trees (which is what we usually want).

Some shift-reduce conflicts are very difficult to eliminate. Consider the infamous if-then-else conflict:

```S ::= if E then S else S
| if E then S
| ...
```

Read page 68 in the textbook to see how this can be eliminated.

Most shift-reduce errors though are easy to remove by assigning precedence and associativity to operators. Consider the grammar:

```S ::= E \$
E ::= E + E
| E * E
| ( E )
| id
| num
```

and four of its LALR(1) states:

```I0:  S ::= . E \$    ?
E ::= . E + E  +*\$    I1:  S ::= E . \$    ?      I2:  E ::= E * . E   +*\$
E ::= . E * E  +*\$         E ::= E . + E  +*\$         E ::= . E + E   +*\$
E ::= . ( E )  +*\$         E ::= E . * E  +*\$         E ::= . E * E   +*\$
E ::= . id     +*\$                                    E ::= . ( E )   +*\$
E ::= . num    +*\$    I3:  E ::= E * E .   +*\$        E ::= . id      +*\$
E ::= E . + E   +*\$        E ::= . num     +*\$
E ::= E . * E   +*\$
```

Here we have a shift-reduce error. Consider the first two items in `I3`. If we have `a*b+c` and we parsed `a*b`, do we reduce using `E ::= E * E` or do we shift more symbols? In the former case we get a parse tree `(a*b)+c`; in the latter case we get `a*(b+c)`. To resolve this conflict, we can specify that `*` has higher precedence than `+`. The idea is that if the lookahead has higher precedence than the operator in the production currently used, we shift. For example, if we are parsing `E + E` using the production rule `E ::= E + E` and the lookahead is `*`, we shift `*`. If the lookahead has the same precedence and is left associative, we reduce. The above grammar is LALR(1) if we define the precedence and associativity of all the operators. Thus, it is very important when you write a parser using Bison or any other LALR(1) parser generator to specify associativities and precedences for most tokens (especially for those used as operators).

Another thing to do when specifying an LALR(1) grammar for a parser generator is error recovery. All the entries in the ACTION and GOTO tables that have no content correspond to syntax errors. The simplest thing to do in case of error is to report it and stop the parsing. But we would like to continue parsing finding more errors. This is called error recovery. Consider the grammar:

```S ::= L = E ;
| { SL } ;
| error ;
SL ::= S ;
| SL S ;
```

The special token `error` indicates to the parser what to do in case of invalid syntax for `S` (an invalid statement). In this case, it reads all the tokens from the input stream until it finds the first semicolon. The way the parser handles this is to first push an `error` state in the stack. In case of an error, the parser pops out elements from the stack until it finds an error state where it can proceed. Then it discards tokens from the input until a restart is possible. Inserting error handling productions in the proper places in a grammar to do good error recovery is considered very hard.

Next: 3.4 Case Study: The Up: 3.3 Bottom-up Parsing Previous: 3.3.2 SLR(1), LR(1), and
Leonidas Fegaras
2002-08-26