Parsing with yacc

Precedence

There is one common situation where the rules given above for resolving conflicts are not sufficient. This is in the parsing of arithmetic expressions. Most of the commonly used constructions for arithmetic expressions can be naturally described by the notion of precedence levels for operators, together with information about left or right associativity. It turns out that ambiguous grammars with appropriate disambiguating rules can be used to create parsers that are faster and easier to write than parsers constructed from unambiguous grammars. The basic notion is to write grammar rules of the form

   expr  :  expr  OP  expr

and

   expr  :  UNARY  expr

for all binary and unary operators desired. This creates a very ambiguous grammar with many parsing conflicts. You specify as disambiguating rules the precedence or binding strength of all the operators and the associativity of the binary operators. This information is sufficient to allow yacc to resolve the parsing conflicts in accordance with these rules and construct a parser that realizes the desired precedences and associativities.

The precedences and associativities are attached to tokens in the declarations section. This is done by a series of lines beginning with the yacc keywords %left, %right, or %nonassoc, followed by a list of tokens. All of the tokens on the same line are assumed to have the same precedence level and associativity; the lines are listed in order of increasing precedence or binding strength. Thus

   %left  '+'  '-'
   %left  ''  '/'

describes the precedence and associativity of the four arithmetic operators. + and - are left associative and have lower precedence than and /, which are also left associative. The keyword %right is used to describe right associative operators. The keyword %nonassoc is used to describe operators, like the operator .LT. in FORTRAN, that may not associate with themselves. That is, because

   A .LT. B .LT. C

is invalid in FORTRAN, .LT. would be described with the keyword %nonassoc in yacc.

As an example of the behavior of these declarations, the description

   %right  '='
   %left  '+'  '-'
   %left  ''  '/'

   %%

   expr   :   expr  '='  expr
          |   expr  '+'  expr
          |   expr  '-'  expr
          |   expr  ''  expr
          |   expr  '/'  expr
          |   NAME
          ;

might be used to structure the input

   a  =  b  =  c  d  -  e  -  f  g

as follows

   a = ( b = ( ((c  d) - e) - (f  g) ) )

in order to achieve the correct precedence of operators. When this mechanism is used, unary operators must, in general, be given a precedence. Sometimes a unary operator and a binary operator have the same symbolic representation but different precedences. An example is unary and binary minus.

Unary minus may be given the same strength as multiplication, or even higher, while binary minus has a lower strength than multiplication. The keyword %prec changes the precedence level associated with a particular grammar rule. %prec appears immediately after the body of the grammar rule, before the action or closing semicolon, and is followed by a token name or literal. It causes the precedence of the grammar rule to become that of the following token name or literal. For example, the rules

   %left  '+'  '-'
   %left  ''  '/'

   %%

   expr   :   expr  '+'  expr
          |   expr  '-'  expr
          |   expr  ''  expr
          |   expr  '/'  expr
          |   '-'  expr      %prec  ''
          |   NAME
          ;

might be used to give unary minus the same precedence as multiplication.

A token declared by %left, %right, and %nonassoc need not, but may, be declared by %token as well.

Precedences and associativities are used by yacc to resolve parsing conflicts. They give rise to the following disambiguating rules:

Precedences and associativities are recorded for those tokens and literals that have them.
A precedence and associativity is associated with each grammar rule. It is the precedence and associativity of the last token or literal in the body of the rule. If the %prec construction is used, it overrides this default. Some grammar rules may have no precedence and associativity associated with them.
When there is a reduce-reduce or shift-reduce conflict, and either the input symbol or the grammar rule has no precedence and associativity, then the two default disambiguating rules given in the preceding section are used, and the conflicts are reported.
If there is a shift-reduce conflict and both the grammar rule and the input character have precedence and associativity associated with them, then the conflict is resolved in favor of the action -- shift or reduce -- associated with the higher precedence. If precedences are equal, then associativity is used. Left associative implies reduce; right associative implies shift; nonassociating implies error.

Conflicts resolved by precedence are not counted in the number of shift-reduce and reduce-reduce conflicts reported by yacc. This means that mistakes in the specification of precedences may disguise errors in the input grammar. It is a good idea to be sparing with precedences and use them in a cookbook fashion until some experience has been gained. The y.output file is useful in deciding whether the parser is actually doing what was intended.

To illustrate further how you might use the precedence keywords to resolve a shift-reduce conflict, we will look at an example similar to the one described in the previous section. Consider the following C statement:

   if (flag) if (anotherflag) x = 1;
   else x = 2;

The problem for the parser is whether the else goes with the first or the second if. C programmers will recognize that the else goes with the second if, contrary to to what the misleading indentation suggests. The following yacc grammar for an if-then-else construct abstracts the problem. That is, the input iises will model the C statement shown above.

   %{
   #include <stdio.h>
   %}
   %token SIMPLE IF ELSE
   %%
   S		: stmnt '\n'
   		;
   stmnt		: SIMPLE
   		| if_stmnt
   		;
   if_stmnt		: IF stmnt
   			{ printf("simple if\n");}
   		| IF stmnt ELSE stmnt
   			{ printf("if_then_else\n");}
   		;
   %%
   int
   yylex() {
   	int c;
   	c=getchar();
   	if (c==EOF) return 0;
   	else switch(c) {
   		case 'i': return IF;
   		case 's': return SIMPLE;
   		case 'e': return ELSE;
   		default: return c;
   		}
   }

When the specification is passed to yacc, however, we get the following message:

   conflicts: 1 shift/reduce

The problem is that when yacc has read iis in trying to match iises, it has two choices: recognize is as a statement (reduce), or read some more input (shift) and eventually recognize ises as a statement.

One way to resolve the problem is to invent a new token REDUCE whose sole purpose is to give the correct precedence for the rules:

   %{
   #include <stdio.h>
   %}
   %token SIMPLE IF
   %nonassoc REDUCE
   %nonassoc ELSE
   %%
   S		: stmnt '\n'
   		;
   stmnt		: SIMPLE
   		| if_stmnt
   		;
   if_stmnt		: IF stmnt %prec REDUCE
   			{ printf("simple if");}
   		| IF stmnt ELSE stmnt
   			{ printf("if_then_else");}
   		;
   %%
   ...

Since the precedence associated with the second form of if_stmnt is higher now, yacc will try to match that rule first, and no conflict will be reported.

Actually, in this simple case, the new token is not needed:

   %nonassoc IF
   %nonassoc ELSE

would also work. Moreover, it is not really necessary to resolve the conflict in this way, because, as we have seen, yacc will shift by default in a shift-reduce conflict. Resolving conflicts is a good idea, though, in the sense that you should not see diagnostic messages for correct specifications.