12.4 Syntax of grammar definitions

12.4.1. Header and trailer
12.4.2. Declarations
12.4.3. Rules
12.4.4. Error handling

Grammar definitions have the following format:

%{
  header
%}
  declarations
%%
  rules
%%
  trailer

Comments are enclosed between `/*' and `*/' (as in C) in the "declarations" and "rules" sections, and between `(*' and `*)' (as in Caml) in the "header" and "trailer" sections.

12.4.1 Header and trailer

The header and the trailer sections are Caml code that is copied as is into file grammar.ml. Both sections are optional. The header goes at the beginning of the output file; it usually contains open directives and auxiliary functions required by the semantic actions of the rules. The trailer goes at the end of the output file.

12.4.2 Declarations

Declarations are given one per line. They all start with a `%' sign.

%token constr ...  constr

Declare the given symbols constr ... constr as tokens (terminal symbols). These symbols are added as constant constructors for the token concrete type.

%token < typexpr >  constr ...  constr

Declare the given symbols constr ... constr as tokens with an attached attribute of the given type. These symbols are added as constructors with arguments of the given type for the token concrete type. The typexpr part is an arbitrary Caml type expression, except that all type constructor names must be fully qualified (e.g. Modname.typename) for all types except standard built-in types, even if the proper `open' directives (e.g. `open Modname') were given in the header section. That's because the header is copied only to the .ml output file, but not to the .mli output file, while the typexpr part of a `%token' declaration is copied to both.

%start symbol ...  symbol

Declare the given symbols as entry points for the grammar. For each entry point, a parsing function with the same name is defined in the output module. Non-terminals that are not declared as entry points have no such parsing function. Start symbols must be given a type with the `%type' directive below.

%type < typexpr >  symbol ...  symbol

Specify the type of the semantic attributes for the given symbols. This is mandatory for start symbols only.

Other nonterminal symbols need not be given types by hand: these types will be inferred when running the output files through the Objective Caml compiler (unless the `-s' option is in effect). The typexpr part is an arbitrary Caml type expression, except that all type constructor names must be fully qualified, as explained above for %token.

%left symbol ...  symbol  
%right symbol ...  symbol  
%nonassoc symbol ...  symbol 

Associate precedences and associativities to the given symbols. All symbols on the same line are given the same precedence. They have higher precedence than symbols declared before in a `%left', `%right' or `%nonassoc' line.

They have lower precedence than symbols declared after in a `%left', `%right' or `%nonassoc' line. The symbols are declared to associate to the left (`%left'), to the right (`%right'), or to be non-associative (`%nonassoc'). The symbols are usually tokens. They can also be dummy nonterminals, for use with the `%prec' directive inside the rules.

The precedence declarations are used in the following way to resolve reduce/reduce and shift/reduce conflicts:

  • Tokens and rules have precedences. By default, the precedence of a rule is the precedence of its rightmost terminal. You can override this default by using the %prec directive in the rule.
  • A reduce/reduce conflict is resolved in favor of the first rule (in the order given by the source file), and ocamlyacc outputs a warning.
  • A shift/reduce conflict is resolved by comparing the precedence of the rule to be reduced with the precedence of the token to be shifted. If the precedence of the rule is higher, then the rule will be reduced; if the precedence of the token is higher, then the token will be shifted.
  • A shift/reduce conflict between a rule and a token with the same precedence will be resolved using the associativity: if the token is left-associative, then the parser will reduce; if the token is right-associative, then the parser will shift. If the token is non-associative, then the parser will declare a syntax error.
  • When a shift/reduce conflict cannot be resolved using the above method, then ocamlyacc will output a warning and the parser will always shift.

12.4.3 Rules

The syntax for rules is as usual:

nonterminal :
    symbol ... symbol { semantic-action }
  | ...
  | symbol ... symbol { semantic-action }
;

Rules can also contain the `%prec 'symbol directive in the right-hand side part, to override the default precedence and associativity of the rule with the precedence and associativity of the given symbol.

Semantic actions are arbitrary Caml expressions, that are evaluated to produce the semantic attribute attached to the defined nonterminal. The semantic actions can access the semantic attributes of the symbols in the right-hand side of the rule with the `$' notation: `$1' is the attribute for the first (leftmost) symbol, `$2' is the attribute for the second symbol, etc.

The rules may contain the special symbol error to indicate resynchronization points, as in yacc.

Actions occurring in the middle of rules are not supported.

Nonterminal symbols are like regular Caml symbols, except that they cannot end with ' (single quote).

12.4.4 Error handling

Error recovery is supported as follows: when the parser reaches an error state (no grammar rules can apply), it calls a function named parse_error with the string "syntax error" as argument. The default parse_error function does nothing and returns, thus initiating error recovery (see below). The user can define a customized parse_error function in the header section of the grammar file.

The parser also enters error recovery mode if one of the grammar actions raises the Parsing.Parse_error exception.

In error recovery mode, the parser discards states from the stack until it reaches a place where the error token can be shifted. It then discards tokens from the input until it finds three successive tokens that can be accepted, and starts processing with the first of these. If no state can be uncovered where the error token can be shifted, then the parser aborts by raising the Parsing.Parse_error exception.

Refer to documentation on yacc for more details and guidance in how to use error recovery.