diff6-10.chapter12.txt last mod. 2008-08-28 (Thu) 09:53:44
13,15c13,15
< and yacc. Readers unfamiliar with lex and yacc are referred to ``Compilers:
< principles, techniques, and tools'' by Aho, Sethi and Ullman (Addison-Wesley,
< 1986), or ``Lex & Yacc'', by Levine, Mason and Brown (O'Reilly, 1992).
---
> and yacc. Readers unfamiliar with lex and yacc are referred to "Compilers:
> principles, techniques, and tools" by Aho, Sethi and Ullman (Addison-Wesley,
> 1986), or "Lex & Yacc", by Levine, Mason and Brown (O'Reilly, 1992).
42a43,61
> 12.1.1 Options
> ===============
> The following command-line options are recognized by ocamllex.
>
>
> -o output-file Specify the name of the output file produced by ocamllex.
> Default is lexer.ml, ocamllex being invoked as ocamllex lexer.mll.
>
> -ml Output code that does not use the Caml built-in automata interpreter.
> Instead, the automaton is encoded by Caml functions. This option is useful
> for debugging ocamllex, using it for production lexers is not recommended.
>
> -q Quiet mode. ocamllex normally outputs informational messages to standard
> output. They are suppressed if option -q is used.
>
> -version Print version and exit.
>
>
>
51c70
< rule entrypoint =
---
> rule entrypoint [arg_1... arg_n] =
55c74
< and entrypoint =
---
> and entrypoint [arg_1... arg_n] =
60,61c79,81
<
< Comments are delimited by (* and *), as in Caml.
---
> Comments are delimited by (* and *), as in Caml. The parse keyword, can be
> replaced by the shortest keyword, with the semantic consequences explained
> below.
78,79c98,99
< In following regular expressions, the identifier ident can be used as shorthand
< for regexp.
---
> In regular expressions that follow this declaration, the identifier ident can
> be used as shorthand for regexp.
86,93c106,114
< (starting with a lowercase letter). Each entry point becomes a Caml function
< that takes one argument of type Lexing.lexbuf. Characters are read from the
< Lexing.lexbuf argument and matched against the regular expressions provided in
< the rule, until a prefix of the input matches one of the rule. The
< corresponding action is then evaluated and returned as the result of the
< function.
< If several regular expressions match a prefix of the input, the ``longest
< match'' rule applies: the regular expression that matches the longest prefix of
---
> (starting with a lowercase letter). Similarily, the arguments arg_1... arg_n
> must be valid identifiers for Caml. Each entry point becomes a Caml function
> that takes n+1 arguments, the extra implicit last argument being of type
> Lexing.lexbuf. Characters are read from the Lexing.lexbuf argument and matched
> against the regular expressions provided in the rule, until a prefix of the
> input matches one of the rule. The corresponding action is then evaluated and
> returned as the result of the function.
> If several regular expressions match a prefix of the input, the "longest
> match" rule applies: the regular expression that matches the longest prefix of
95a117,122
> However, if lexer rules are introduced with the shortest keyword in place of
> the parse keyword, then the "shortest match" rule applies: the shortest prefix
> of the input is selected. In case of tie, the regular expression that occurs
> earlier in the rule is still selected. This feature is not intended for use in
> ordinary lexical analyzers, it may facilitate the use of ocamllex as a simple
> text processing tool.
102a130
> regexp ::= ...
105,106c133,135
< ' char ' A character constant, with the same syntax as Objective Caml
< character constants. Match the denoted character.
---
>
> ' regular-char | escape-sequence ' A character constant, with the same
> syntax as Objective Caml character constants. Match the denoted character.
115,116c144,145
< " string " A string constant, with the same syntax as Objective Caml string
< constants. Match the corresponding sequence of characters.
---
> " { string-character } " A string constant, with the same syntax as Objective
> Caml string constants. Match the corresponding sequence of characters.
126a156,160
> regexp_1 # regexp_2 (Difference of character sets). Regular expressions
> regexp_1 and regexp_2 must be character sets defined with [... ] (or a a
> single character expression or underscore _). Match the difference of the
> two specified character sets.
>
146a181,182
> regexp as ident Bind the substring matched by regexp to identifier ident.
>
148c184
< followed by ?, then concatenation, then | (alternation).
---
> followed by ?, then concatenation, then | (alternation), then as.
155,157c191,194
< where the identifier lexbuf is bound to the current lexer buffer. Some typical
< uses for lexbuf, in conjunction with the operations on lexer buffers provided
< by the Lexing standard library module, are listed below.
---
> where the identifiers defined by using the as construct are bound to subparts
> of the matched string. Additionally, lexbuf is bound to the current lexer
> buffer. Some typical uses for lexbuf, in conjunction with the operations on
> lexer buffers provided by the Lexing standard library module, are listed below.
173,175c210,214
< entrypoint lexbuf (Where entrypoint is the name of another entry point in the
< same lexer definition.) Recursively call the lexer on the given entry point.
< Useful for lexing nested comments, for example.
---
> entrypoint [exp_1... exp_n] lexbuf (Where entrypoint is the name of another
> entry point in the same lexer definition.) Recursively call the lexer on the
> given entry point. Notice that lexbuf is the last argument. Useful for
> lexing nested comments, for example.
>
177a217,245
> 12.2.6 Variables in regular expressions
> ========================================
> The as construct is similar to "groups" as provided by numerous regular
> expression packages. The type of these variables can be string, char, string
> option or char option.
> We first consider the case of linear patterns, that is the case when all as
> bound variables are distinct. In regexp as ident, the type of ident normally
> is string (or string option) except when regexp is a character constant, an
> underscore, a string constant of length one, a character set specification, or
> an alternation of those. Then, the type of ident is char (or char option).
> Option types are introduced when overall rule matching does not imply matching
> of the bound sub-pattern. This is in particular the case of ( regexp as ident
> ) ? and of regexp_1 | ( regexp_2 as ident ).
> There is no linearity restriction over as bound variables. When a variable is
> bound more than once, the previous rules are to be extended as follows:
>
> - A variable is a char variable when all its occurrences bind char occurrences
> in the previous sense.
> - A variable is an option variable when the overall expression can be matched
> without binding this variable.
> For instance, in `('a' as x) | ( 'a' (_ as x) )' the variable `x' is of type
> char, whereas in `("ab" as x) | ( 'a' (_ as x) ? )' the variable `x' is of
> type string option.
> In some cases, a sucessful match may not yield a unique set of bindings. For
> instance the matching of `aba' by the regular expression `(('a'|"ab") as x)
> (("ba"|'a') as y)' may result in binding either `x' to `"ab"' and `y' to `"a"',
> or `x' to `"a"' and `y' to `"ba"'. The automata produced ocamllex on such
> ambiguous regular expressions will select one of the possible resulting sets of
> bindings. The selected set of bindings is purposely left unspecified.
179c247,248
< 12.2.6 Reserved identifiers
---
>
> 12.2.7 Reserved identifiers
225,227c294,296
< Comments are enclosed between `/*' and `*/' (as in C) in the ``declarations''
< and ``rules'' sections, and between `(*' and `*)' (as in Caml) in the
< ``header'' and ``trailer'' sections.
---
> Comments are enclosed between `/*' and `*/' (as in C) in the "declarations"
> and "rules" sections, and between `(*' and `*)' (as in Caml) in the "header"
> and "trailer" sections.
246,258c315,329
< %token symbol ... symbol Declare the given symbols as tokens (terminal
< symbols). These symbols are added as constant constructors for the token
< concrete type.
<
< %token < type > symbol ... symbol Declare the given symbols as tokens with an
< attached attribute of the given type. These symbols are added as
< constructors with arguments of the given type for the token concrete type.
< The type part is an arbitrary Caml type expression, except that all type
< constructor names must be fully qualified (e.g. Modname.typename) for all
< types except standard built-in types, even if the proper `open' directives
< (e.g. `open Modname') were given in the header section. That's because the
< header is copied only to the .ml output file, but not to the .mli output
< file, while the type part of a `%token' declaration is copied to both.
---
>
> %token constr ... constr Declare the given symbols constr ... constr as
> tokens (terminal symbols). These symbols are added as constant constructors
> for the token concrete type.
>
> %token < typexpr > constr ... constr Declare the given symbols constr ...
> constr as tokens with an attached attribute of the given type. These symbols
> are added as constructors with arguments of the given type for the token
> concrete type. The typexpr part is an arbitrary Caml type expression, except
> that all type constructor names must be fully qualified (e.g.
> Modname.typename) for all types except standard built-in types, even if the
> proper `open' directives (e.g. `open Modname') were given in the header
> section. That's because the header is copied only to the .ml output file,
> but not to the .mli output file, while the typexpr part of a `%token'
> declaration is copied to both.
266,272c337,343
< %type < type > symbol ... symbol Specify the type of the semantic attributes
< for the given symbols. This is mandatory for start symbols only. Other
< nonterminal symbols need not be given types by hand: these types will be
< inferred when running the output files through the Objective Caml compiler
< (unless the `-s' option is in effect). The type part is an arbitrary Caml
< type expression, except that all type constructor names must be fully
< qualified, as explained above for %token.
---
> %type < typexpr > symbol ... symbol Specify the type of the semantic
> attributes for the given symbols. This is mandatory for start symbols only.
> Other nonterminal symbols need not be given types by hand: these types will
> be inferred when running the output files through the Objective Caml
> compiler (unless the `-s' option is in effect). The typexpr part is an
> arbitrary Caml type expression, except that all type constructor names must
> be fully qualified, as explained above for %token.
284a356,375
> The precedence declarations are used in the following way to resolve
> reduce/reduce and shift/reduce conflicts:
>
> - Tokens and rules have precedences. By default, the precedence of a rule
> is the precedence of its rightmost terminal. You can override this
> default by using the %prec directive in the rule.
> - A reduce/reduce conflict is resolved in favor of the first rule (in the
> order given by the source file), and ocamlyacc outputs a warning.
> - A shift/reduce conflict is resolved by comparing the precedence of the
> rule to be reduced with the precedence of the token to be shifted. If the
> precedence of the rule is higher, then the rule will be reduced; if the
> precedence of the token is higher, then the token will be shifted.
> - A shift/reduce conflict between a rule and a token with the same
> precedence will be resolved using the associativity: if the token is
> left-associative, then the parser will reduce; if the token is
> right-associative, then the parser will shift. If the token is
> non-associative, then the parser will declare a syntax error.
> - When a shift/reduce conflict cannot be resolved using the above method,
> then ocamlyacc will output a warning and the parser will always shift.
>
341a433,436
>
> -bprefix Name the output files prefix.ml, prefix.mli, prefix.output, instead
> of the default naming convention.
>
346,347c441
< -bprefix Name the output files prefix.ml, prefix.mli, prefix.output, instead
< of the default naming convention.
---
> -version Print version and exit.
364,365c458
< <<
< /* File parser.mly */
---
> << /* File parser.mly */
390,391c483
< <<
< (* File lexer.mll *)
---
> << (* File lexer.mll *)
399c491
< | ['0'-'9']+ { INT(int_of_string(Lexing.lexeme lexbuf)) }
---
> | ['0'-'9']+ as lxm { INT(int_of_string lxm) }
409,410c501
< <<
< (* File calc.ml *)
---
> << (* File calc.ml *)
422,423c513
< <<
< ocamllex lexer.mll # generates lexer.ml
---
> << ocamllex lexer.mll # generates lexer.ml
439a530
>
446,447c537
< <<
< rule token = parse
---
> <<rule token = parse
452,453c542,543
< | ['A'-'Z' 'a'-'z'] ['A'-'Z' 'a'-'z' '0'-'9' '_'] *
< { IDENT(Lexing.lexeme lexbuf) }
---
> | ['A'-'Z' 'a'-'z'] ['A'-'Z' 'a'-'z' '0'-'9' '_'] * as id
> { IDENT id}
456c546
< general ``identifier'' rule, followed by a hashtable lookup to separate
---
> general "identifier" rule, followed by a hashtable lookup to separate
458,459c548
< <<
< { let keyword_table = Hashtbl.create 53
---
> <<{ let keyword_table = Hashtbl.create 53
467,470c556,558
< ['A'-'Z' 'a'-'z'] ['A'-'Z' 'a'-'z' '0'-'9' '_'] *
< { let id = Lexing.lexeme lexbuf in
< try
< Hashtbl.find keyword_table s
---
> ['A'-'Z' 'a'-'z'] ['A'-'Z' 'a'-'z' '0'-'9' '_'] * as id
> { try
> Hashtbl.find keyword_table id
472c560
< IDENT s }
---
> IDENT id }
475a564,568
> ocamllex: Position memory overflow, too many bindings The deterministic
> automata generated by ocamllex maintains a table of positions inside the
> scanned lexer buffer. The size of this table is limited to at most 255
> cells. This error should not show up in normal situations.
>