diff6-10.chapter12.txt


last mod. 2008-08-28 (木) 09:53:44

diff6-10

13,15c13,15
< and yacc. Readers unfamiliar with lex and yacc are referred to ``Compilers:
< principles, techniques, and tools'' by Aho, Sethi and Ullman (Addison-Wesley,
< 1986), or ``Lex & Yacc'', by Levine, Mason and Brown (O'Reilly, 1992).
---
> and yacc. Readers unfamiliar with lex and yacc are referred to "Compilers:
> principles, techniques, and tools" by Aho, Sethi and Ullman (Addison-Wesley,
> 1986), or "Lex & Yacc", by Levine, Mason and Brown (O'Reilly, 1992).
42a43,61
> 12.1.1  Options
> ===============
>    The following command-line options are recognized by ocamllex.
>   
>   
>  -o output-file  Specify the name of the output file produced by ocamllex.
>    Default is lexer.ml, ocamllex being invoked as ocamllex lexer.mll.
>  
>  -ml  Output code that does not use the Caml built-in automata interpreter.
>    Instead, the automaton is encoded by Caml functions. This option is useful
>    for debugging ocamllex; using it for production lexers is not recommended.
>  
>  -q  Quiet mode. ocamllex normally outputs informational messages to standard
>    output. They are suppressed if option -q is used.
>  
>  -version   Print version and exit. 
>   
>   
> 
51c70
<   rule entrypoint =
---
>   rule entrypoint [arg_1... arg_n] =
55c74
<   and entrypoint =
---
>   and entrypoint [arg_1... arg_n] =
60,61c79,81
<   
<   Comments are delimited by (* and *), as in Caml.
---
>    Comments are delimited by (* and *), as in Caml. The parse keyword can be
> replaced by the shortest keyword, with the semantic consequences explained
> below.
78,79c98,99
< In following regular expressions, the identifier ident can be used as shorthand
< for regexp.
---
> In regular expressions that follow this declaration, the identifier ident can
> be used as shorthand for regexp.
86,93c106,114
< (starting with a lowercase letter). Each entry point becomes a Caml function
< that takes one argument of type Lexing.lexbuf. Characters are read from the
< Lexing.lexbuf argument and matched against the regular expressions provided in
< the rule, until a prefix of the input matches one of the rule. The
< corresponding action is then evaluated and returned as the result of the
< function.
<   If several regular expressions match a prefix of the input, the ``longest
< match'' rule applies: the regular expression that matches the longest prefix of
---
> (starting with a lowercase letter). Similarly, the arguments arg_1... arg_n
> must be valid identifiers for Caml. Each entry point becomes a Caml function
> that takes n+1 arguments, the extra implicit last argument being of type
> Lexing.lexbuf. Characters are read from the Lexing.lexbuf argument and matched
> against the regular expressions provided in the rule, until a prefix of the
> input matches one of the rules. The corresponding action is then evaluated and
> returned as the result of the function.
>   If several regular expressions match a prefix of the input, the "longest
> match" rule applies: the regular expression that matches the longest prefix of
95a117,122
>   However, if lexer rules are introduced with the shortest keyword in place of
> the parse keyword, then the "shortest match" rule applies: the shortest prefix
> of the input is selected. In case of tie, the regular expression that occurs
> earlier in the rule is still selected. This feature is not intended for use in
> ordinary lexical analyzers; it may facilitate the use of ocamllex as a simple
> text processing tool.
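
The selection rule described in the hunk above (longest match under parse, shortest match under shortest, ties going to the earlier rule) can be sketched in plain OCaml. This is only a model under stated assumptions, not ocamllex's implementation: the type `rule`, the helper `literal_rule`, the function `select`, and `example_rules` are all hypothetical names, and each rule is reduced to a list of literal strings instead of a real regular expression.

```ocaml
(* Hypothetical model: a rule is a name plus a function that returns the
   length of every prefix of the input matched by the rule. *)
type rule = { name : string; prefix_lengths : string -> int list }

type mode = Parse | Shortest

let is_prefix p s =
  String.length p <= String.length s
  && String.sub s 0 (String.length p) = p

(* Assumed helper: a rule that matches any one of the given literals. *)
let literal_rule name alternatives =
  { name;
    prefix_lengths =
      (fun input ->
        List.filter_map
          (fun lit ->
            if is_prefix lit input then Some (String.length lit) else None)
          alternatives) }

(* Pick the winning rule and match length. A candidate replaces the current
   best only when it is strictly better, so on a tie the earlier rule wins. *)
let select mode rules input =
  let better candidate best =
    match mode with
    | Parse -> candidate > best
    | Shortest -> candidate < best
  in
  List.fold_left
    (fun acc rule ->
      List.fold_left
        (fun acc len ->
          match acc with
          | Some (_, best) when not (better len best) -> acc
          | _ -> Some (rule.name, len))
        acc
        (rule.prefix_lengths input))
    None rules

(* Illustrative rules only: a keyword rule listed before a looser rule. *)
let example_rules =
  [ literal_rule "kw_if" [ "if" ];
    literal_rule "ident" [ "i"; "if"; "iff" ] ]
```

With these rules, `select Parse example_rules "if!"` keeps `kw_if` because both rules match two characters and `kw_if` comes first, while `select Shortest example_rules "iffy"` picks the one-character match of `ident`.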
102a130
>                                 regexp ::=  ... 
105,106c133,135
<  ' char '  A character constant, with the same syntax as Objective Caml
<    character constants. Match the denoted character.
---
>  
>  ' regular-char |  escape-sequence '  A character constant, with the same
>    syntax as Objective Caml character constants. Match the denoted character.
115,116c144,145
<  " string "  A string constant, with the same syntax as Objective Caml string
<    constants. Match the corresponding sequence of characters.
---
>  " { string-character } "  A string constant, with the same syntax as Objective
>    Caml string constants. Match the corresponding sequence of characters.
126a156,160
>  regexp_1 #  regexp_2  (Difference of character sets). Regular expressions
>    regexp_1 and regexp_2 must be character sets defined with [... ] (or a
>    single character expression or underscore _). Match the difference of the
>    two specified character sets.
>  
146a181,182
>  regexp as  ident  Bind the substring matched by regexp to identifier ident. 
>   
148c184
< followed by ?, then concatenation, then | (alternation).
---
> followed by ?, then concatenation, then | (alternation), then as.
155,157c191,194
< where the identifier lexbuf is bound to the current lexer buffer. Some typical
< uses for lexbuf, in conjunction with the operations on lexer buffers provided
< by the Lexing standard library module, are listed below.
---
> where the identifiers defined by using the as construct are bound to subparts
> of the matched string. Additionally, lexbuf is bound to the current lexer
> buffer. Some typical uses for lexbuf, in conjunction with the operations on
> lexer buffers provided by the Lexing standard library module, are listed below.
173,175c210,214
<  entrypoint lexbuf  (Where entrypoint is the name of another entry point in the
<    same lexer definition.) Recursively call the lexer on the given entry point.
<    Useful for lexing nested comments, for example.
---
>  entrypoint [exp_1... exp_n] lexbuf  (Where entrypoint is the name of another
>    entry point in the same lexer definition.) Recursively call the lexer on the
>    given entry point. Notice that lexbuf is the last argument. Useful for
>    lexing nested comments, for example.
>   
177a217,245
> 12.2.6  Variables in regular expressions
> ========================================
>    The as construct is similar to "groups" as provided by numerous regular
> expression packages. The type of these variables can be string, char, string
> option or char option.
>   We first consider the case of linear patterns, that is the case when all as
> bound variables are distinct. In regexp as  ident, the type of ident normally
> is string (or string option) except when regexp is a character constant, an
> underscore, a string constant of length one, a character set specification, or
> an alternation of those. Then, the type of ident is char (or char option).
> Option types are introduced when overall rule matching does not imply matching
> of the bound sub-pattern. This is in particular the case of ( regexp as  ident
> ) ? and of regexp_1 | (  regexp_2 as  ident ).
>   There is no linearity restriction over as bound variables. When a variable is
> bound more than once, the previous rules are to be extended as follows: 
>   
>  - A variable is a char variable when all its occurrences bind char occurrences
>    in the previous sense. 
>  - A variable is an option variable when the overall expression can be matched
>    without binding this variable. 
>    For instance, in `('a' as x) | ( 'a' (_ as x) )' the variable `x' is of type
> char, whereas in  `("ab" as x) | ( 'a' (_ as x) ? )' the variable `x' is of
> type string option.
>   In some cases, a successful match may not yield a unique set of bindings. For
> instance the matching of `aba' by the regular expression `(('a'|"ab") as x)
> (("ba"|'a') as y)' may result in binding either `x' to `"ab"' and `y' to `"a"',
> or `x' to `"a"' and `y' to `"ba"'. The automata produced by ocamllex on such
> ambiguous regular expressions will select one of the possible resulting sets of
> bindings. The selected set of bindings is purposely left unspecified.
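
The ambiguity in the `aba' example above can be checked by brute force. The following plain OCaml sketch does not use ocamllex. It enumerates every way of choosing `x` and `y` from the two alternations so that their concatenation is a prefix of the input. The names `is_prefix`, `bindings`, and `ambiguous` are hypothetical, and the sketch says nothing about which set of bindings a generated automaton would actually return.

```ocaml
(* True when [p] is a prefix of [s]. *)
let is_prefix p s =
  String.length p <= String.length s
  && String.sub s 0 (String.length p) = p

(* All pairs (x, y), with x drawn from [xs] and y from [ys], such that the
   concatenation x ^ y is a prefix of [input]. *)
let bindings xs ys input =
  List.concat
    (List.map
       (fun x ->
         List.filter_map
           (fun y -> if is_prefix (x ^ y) input then Some (x, y) else None)
           ys)
       xs)

(* The alternations ('a'|"ab") and ("ba"|'a') applied to the input "aba". *)
let ambiguous = bindings [ "a"; "ab" ] [ "ba"; "a" ] "aba"
```

Both surviving pairs cover all three characters of the input, so the longest-match rule alone cannot choose between them.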
179c247,248
< 12.2.6  Reserved identifiers
---
> 
> 12.2.7  Reserved identifiers
225,227c294,296
<   Comments are enclosed between `/*' and `*/' (as in C) in the ``declarations''
< and ``rules'' sections, and between `(*' and `*)' (as in Caml) in the
< ``header'' and ``trailer'' sections.
---
>   Comments are enclosed between `/*' and `*/' (as in C) in the "declarations"
> and "rules" sections, and between `(*' and `*)' (as in Caml) in the "header"
> and "trailer" sections.
246,258c315,329
<  %token symbol ... symbol  Declare the given symbols as tokens (terminal
<    symbols). These symbols are added as constant constructors for the token
<    concrete type.
<  
<  %token < type > symbol ... symbol  Declare the given symbols as tokens with an
<    attached attribute of the given type. These symbols are added as
<    constructors with arguments of the given type for the token concrete type.
<    The type part is an arbitrary Caml type expression, except that all type
<    constructor names must be fully qualified (e.g. Modname.typename) for all
<    types except standard built-in types, even if the proper `open' directives
<    (e.g. `open Modname') were given in the header section. That's because the
<    header is copied only to the .ml output file, but not to the .mli output
<    file, while the type part of a `%token' declaration is copied to both.
---
>  
>  %token constr ...  constr  Declare the given symbols constr ...  constr as
>    tokens (terminal symbols). These symbols are added as constant constructors
>    for the token concrete type.
>  
>  %token < typexpr >  constr ...  constr  Declare the given symbols constr ... 
>    constr as tokens with an attached attribute of the given type. These symbols
>    are added as constructors with arguments of the given type for the token
>    concrete type. The typexpr part is an arbitrary Caml type expression, except
>    that all type constructor names must be fully qualified (e.g.
>    Modname.typename) for all types except standard built-in types, even if the
>    proper `open' directives (e.g. `open Modname') were given in the header
>    section. That's because the header is copied only to the .ml output file,
>    but not to the .mli output file, while the typexpr part of a `%token'
>    declaration is copied to both.
266,272c337,343
<  %type < type > symbol ... symbol  Specify the type of the semantic attributes
<    for the given symbols. This is mandatory for start symbols only. Other
<    nonterminal symbols need not be given types by hand: these types will be
<    inferred when running the output files through the Objective Caml compiler
<    (unless the `-s' option is in effect). The type part is an arbitrary Caml
<    type expression, except that all type constructor names must be fully
<    qualified, as explained above for %token.
---
>  %type < typexpr >  symbol ...  symbol  Specify the type of the semantic
>    attributes for the given symbols. This is mandatory for start symbols only.
>    Other nonterminal symbols need not be given types by hand: these types will
>    be inferred when running the output files through the Objective Caml
>    compiler (unless the `-s' option is in effect). The typexpr part is an
>    arbitrary Caml type expression, except that all type constructor names must
>    be fully qualified, as explained above for %token.
284a356,375
>  The precedence declarations are used in the following way to resolve
>    reduce/reduce and shift/reduce conflicts: 
>      
>     - Tokens and rules have precedences. By default, the precedence of a rule
>       is the precedence of its rightmost terminal. You can override this
>       default by using the %prec directive in the rule. 
>     - A reduce/reduce conflict is resolved in favor of the first rule (in the
>       order given by the source file), and ocamlyacc outputs a warning. 
>     - A shift/reduce conflict is resolved by comparing the precedence of the
>       rule to be reduced with the precedence of the token to be shifted. If the
>       precedence of the rule is higher, then the rule will be reduced; if the
>       precedence of the token is higher, then the token will be shifted. 
>     - A shift/reduce conflict between a rule and a token with the same
>       precedence will be resolved using the associativity: if the token is
>       left-associative, then the parser will reduce; if the token is
>       right-associative, then the parser will shift. If the token is
>       non-associative, then the parser will declare a syntax error. 
>     - When a shift/reduce conflict cannot be resolved using the above method,
>       then ocamlyacc will output a warning and the parser will always shift. 
>  
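
The shift/reduce part of the list above can be written as a small decision function. This is a sketch of the stated rules only, not ocamlyacc's code. It assumes that precedence levels are encoded as integers where a larger number means higher precedence, and that a symbol with no declared precedence is `None`. The types `assoc` and `action` and the function `resolve_shift_reduce` are hypothetical names.

```ocaml
type assoc = Left | Right | Nonassoc

type action =
  | Reduce
  | Shift
  | Syntax_error        (* non-associative token at equal precedence *)
  | Shift_with_warning  (* conflict not resolved by precedence *)

(* [rule_prec] is the precedence of the rule to be reduced, if any.
   [token] is the precedence and associativity of the lookahead token,
   if it was given one by a precedence declaration. *)
let resolve_shift_reduce ~rule_prec ~token =
  match rule_prec, token with
  | Some r, Some (t, _) when r > t -> Reduce
  | Some r, Some (t, _) when r < t -> Shift
  | Some _, Some (_, Left) -> Reduce
  | Some _, Some (_, Right) -> Shift
  | Some _, Some (_, Nonassoc) -> Syntax_error
  | _ -> Shift_with_warning
```

For example, with additive operators at an assumed level 1 and multiplicative operators at level 2, both left-associative, `resolve_shift_reduce ~rule_prec:(Some 1) ~token:(Some (2, Left))` returns `Shift`. The same call with both levels equal to 1 returns `Reduce`.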
341a433,436
>  
>  -bprefix  Name the output files prefix.ml, prefix.mli, prefix.output, instead
>    of the default naming convention.
>  
346,347c441
<  -bprefix  Name the output files prefix.ml, prefix.mli, prefix.output, instead
<    of the default naming convention.
---
>  -version   Print version and exit.
364,365c458
< <<
<           /* File parser.mly */
---
> <<        /* File parser.mly */
390,391c483
< <<
<           (* File lexer.mll *)
---
> <<        (* File lexer.mll *)
399c491
<             | ['0'-'9']+     { INT(int_of_string(Lexing.lexeme lexbuf)) }
---
>             | ['0'-'9']+ as lxm { INT(int_of_string lxm) }
409,410c501
< <<
<           (* File calc.ml *)
---
> <<        (* File calc.ml *)
422,423c513
< <<
<           ocamllex lexer.mll       # generates lexer.ml
---
> <<        ocamllex lexer.mll       # generates lexer.ml
439a530
>  
446,447c537
<    <<
<      rule token = parse
---
>    <<rule token = parse
452,453c542,543
<      | ['A'-'Z' 'a'-'z'] ['A'-'Z' 'a'-'z' '0'-'9' '_'] *
<                     { IDENT(Lexing.lexeme lexbuf) }
---
>      | ['A'-'Z' 'a'-'z'] ['A'-'Z' 'a'-'z' '0'-'9' '_'] * as id
>                     { IDENT id}
456c546
<    general ``identifier'' rule, followed by a hashtable lookup to separate
---
>    general "identifier" rule, followed by a hashtable lookup to separate
458,459c548
<    <<
<      { let keyword_table = Hashtbl.create 53
---
>    <<{ let keyword_table = Hashtbl.create 53
467,470c556,558
<        ['A'-'Z' 'a'-'z'] ['A'-'Z' 'a'-'z' '0'-'9' '_'] *
<                     { let id = Lexing.lexeme lexbuf in
<                       try
<                         Hashtbl.find keyword_table s
---
>        ['A'-'Z' 'a'-'z'] ['A'-'Z' 'a'-'z' '0'-'9' '_'] * as id
>                     { try
>                         Hashtbl.find keyword_table id
472c560
<                         IDENT s }
---
>                         IDENT id }
475a564,568
>  ocamllex: Position memory overflow, too many bindings  The deterministic
>    automaton generated by ocamllex maintains a table of positions inside the
>    scanned lexer buffer. The size of this table is limited to at most 255
>    cells. This error should not show up in normal situations.
>
