123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398(******************************************************************************)(* *)(* Menhir *)(* *)(* Copyright Inria. All rights reserved. This file is distributed under *)(* the terms of the GNU Library General Public License version 2, with a *)(* special exception on linking, as described in the file LICENSE. *)(* *)(******************************************************************************)(**This module defines several types and module types that are used in the
specification of the module {!Engine}. *)(* --------------------------------------------------------------------------- *)(* It would be nice if we could keep the structure of stacks and environments
hidden. However, stacks and environments must be accessible to semantic
actions, so the following data structure definitions must be public. *)(* --------------------------------------------------------------------------- *)(**A stack is a linked list of cells. A sentinel cell -- which is its own
successor -- is used to mark the bottom of the stack. The sentinel cell
itself is not significant -- it contains dummy values. *)type('state,'semantic_value)stack={state:'state;(**The state that we should go back to if we pop this stack cell.
This convention means that the state contained in the top stack cell is
not the current state [env.current]. It also means that the state found
within the sentinel is a dummy -- it is never consulted. This convention
is the same as that adopted by the code-based back-end. *)semv:'semantic_value;(**The semantic value associated with the chunk of input that this cell
represents. *)startp:Lexing.position;(**The start position of the chunk of input that this cell represents. *)endp:Lexing.position;(**The end position of the chunk of input that this cell represents. *)next:('state,'semantic_value)stack;(**The next cell down in the stack. If this is a self-pointer, then this
cell is the sentinel, and the stack is conceptually empty. *)}(* --------------------------------------------------------------------------- *)(**A parsing environment contains all of the parser's state (except for the
current program point). *)type('state,'semantic_value,'token)env={error:bool;(**If this flag is true, then the first component of [env.triple] should
be ignored, as it has been logically overwritten with the [error]
pseudo-token. *)triple:'token*Lexing.position*Lexing.position;(**The last token that was obtained from the lexer, together with its start
and end positions. Warning: before the first call to the lexer has taken
place, a dummy (and possibly invalid) token is stored here. *)stack:('state,'semantic_value)stack;(**The stack. *)current:'state;(**The current state. *)}(* --------------------------------------------------------------------------- *)(**A number of logging hooks are used to (optionally) emit logging messages. *)moduletypeLOG=sigtypestatetypeterminaltypeproduction(* The comments below indicate the conventional messages that correspond to
these hooks. *)(* State %d: *)valstate:state->unit(* Shifting (<terminal>) to state <state> *)valshift:terminal->state->unit(* Reducing a production should be logged either as a reduction
event (for regular productions) or as an acceptance event (for
start productions). *)(* Reducing production <production> / Accepting *)valreduce_or_accept:production->unit(* Lookahead token is now <terminal> (<pos>-<pos>) *)vallookahead_token:terminal->Lexing.position->Lexing.position->unit(* Initiating error handling *)valinitiating_error_handling:unit->unit(* Resuming error handling *)valresuming_error_handling:unit->unit(* Handling error in state <state> *)valhandling_error:state->unitend(* --------------------------------------------------------------------------- *)(**This signature describes the parameters that must be supplied to the LR
engine. *)moduletypeTABLE=sig(**The type of automaton states. *)typestate(**States are numbered. *)valnumber:state->int(**The type of tokens. These can be thought of as real tokens, that is,
tokens returned by the lexer. They carry a semantic value. This type
does not include the [error] pseudo-token. *)typetoken(**The type of terminal symbols. These can be thought of as integer codes.
They do not carry a semantic value. This type does include the [error]
pseudo-token. *)typeterminal(**The type of nonterminal symbols. *)typenonterminal(**The type of semantic values. *)typesemantic_value(**A token is conceptually a pair of a (non-[error]) terminal symbol and a
semantic value. The function [token2terminal] is the first the pair
projection. *)valtoken2terminal:token->terminal(**A token is conceptually a pair of a (non-[error]) terminal symbol and a
semantic value. The function [token2value] is the second the pair
projection. *)valtoken2value:token->semantic_value(* Even though the [error] pseudo-token is not a real token, it is a
terminal symbol. Furthermore, for regularity, it must have a semantic
value. *)(**The terminal symbol associated with the [error] token. *)valerror_terminal:terminal(**The semantic value associated with the [error] token. *)valerror_value:semantic_value(**[foreach_terminal] iterates over all terminal symbols. *)valforeach_terminal:(terminal->'a->'a)->'a->'a(**The type of productions. *)typeproduction(**[production_index] maps a production to its integer index. *)valproduction_index:production->int(**[find_production] maps a production index to a production.
Its argument must be a valid index; use with care. *)valfind_production:int->production(**If a state [s] has a default reduction on production [prod], then, upon
entering [s], the automaton should reduce [prod] without consulting the
lookahead token.
[default_reduction s] determines whether the state [s] has a default
reduction. Instead of returning a value of a sum type -- say, either
[DefRed prod] or [NoDefRed] -- it accepts two continuations, and invokes
just one of them. *)valdefault_reduction:state->('env->production->'answer)->('env->'answer)->'env->'answer(**An LR automaton can normally take three kinds of actions: shift, reduce,
or fail. (Acceptance is a particular case of reduction: it consists in
reducing a start production.)
There are two variants of the shift action. [shift/discard s] instructs
the automaton to discard the current token, request a new one from the
lexer, and move to state [s]. [shift/nodiscard s] instructs it to move to
state [s] without requesting a new token. This instruction should be used
when [s] has a default reduction on [#].
The function [action] provides access to the automaton's action table. It
maps a pair of a state and a terminal symbol to an action.
Instead of returning a value of a sum type -- one of shift/discard,
shift/nodiscard, reduce, or fail -- this function accepts three
continuations, and invokes just one them.
The parameters of the function [action] are as follows:
- the first two parameters, a state and a terminal symbol, are used to
look up the action table;
- the next parameter is the semantic value associated with the above
terminal symbol; it is not used, only passed along to the shift
continuation, as explained below;
- the shift continuation expects an environment; a flag that tells
whether to discard the current token; the terminal symbol that
is being shifted; its semantic value; and the target state of
the transition;
- the reduce continuation expects an environment and a production;
- the fail continuation expects an environment;
- the last parameter is the environment; it is not used, only passed
along to the selected continuation. *)valaction:state->terminal->semantic_value->('env->bool->terminal->semantic_value->state->'answer)->('env->production->'answer)->('env->'answer)->'env->'answer(**[maybe_shift_t s t] determines whether there exists a transition out of
the state [s], labeled with the terminal symbol [t], to some state
[s']. If so, it returns [Some s']. Otherwise, it returns [None]. *)valmaybe_shift_t:state->terminal->stateoption(**[may_reduce_prod s t prod] determines whether in the state [s], with
lookahead symbol [t], the automaton reduces production [prod]. This test
accounts for the possible existence of a default reduction. *)valmay_reduce_prod:state->terminal->production->bool(**The function [goto_nt] provides access to the automaton's goto table. It
maps a pair of a state [s] and a nonterminal symbol [nt] to a state. The
function call [goto_nt s nt] is permitted ONLY if the state [s] has an
outgoing transition labeled [nt]. Otherwise, its result is undefined. *)valgoto_nt:state->nonterminal->state(**The function [goto_prod] also provides access to the goto table. It maps
a pair of a production [prod] and a state [s] to a state. The call
[goto_prod prod s] is permitted ONLY if the state [s] has an outgoing
transition labeled with the nonterminal symbol [lhs prod]. *)valgoto_prod:state->production->state(**The function [maybe_goto_nt] serves the same purpose as [goto_nt].
Compared to [goto_nt], it involves an additional dynamic check, so it CAN
be called even the state [s] has no outgoing transition labeled [nt]. *)valmaybe_goto_nt:state->nonterminal->stateoption(**[lhs prod] returns the left-hand side of production [prod],
a nonterminal symbol. *)vallhs:production->nonterminal(**[is_start prod] tells whether the production [prod] is a start
production. *)valis_start:production->bool(**A semantic action can raise the exception [Error]. *)exceptionError(**By convention, a semantic action is responsible for:
1. fetching whatever semantic values and positions it needs off the stack;
2. popping an appropriate number of cells off the stack, as dictated
by the length of the right-hand side of the production;
3. computing a new semantic value, as well as new start and end positions;
4. pushing a new stack cell, which contains the three values
computed in step 3;
5. returning the new stack computed in steps 2 and 4. *)typesemantic_action=(state,semantic_value,token)env->(state,semantic_value)stack(* Point 1 above is essentially forced upon us: if semantic values were
fetched off the stack by this interpreter, then the calling convention
for semantic actions would be variadic: not all semantic actions would
have the same number of arguments. The rest follows rather naturally. *)(**The function [semantic_action] maps a production to its semantic action. *)valsemantic_action:production->semantic_action(**[may_reduce state prod] tests whether the state [state] is capable of
reducing the production [prod]. This function is currently costly and
is not used by the core LR engine. It is used in the implementation
of certain functions, such as [force_reduction], which allow the engine
to be driven programmatically. *)valmay_reduce:state->production->bool(**If the flag [log] is false, then the logging functions are not called.
If it is [true], then they are called. *)vallog:bool(**The logging hooks required by the LR engine. *)moduleLog:LOGwithtypestate:=stateandtypeterminal:=terminalandtypeproduction:=productionend(* --------------------------------------------------------------------------- *)(**This signature describes the monolithic (traditional) LR engine. When the
engine is used in this mode, the parser controls the lexer. *)moduletypeMONOLITHIC_ENGINE=sigtypestatetypetokentypesemantic_valueexceptionError(**An entry point to the engine requires a start state, a lexer, and a
lexing buffer. It either succeeds and produces a semantic value, or fails
and raises {!Error}. *)valentry:(* strategy: *)[`Legacy|`Simplified]->(* see [IncrementalEngine] *)state->(Lexing.lexbuf->token)->Lexing.lexbuf->semantic_valueend(* --------------------------------------------------------------------------- *)(**This signature describes just the entry point of the incremental LR engine.
It is a supplement to {!IncrementalEngine.INCREMENTAL_ENGINE}.
The [start] function is set apart because we do not wish to publish it as
part of the generated file [parser.mli]. Instead, the table back-end will
publish specialized versions of it, with a suitable type cast. *)moduletypeINCREMENTAL_ENGINE_START=sigtypestatetypesemantic_valuetype'acheckpoint(**[start] is an entry point. It requires a start state and a start position
and begins the parsing process. If the lexer is based on an OCaml lexing
buffer, the start position should be [lexbuf.lex_curr_p]. [start] produces
a checkpoint, which usually will be an [InputNeeded] checkpoint. (It could
be [Accepted] if this starting state accepts only the empty word. It could
be [Rejected] if this starting state accepts no word at all.) It does not
raise any exception.
[start s pos] should really produce a checkpoint of type ['a checkpoint],
for a fixed ['a] that depends on the state [s]. We cannot express this, so
we use [semantic_value checkpoint], which is safe. The table back-end uses
[Obj.magic] to produce safe specialized versions of [start]. *)valstart:state->Lexing.position->semantic_valuecheckpointend(* --------------------------------------------------------------------------- *)(**This signature describes the LR engine, which combines the monolithic
and incremental interfaces. *)moduletypeENGINE=sigincludeMONOLITHIC_ENGINEincludeIncrementalEngine.INCREMENTAL_ENGINEwithtypetoken:=tokenandtype'alr1state=state(* useful for us; hidden from the end user *)includeINCREMENTAL_ENGINE_STARTwithtypestate:=stateandtypesemantic_value:=semantic_valueandtype'acheckpoint:='acheckpointend