module type CHARACTER_PARSER =
sig
    include Interfaces.FULL_PARSER
        with type expect = string * Indent.expectation option
    (** @inline *)


    (** {1 Position Information} *)

    val position: t -> Position.t
    (** [position p] The current position in the input stream.

        Can be called at any time.
    *)

    val line: t -> int
    (** [line p] The current line in the input stream.

        Can be called at any time.
    *)

    val column: t -> int
    (** [column p] The current column in the input stream.

        Can be called at any time.
    *)

    val byte_column: t -> int
    (** [byte_column p] The current byte_column in the input stream.

        Can be called at any time.
    *)


    (** {1 Run the Parser on Streams} *)

    val run_on_string: string -> t -> t
    (** [run_on_string str p] Run the parser [p] on the string [str]. *)

    val run_on_string_at: int -> string -> t -> int * t
    (** [run_on_string_at start str p] Run the parser [p] on the string [str]
        starting at index [start]. Return the index of the next character to
        be pushed in together with the parser. *)

    val run_on_channel: in_channel -> t -> t
    (** [run_on_channel ic p] Run the parser [p] on input channel [ic]. *)
end


module type END_OF_INPUT_COMBINATOR =
sig
    type _ t

    val expect_end: 'a -> 'a t
    (** [expect_end a] Expect the end of the token stream.

        In case of success return [a].

        In case of failure return the syntax error with the expectation "end of
        input".

        {b CAUTION}: There is usually no need to use this combinator! This
        combinator is needed only for partial parsers.

        {b Never ever} backtrack over this combinator.
    *)
end


module type BASE_64_COMBINATORS =
sig
    type _ t

    val base64: (string -> 'r) -> (string -> 'r -> 'r) -> 'r t
    (** [base64 start next] Parse a base64 encoding into an object of type ['r].

        A base64 encoding is a sequence of zero or more base64 characters
        (A-Za-z0-9+/) grouped into sequences of 4 characters and optionally
        padded with the character [=]. Each group of 2-4 base64 characters is
        decoded into a string of 1-3 bytes.

        [start] gets the first 1-3 bytes and [next] gets all subsequent 1-3
        bytes until the end of the encoding is reached.
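
        As a rough illustration of the group decoding described above, here is
        a minimal stand-alone sketch using only the standard library. It is not
        the implementation behind [base64]; the names [value_of_char] and
        [decode_group] are hypothetical, and the sketch assumes that padding
        characters have already been stripped from the group.

        {[
            (* Value of one base64 character, or None if it is not in
               the alphabet A-Za-z0-9+/. *)
            let value_of_char (c: char): int option =
                match c with
                | 'A' .. 'Z' -> Some (Char.code c - Char.code 'A')
                | 'a' .. 'z' -> Some (Char.code c - Char.code 'a' + 26)
                | '0' .. '9' -> Some (Char.code c - Char.code '0' + 52)
                | '+' -> Some 62
                | '/' -> Some 63
                | _ -> None

            (* Decode one group of 2-4 base64 characters into 1-3 bytes.
               Hypothetical helper, not part of the library. *)
            let decode_group (group: string): string option =
                let n = String.length group in
                if n < 2 || 4 < n then
                    None
                else
                    (* Collect 6 bits per character into a 24 bit value;
                       missing characters contribute zero bits. *)
                    let rec collect i acc =
                        if i = 4 then Some acc
                        else if i >= n then collect (i + 1) (acc lsl 6)
                        else
                            match value_of_char group.[i] with
                            | None -> None
                            | Some v -> collect (i + 1) ((acc lsl 6) lor v)
                    in
                    match collect 0 0 with
                    | None -> None
                    | Some bits ->
                        (* n characters carry n - 1 complete bytes. *)
                        Some (String.init (n - 1) (fun k ->
                            Char.chr ((bits lsr (16 - 8 * k)) land 0xff)))
        ]}

        Under these assumptions [decode_group "TWFu"] evaluates to
        [Some "Man"], and the shorter groups ["TWE"] and ["TQ"] yield
        [Some "Ma"] and [Some "M"].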
    *)

    val string_of_base64: string t
    (** Parse a base64 encoding and decode it into a string. *)
end


module type LEXER_COMBINATOR =
sig
    type _ t

    val lexer: 'a t -> 'tok -> 'tok t -> (Position.range * 'tok) t
    (** [lexer whitespace end_token tok]

        A lexer combinator.

        - The [whitespace] combinator recognizes a possibly empty sequence of
          whitespace (usually blanks, tabs, newlines, comments, ...).

        - [end_token] is a token which the lexer returns when it has successfully
          consumed the end of input.

        - [tok] is a combinator recognizing tokens
          (usually [tok1 </> tok2 </> ... </> tokn]).

        The lexer combinator recognizes tokens in an input stream of the form

        {v
            WS Token WS Token .... WS EOF
        v}

        Note: If a combinator fails to recognize a token after having
        consumed some input, then the subsequent combinators are no longer
        tried as alternatives. Therefore, if several tokens can begin with the
        same prefix, the recognition of the common prefix has to be made
        backtrackable in all but the last combinator recognizing a token with
        that prefix. The same applies to whitespace if part of the whitespace
        can begin like a token.

        Examples:

        - comment: "// ...."

        - division operator: "/"

        In this case the recognition of at least the first slash of the comment
        has to be backtrackable.
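
        The effect described in the note can be modelled with a toy parser,
        sketched here under simplifying assumptions. None of the definitions
        below belong to the library: [outcome], [str] and the local versions
        of [</>] and [backtrack] are hypothetical stand-ins which only mimic
        the committed choice between alternatives.

        {[
            (* Result of a toy parser run. *)
            type 'a outcome =
                | Success of 'a * int   (* value and index of the next character *)
                | Failure of bool       (* true: failed after consuming input *)

            type 'a parser = string -> int -> 'a outcome

            (* Recognize the literal [s] and return [v]. *)
            let str (s: string) (v: 'a): 'a parser =
                fun inp pos ->
                let n = String.length s in
                let rec matched i =
                    i = n
                    || (pos + i < String.length inp
                        && inp.[pos + i] = s.[i]
                        && matched (i + 1))
                in
                if matched 0 then
                    Success (v, pos + n)
                else
                    (* Input counts as consumed if the first character matched. *)
                    Failure (pos < String.length inp && n > 0 && inp.[pos] = s.[0])

            (* Committed choice: [q] is tried only if [p] failed without
               consuming input. *)
            let ( </> ) (p: 'a parser) (q: 'a parser): 'a parser =
                fun inp pos ->
                match p inp pos with
                | Failure false -> q inp pos
                | r -> r

            (* Make a failure of [p] look as if nothing had been consumed. *)
            let backtrack (p: 'a parser): 'a parser =
                fun inp pos ->
                match p inp pos with
                | Failure _ -> Failure false
                | r -> r

            let comment = str "//" "comment"
            let division = str "/" "division"

            let without_backtrack = comment </> division
            let with_backtrack = backtrack comment </> division
        ]}

        With these definitions [without_backtrack "/ 2" 0] is [Failure true],
        because the comment recognizer has already consumed the first slash,
        whereas [with_backtrack "/ 2" 0] is [Success ("division", 1)].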
    *)
end


module type LOCATION_COMBINATORS =
sig
    type _ t

    val located: 'a t -> 'a Located.t t
    (** [located p] Parse [p] and return its result with its start and end
        position.

        Note: If [p] removes whitespace at the end, the returned end position is
        at the end of the whitespace. This is not what you usually want.
        Therefore first parse the essential part located and then remove the
        whitespace.
    *)

    val position: Position.t t
    (** The current position in the file. *)
end


module type INDENTATION_COMBINATORS =
sig
    type _ t

    (** The indentation of a normal construct is the indentation of its leftmost
        token. The indentation of a vertically aligned construct is the
        indentation of its first token.
    *)

    val indent: int -> 'a t -> 'a t
    (** [indent i p] Indent [p] by [i] columns relative to its parent.

        Precondition: [0 <= i]

        The indentation of [p] is defined by the indentation of its first token.
        The first token has to be indented at least [i] columns relative to the
        parent of [p]. After the first token of [p] has been parsed
        successfully, all subsequent tokens must have at least the same
        indentation.

        Note: Indentation of [p] relative to its parent only makes sense, if the
        first token of [p] is not the first token of its parent! I.e. the parent
        of [p] should have consumed at least one token before the parsing of [p]
        starts.
    *)

    (** CAUTION WITH ALIGNMENT !!

        If you want to align a certain number of constructs vertically it is {e
        mandatory} to indent the whole block of constructs. Do not indent the
        individual items to be aligned. Indent the whole block.

        Reason: The parent of the block has usually already consumed some
        tokens, and the indentation of a construct is the position of its
        leftmost token. If you don't indent the aligned block, then it will be
        aligned with the leftmost token of the parent construct. This is usually
        not intended and is a common pitfall. Any indentation, even zero
        indentation, is sufficient.
    *)

    val align: 'a t -> 'a t
    (** [align p]

        Use the start position of the first token of [p] to align it with other
        constructs. If [p] does not consume any token, then [align p] has no
        effect.

        Alignment makes sense if there are at least two combinators which
        are aligned and indented. E.g. suppose there are two combinators [p] and
        [q]. Then we can form

        {[
            indent 1 (
                let* a = align p in
                let* b = align q in
                return (a,b)
            )
        ]}

        This combinator parses [p] whose first token has to be indented at least
        one column relative to its parent. And then it parses [q] whose first
        token must be aligned with the first token of [p].

        The indentation decouples the alignment of [p] and [q] with other
        aligned siblings or parents. [indent 0 ...] can be used to make the
        indentation optional.
    *)

    val left_align: 'a t -> 'a t
    (** [left_align p]

        Align a construct described by [p] at its leftmost possible column. If a
        whole block of constructs has to be vertically left aligned, then it is
        important that at least the first construct is left aligned. The
        subsequent constructs will be aligned exactly vertically. For the
        subsequent constructs [left_align] has the same effect as {!align}.
    *)

    val detach: 'a t -> 'a t
    (** [detach p] Parse [p] without any indentation and alignment restrictions.

        Detachment is needed to parse whitespace. The whitespace at the
        beginning of a line never satisfies any nontrivial indentation or
        alignment requirements.
    *)
end


module type CHARACTER_COMBINATORS =
sig
    type _ t

    val charp: (char -> bool) -> string -> char t
    (** [charp p expect] Parse a character which satisfies the predicate [p].

        In case of failure, report the failed expectation [expect].
    *)

    val range: char -> char -> char t
    (** [range c1 c2] Parses a character in the range between [c1] and [c2], i.e.
        a character [c] which satisfies [c1 <= c && c <= c2]. *)

    val char: char -> char t
    (** [char c] Parse the character [c]. *)

    val one_of_chars: string -> string -> char t
    (** [one_of_chars str expect]

        Parse one of the characters in the string [str]. In case of failure,
        report the failed expectation [expect].
    *)

    val string: string -> string t
    (** [string str] Parse the string [str]. *)

    val uppercase_letter: char t
    (** Parse an uppercase letter. *)

    val lowercase_letter: char t
    (** Parse a lowercase letter. *)

    val letter: char t
    (** Parse a letter. *)

    val digit_char: char t
    (** Parse a digit [0..9] and return it as character. *)

    val digit: int t
    (** Parse a digit and return it as number. *)

    val word: (char -> bool) -> (char -> bool) -> string -> string t
    (** [word first inner error]

        Parse a word which starts with a character satisfying the predicate
        [first] followed by zero or more characters satisfying the predicate
        [inner]. In case of failure add the expectation [error].
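
        The scanning rule can be pictured with a small stand-alone function
        on plain strings. This is only a sketch of the rule, not the
        combinator itself: [scan_word], [is_letter], [is_digit] and
        [identifier] are hypothetical names, and the expectation reported on
        failure is left out.

        {[
            (* Scan [input] from index [pos]. On success return the word and
               the index of the first character after it. *)
            let scan_word
                    (first: char -> bool)
                    (inner: char -> bool)
                    (input: string)
                    (pos: int)
                : (string * int) option
                =
                let len = String.length input in
                if pos < len && first input.[pos] then
                    (* Advance over zero or more inner characters. *)
                    let rec stop i =
                        if i < len && inner input.[i] then stop (i + 1) else i
                    in
                    let j = stop (pos + 1) in
                    Some (String.sub input pos (j - pos), j)
                else
                    None

            let is_letter c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
            let is_digit c = '0' <= c && c <= '9'

            (* A letter followed by letters, digits or underscores. *)
            let identifier =
                scan_word is_letter (fun c -> is_letter c || is_digit c || c = '_')
        ]}

        For instance [identifier "x_1 + y" 0] gives [Some ("x_1", 3)], while
        [identifier "1x" 0] gives [None] because the first character is not a
        letter.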
    *)

    val hex_uppercase: int t
    (** Equivalent to [range 'A' 'F'] and then converted to the corresponding
        number between [10] and [15]. *)

    val hex_lowercase: int t
    (** Equivalent to [range 'a' 'f'] and then converted to the corresponding
        number between [10] and [15]. *)

    val hex_digit: int t
    (** Parse a hexadecimal digit and return the corresponding number between
        [0] and [15]. *)
end


module type MAKE_FINAL_COMBINATORS =
sig
    type _ t
    type state
    type final
    type parser

    val make: state -> final t -> parser
    (** [make state c]

        Make a parser which starts in state [state] and parses a construct
        defined by the combinator [c]. The token stream must be ended by
        [put_end], otherwise the parse won't succeed.

        {b CAUTION}: [c] must not be a combinator containing [expect_end].
        Moreover it must not have been constructed by {!lexer}.
    *)

    val make_partial: Position.t -> state -> final t -> parser
    (** [make_partial pos state c]

        Make a parser which analyzes a part of the input stream.

        The parser starts at position [pos] in state [state] and
        parses a construct defined by the combinator [c]. The parser can succeed
        even if no end token has been pushed into the parser.
    *)
end