open Core

module type S = sig
(** These are OCaml bindings for Google's re2 library. Quoting from the re2 homepage:
{v
RE2 is a fast, safe, thread-friendly alternative to backtracking regular
expression engines like those used in PCRE, Perl, and Python. It is a C++ library.
Unlike most automata-based engines, RE2 implements almost all the common Perl and
PCRE features and syntactic sugars. It also finds the leftmost-first match, the
same match that Perl would, and can return submatch information. The one
significant exception is that RE2 drops support for backreferences¹ and
generalized zero-width assertions, because they cannot be implemented
efficiently. The syntax page gives full details. v}
Syntax reference: {:https://github.com/google/re2/wiki/Syntax}
**)

(** Although OCaml strings and C++ strings may legally have internal null bytes, this
library doesn't handle them correctly because it converts via C strings.
The failure mode is that the search stops early, which isn't bad considering how rare
internal null bytes are in practice.
The strings are considered according to [Options.encoding] which is UTF-8 by
default (the alternative is ISO 8859-1).
*)

(** {6 Basic Types} *)

type t [@@deriving sexp_of]

include Comparable.S_plain with type t := t
include Hashable.S_plain with type t := t

type regex = t

(** Subpatterns are referenced by name if labelled with the [/(?P<...>...)/] syntax, or
else by counting open-parens, with subpattern zero referring to the whole regex. *)
type id_t =
  [ `Index of int
  | `Name of string
  ]

(** [index_of_id_exn t id] resolves subpattern names and indices into indices. **)
val index_of_id_exn : t -> id_t -> int

(** The [sub] keyword argument means: omit location information for subpatterns with
index greater than [sub].
Subpatterns are indexed by the number of opening parentheses preceding them:
[~sub:(`Index 0)] : only the whole match
[~sub:(`Index 1)] : the whole match and the first submatch, etc.
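For instance, a minimal sketch (assuming this signature is exposed as the top-level
[Re2] module; the pattern and input are illustrative only):
{[
  let re = Re2.create_exn "(?P<key>\\w+)=(\\w+)" in
  (* Subpattern 0 is the whole match. *)
  let whole = Re2.find_first_exn ~sub:(`Index 0) re "key=value" in
  (* Subpatterns can be selected by position or, if labelled, by name. *)
  let by_index = Re2.find_first_exn ~sub:(`Index 2) re "key=value" in
  let by_name = Re2.find_first_exn ~sub:(`Name "key") re "key=value" in
  whole, by_index, by_name (* "key=value", "value", "key" *)
]}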
If you only care whether the pattern does match, you can request no location
information at all by passing [~sub:(`Index (-1))].
With one exception, I quote from re2.h:443,
{v
Don't ask for more match information than you will use:
runs much faster with nmatch == 1 than nmatch > 1, and
runs even faster if nmatch == 0. v}
In terms of re2's [nmatch] (one more than the [sub] index), re2 executes in three
steps when [nmatch > 1]:
1. run a DFA over the entire input to get the end of the whole match
2. run a DFA backward from the end position to get the start position
3. run an NFA from the match start to match end to extract submatches
[nmatch == 1] (i.e., [~sub:(`Index 0)]) lets it stop after (2) and [nmatch == 0]
(i.e., [~sub:(`Index (-1))]) lets it stop after (1).
(See re2.cc:692 or so.)
The one exception is for the functions [get_matches], [replace], and
[Iterator.next]: Since they must iterate correctly through the whole string, they
need at least the whole match (subpattern 0). These functions will silently rewrite
[~sub] to be non-negative.
*)
module Options = Options

val create : ?options:Options.t -> string -> t Or_error.t
val create_exn : ?options:Options.t -> string -> t

include Stringable with type t := t

(** [num_submatches t] returns 1 + the number of open-parens in the pattern.
N.B. [num_submatches t == 1 + RE2::NumberOfCapturingGroups()] because
[RE2::NumberOfCapturingGroups()] ignores the whole match ("subpattern zero").
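For example (a sketch, assuming this signature is exposed as the [Re2] module):
{[
  (* Two capturing groups plus the whole match. *)
  let n = Re2.num_submatches (Re2.create_exn "(a)(b)") in
  n (* 3 *)
]}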
*)
val num_submatches : t -> int

(** [pattern t] returns the pattern from which the regex was constructed. *)
val pattern : t -> string

val options : t -> Options.t

(** [find_all t input] is a convenience function that returns all non-overlapping
matches of [t] against [input], in left-to-right order.
If [sub] is given, and the requested subpattern did not capture, then no match is
returned at that position even if other parts of the regex did match. *)
val find_all : ?sub:id_t -> t -> string -> string list Or_error.t
val find_all_exn : ?sub:id_t -> t -> string -> string list

(** [find_first ?sub pattern input] finds the first match of [pattern] in [input], and
returns the subpattern specified by [sub], or an error if the subpattern didn't
capture. *)
val find_first : ?sub:id_t -> t -> string -> string Or_error.t
val find_first_exn : ?sub:id_t -> t -> string -> string

(** [find_submatches t input] finds the first match and returns all submatches.
Element 0 is the whole match and element 1 is the first parenthesized submatch, etc.
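A small sketch (assuming this signature is exposed as the [Re2] module; the pattern
and input are illustrative only):
{[
  let re = Re2.create_exn "(\\d+)-(\\d+)" in
  Re2.find_submatches_exn re "range 10-20"
  (* [| Some "10-20"; Some "10"; Some "20" |] *)
]}
A group that did not participate in the match would appear as [None].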
*)
val find_submatches : t -> string -> string option array Or_error.t
val find_submatches_exn : t -> string -> string option array

(** [matches pattern input] @return true iff [pattern] matches [input] *)
val matches : t -> string -> bool

(** [split pattern input] @return [input] broken into pieces where [pattern]
matches. Subpatterns are ignored.
@param max (default: unlimited) split only at the leftmost [max] matches
@param include_matches (default: false) include the matched substrings in the
returned list (e.g., the regex [/[,()]/] on ["foo(bar,baz)"] gives [["foo"; "(";
"bar"; ","; "baz"; ")"]] instead of [["foo"; "bar"; "baz"]])
If [t] never matches, the returned list has [input] as its one element.
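A minimal sketch (assuming this signature is exposed as the [Re2] module; the
separator and input are illustrative only):
{[
  let comma = Re2.create_exn "," in
  let all = Re2.split comma "a,b,c" in
  (* With [~max:1], only the leftmost comma is used as a split point. *)
  let first_only = Re2.split ~max:1 comma "a,b,c" in
  (* With [~include_matches:true], the separators are kept in the result. *)
  let with_seps = Re2.split ~include_matches:true comma "a,b,c" in
  all, first_only, with_seps (* [all] is [ "a"; "b"; "c" ] *)
]}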
*)
val split : ?max:int -> ?include_matches:bool -> t -> string -> string list

(** [rewrite pattern ~template input] is a convenience function for [replace]:
Instead of requiring an arbitrary transformation as a function, it accepts a
template string with zero or more substrings of the form ["\\n"], each of
which will be replaced by submatch [n]. For every match of [pattern]
against [input], the template will be specialized and then substituted for
the matched substring. *)
val rewrite : t -> template:string -> string -> string Or_error.t
val rewrite_exn : t -> template:string -> string -> string

(** [valid_rewrite_template pattern ~template] returns [true] iff [template] is a
valid rewrite template for [pattern] *)
val valid_rewrite_template : t -> template:string -> bool

(** [escape nonregex] returns a copy of [nonregex] with everything escaped (i.e.,
if the return value were compiled as a regex, it would match exactly the
original input) *)
val escape : string -> string

(** {6 Infix Operators} *)

module Infix : sig
  (** [input =~ pattern] is an infix alias of [matches] *)
  val ( =~ ) : string -> t -> bool
end

(** {6 Complicated Interface} *)

type 'a without_trailing_none [@@deriving sexp_of]

(** This type marks call sites affected by a bugfix that eliminated a trailing
None. When you add this wrapper, check that your call site does not still work
around the bug by dropping the last element. *)
val without_trailing_none : 'a -> 'a without_trailing_none

module Match : sig
  (** A Match.t is the result of applying a regex to an input string *)
  type t [@@deriving sexp_of]

  (** If location information has been omitted (e.g., via [~sub]), the error returned is
[Regex_no_such_subpattern], just as though that subpattern were never defined.
  *)
  val get : sub:id_t -> t -> string option
  val get_exn : sub:id_t -> t -> string

  (** [get_all t] returns all available matches as strings in an array. For the
indexing convention, see comment above regarding [sub] parameter. *)
  val get_all : t without_trailing_none -> string option array

  (** [get_pos_exn ~sub t] returns the start offset and length in bytes. Note that for
variable-width encodings (e.g., UTF-8) this may not be the same as the character
offset and character length.
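For instance (a sketch, assuming this signature is exposed as the [Re2] module and
the default UTF-8 encoding is in effect):
{[
  (* "\xc3\xa9" is the two-byte UTF-8 encoding of a single accented character. *)
  let m = Re2.first_match_exn (Re2.create_exn "b") "\xc3\xa9b" in
  Re2.Match.get_pos_exn ~sub:(`Index 0) m
  (* (2, 1): the byte offset is 2 although the character offset is 1 *)
]}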
  *)
  val get_pos_exn : sub:id_t -> t -> int * int
end

(** [get_matches pattern input] returns all non-overlapping matches of [pattern]
against [input]
@param max (default: unlimited) return only the leftmost [max] matches
@param sub (default: all) returned Match.t's will contain only the first [sub]
matches.
*)
val get_matches : ?sub:id_t -> ?max:int -> t -> string -> Match.t list Or_error.t
val get_matches_exn : ?sub:id_t -> ?max:int -> t -> string -> Match.t list
val to_sequence_exn : ?sub:id_t -> t -> string -> Match.t Sequence.t

(** [first_match pattern input] @return the first match iff [pattern] matches [input] *)
val first_match : t -> string -> Match.t Or_error.t
val first_match_exn : t -> string -> Match.t

(** [replace ?sub ?only ~f pattern input] @return an edited copy of [input] with every
substring matched by [pattern] transformed by [f].
@param only (default: all) replace only the nth match
*)
val replace
  :  ?sub:id_t
  -> ?only:int
  -> f:(Match.t -> string)
  -> t
  -> string
  -> string Or_error.t

val replace_exn : ?sub:id_t -> ?only:int -> f:(Match.t -> string) -> t -> string -> string

module Exceptions : sig
  (** [Regex_no_such_subpattern (n, max)] means [n] was requested but only [max]
subpatterns are defined (so [max] - 1 is the highest valid index) *)
  exception Regex_no_such_subpattern of int * int

  (** [Regex_no_such_named_subpattern (name, pattern)] *)
  exception Regex_no_such_named_subpattern of string * string

  (** [Regex_match_failed pattern] *)
  exception Regex_match_failed of string

  (** [Regex_submatch_did_not_capture (s, i)] means the [i]th subpattern in the
regex compiled from [s] did not capture a substring. *)
  exception Regex_submatch_did_not_capture of string * int

  (** the string is the C library's error message, generally in the form of
"(human-readable error): (piece of pattern that did not compile)" *)
  exception Regex_compile_failed of string

  (** [Regex_rewrite_template_invalid (template, error_msg)] *)
  exception Regex_rewrite_template_invalid of string * string
end

module Multiple : sig
  (** An efficient way to ask which of several regexes matches a string. *)
  type 'a t

  (** [create ?options [ (pattern1, value1); (pattern2, value2); ...]] associates each
[pattern] with its [value]. The same [options] are used for all patterns. *)
  val create : ?options:Options.t -> (string * 'a) list -> 'a t Or_error.t
  val create_exn : ?options:Options.t -> (string * 'a) list -> 'a t

  (** [matches t input] returns the values associated with those patterns that match the
[input]. Values are in the order that [create] saw them. *)
  val matches : 'a t -> string -> 'a list

  (** Like [matches], but values are listed in unspecified order. *)
  val matches_no_order : 'a t -> string -> 'a list
end

module Stable : sig
  (** [V2] serializes, compares and hashes the pattern and all currently known
options, except max_mem. If Re2 gained a new option, we would have to mint a V3.t
as the V2 format would not support this option. How we support such an upgrade
(raise, drop the option, smash the tree) will have to be considered then. *)
  module V2 : sig
    type nonrec t = t [@@deriving hash]

    include
      Stable_comparable.V1
      with type t := t
       and type comparator_witness = comparator_witness
  end

  (** [V1_no_options] is the legacy serialization: pattern only, options are lost. *)
  module V1_no_options : sig
    type nonrec t = t [@@deriving hash]

    include Stable_without_comparator with type t := t
  end
end
end