Module Cure2

Cure2: module to create Re2 regexps using combinators.

type charset
type t
val to_re2 : t -> Re2.t
val to_string : t -> string

to_string re is the string associated with re. It can be useful for debugging or sending the regexp over the network.
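As a minimal sketch of how to_re2 and to_string fit together, the snippet below builds a small expression, compiles it, and prints its string form. It assumes the cure2 and re2 opam packages are installed and linked; Re2.matches comes from the Re2 library, not from this module. The exact text printed by to_string depends on how Cure2 renders its combinators, so it is printed rather than compared against a fixed value.

```ocaml
(* Assumes the cure2 and re2 opam packages are available. *)

(* One or more ASCII digits, built from combinators documented below. *)
let digits : Cure2.t = Cure2.(rep1 digit)

(* Compile the combinator expression into a Re2.t usable for matching. *)
let digits_re : Re2.t = Cure2.to_re2 digits

let () =
  (* Handy for debugging: shows the regexp string handed to Re2. *)
  print_endline (Cure2.to_string digits);
  (* Re2.matches reports whether the pattern matches anywhere in the input. *)
  Printf.printf "%b\n" (Re2.matches digits_re "order 66")
```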

val regex : string -> t

regex str is the regex represented by str according to Re2's syntax. There should normally be no need for this, but it can be useful if a combinator is missing. No validation is performed on the string, so it is possible to get an error from Re2 if there is a syntax error. Beware that using this to set flags can change the meaning of other combinators, such as any, which is meant to match newlines.
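A hedged sketch of regex used as an escape hatch inside an otherwise combinator-built expression. The pattern string is deliberately a single character class with a counted repetition, so the result does not depend on how Cure2 groups raw fragments when composing them (that behavior is not documented here). The name hex_color is illustrative only, and the cure2 and re2 packages are assumed to be available.

```ocaml
(* Assumes the cure2 and re2 opam packages are available. *)

(* '#' followed by exactly six lowercase hex digits; the second part is
   written directly in Re2 syntax through the regex escape hatch. *)
let hex_color : Cure2.t =
  Cure2.(seq [ char '#'; regex "[0-9a-f]{6}" ])

let hex_color_re : Re2.t = Cure2.to_re2 hex_color

let () =
  Printf.printf "%b\n" (Re2.matches hex_color_re "color: #a1b2c3");
  Printf.printf "%b\n" (Re2.matches hex_color_re "color: #12")
```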

Constants

val str : string -> t

str s matches s.

val char : char -> t

char c matches the one-character string consisting of c.

Basic operations on regular expressions

val alt : t list -> t

Alternative.

The leftmost match is preferred.

Infix operator (||) is available.
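The following sketch illustrates the preference rule with two overlapping alternatives. It assumes the cure2 and re2 packages are available and that Re2 runs with its default (non longest-match) options, in which case the first alternative that matches at the leftmost position is expected to win. first_match is a hypothetical helper built on Re2.find_first_exn.

```ocaml
(* Assumes the cure2 and re2 opam packages are available. *)

(* Two alternatives where the first is a prefix of the second. *)
let a_or_ab : Cure2.t = Cure2.(alt [ str "a"; str "ab" ])

(* The same expression written with the infix operator. *)
let a_or_ab_infix : Cure2.t = Cure2.(str "a" || str "ab")

(* Hypothetical helper: text of the first match of [re] in [input]. *)
let first_match (re : Cure2.t) (input : string) : string =
  Re2.find_first_exn (Cure2.to_re2 re) input

let () =
  print_endline (first_match a_or_ab "ab");
  print_endline (first_match a_or_ab_infix "ab")
```

Under the stated assumption, both calls should report the shorter alternative, since it is listed first.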

val seq : t list -> t

Sequence.

Infix operator (+) is available.

val rep : ?min:int -> ?max:int -> t -> t

rep ~min ~max re matches re at least min times and at most max times, bounds included. min defaults to 0 and max to infinity.

Unary operator (!*) is equivalent to rep with the default bounds.
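To make the inclusive bounds concrete, here is a small sketch that accepts strings of two to four digits. It assumes the cure2 and re2 packages are available; whole_string and digit are the combinators documented further down this page, and the names two_to_four and accepts are illustrative.

```ocaml
(* Assumes the cure2 and re2 opam packages are available. *)

(* Between 2 and 4 ASCII digits, and nothing else in the string. *)
let two_to_four : Re2.t =
  Cure2.(to_re2 (whole_string (rep ~min:2 ~max:4 digit)))

(* Hypothetical helper wrapping Re2.matches. *)
let accepts (s : string) : bool = Re2.matches two_to_four s

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %b\n" s (accepts s))
    [ "7"; "42"; "2024"; "12345" ]
```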

val rep1 : t -> t

1 or more matches.

Unary operator (!+) is equivalent.

val opt : t -> t

0 or 1 matches.

Unary operator (!?) is equivalent.

Operators

val (!?) : t -> t

!?re is opt re.

val (!*) : t -> t

!*re is rep re.

val (!+) : t -> t

!+re is rep1 re.

val (||) : t -> t -> t

(||) x y is alt [x; y].

val (+) : t -> t -> t

(+) x y is seq [x; y].
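The sketch below writes the same expression twice, once with the named combinators and once with the operators, to show how they line up. It assumes the cure2 and re2 packages are available. Inside the local open Cure2.( ... ), the operators + and || refer to the regexp versions, not to integer addition or boolean or. The names with_lists, with_ops and accepts are illustrative.

```ocaml
(* Assumes the cure2 and re2 opam packages are available. *)

(* Optional minus sign, one or more digits, then a unit suffix. *)
let with_lists : Cure2.t =
  Cure2.(seq [ opt (char '-'); rep1 digit; alt [ str "px"; str "em" ] ])

(* Same expression with operators. Prefix operators such as !? and !+
   bind tighter than the infix ones, and + binds tighter than ||, so the
   alternation needs its own parentheses. *)
let with_ops : Cure2.t =
  Cure2.(!?(char '-') + !+digit + (str "px" || str "em"))

(* Hypothetical helper: does [re] match the whole of [s]? *)
let accepts (re : Cure2.t) (s : string) : bool =
  Re2.matches (Cure2.to_re2 (Cure2.whole_string re)) s

let () =
  Printf.printf "%b %b\n" (accepts with_lists "-12px") (accepts with_ops "-12px");
  Printf.printf "%b %b\n" (accepts with_lists "12pt") (accepts with_ops "12pt")
```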

String, line, word

val start : t

Initial position

val stop : t

Final position

val bol : t

Beginning of line, compiled to "^".

val eol : t

End of line, compiled to "$".

val bow : t

Boundary of ascii word

val not_bow : t

Not at a boundary of ascii word

val whole_string : t -> t

Only matches the whole string, i.e. fun t -> seq [ start; t; stop ].
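A short sketch contrasting an unanchored expression with its whole_string version, plus the hand-written anchoring using start and stop from this section. It assumes the cure2 and re2 packages are available; the value names are illustrative.

```ocaml
(* Assumes the cure2 and re2 opam packages are available. *)

(* Unanchored: matches if some run of letters appears anywhere. *)
let letters_anywhere : Re2.t = Cure2.(to_re2 (rep1 alpha))

(* Anchored with the helper. *)
let letters_only : Re2.t = Cure2.(to_re2 (whole_string (rep1 alpha)))

(* Anchored by hand with the start and stop positions. *)
let letters_only_manual : Re2.t =
  Cure2.(to_re2 (seq [ start; rep1 alpha; stop ]))

let () =
  let input = "abc 123" in
  Printf.printf "%b %b %b\n"
    (Re2.matches letters_anywhere input)
    (Re2.matches letters_only input)
    (Re2.matches letters_only_manual input)
```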

Semantics

val greedy : t -> t

Makes rep match as many repetitions as possible.

val non_greedy : t -> t

Makes rep match as few repetitions as possible.

val case : t -> t

Case sensitive matching. On by default.

val no_case : t -> t

Case insensitive matching. Off by default.
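As an illustration of the default and of no_case, the sketch below matches the same literal with and without case folding. It assumes the cure2 and re2 packages are available and that no_case applies to the expression it wraps; the value names are illustrative.

```ocaml
(* Assumes the cure2 and re2 opam packages are available. *)

(* Case sensitive, which is the default. *)
let hello_exact : Re2.t = Cure2.(to_re2 (whole_string (str "hello")))

(* Case insensitive for the wrapped literal. *)
let hello_any_case : Re2.t =
  Cure2.(to_re2 (whole_string (no_case (str "hello"))))

let () =
  Printf.printf "%b %b\n"
    (Re2.matches hello_exact "HeLLo")
    (Re2.matches hello_any_case "HeLLo")
```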

val group : ?name:string -> t -> t

Charsets

val any : t

Any character, including newline. To exclude newline, use notnl.

val notnl : t

Any character except newline.
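The difference between any and notnl only shows up on input containing a newline, as in this small sketch. It assumes the cure2 and re2 packages are available; the value names are illustrative.

```ocaml
(* Assumes the cure2 and re2 opam packages are available. *)

(* 'a', then any single character, then 'b'. *)
let a_any_b : Re2.t = Cure2.(to_re2 (seq [ char 'a'; any; char 'b' ]))

(* Same shape, but the middle character may not be a newline. *)
let a_notnl_b : Re2.t = Cure2.(to_re2 (seq [ char 'a'; notnl; char 'b' ]))

let () =
  Printf.printf "%b %b\n"
    (Re2.matches a_any_b "a\nb")
    (Re2.matches a_notnl_b "a\nb")
```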

val alnum : t

ascii alphanumeric ([0-9A-Za-z])

val alpha : t

ascii alphabetic ([A-Za-z])

val ascii : t

ASCII ([\x00-\x7F])

val blank : t

ascii blank ([\t ])

val cntrl : t

ascii control ([\x00-\x1F\x7F])

val digit : t

ascii digits ([0-9])

val graph : t

ascii graphical ([A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])

val lower : t

ascii lower case ([a-z])

val print : t

ascii printable ([ [:graph:]])

val punct : t

ascii punctuation ([!-/:-@[-`{-~])

val space : t

ascii whitespace ([\t\n\v\f\r ])

val upper : t

ascii upper case ([A-Z])

val word : t

ascii word characters ([0-9A-Za-z_])

val xdigit : t

ascii hex digit ([0-9A-Fa-f])

val not_alnum : t

not ascii alphanumeric ([^0-9A-Za-z])

val not_alpha : t

not ascii alphabetic ([^A-Za-z])

val not_ascii : t

not ASCII ([^\x00-\x7F])

val not_blank : t

not ascii blank ([^\t ])

val not_cntrl : t

not ascii control ([^\x00-\x1F\x7F])

val not_digit : t

not ascii digits ([^0-9])

val not_graph : t

not ascii graphical ([^A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])

val not_lower : t

not ascii lower case ([^a-z])

val not_print : t

not ascii printable ([^ [:graph:]])

val not_punct : t

not ascii punctuation ([^!-/:-@[-`{-~])

val not_space : t

not ascii whitespace ([^\t\n\v\f\r ])

val not_upper : t

not ascii upper case ([^A-Z])

val not_word : t

not ascii word characters ([^0-9A-Za-z_])

val not_xdigit : t

not ascii hex digit ([^0-9A-Fa-f])

val chars : string -> t

chars s matches any single character that appears in the string s.
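A minimal sketch of chars used as an ad hoc character class. It assumes the cure2 and re2 packages are available; vowel and vowels_only are illustrative names.

```ocaml
(* Assumes the cure2 and re2 opam packages are available. *)

(* Any one of the five lowercase ASCII vowels. *)
let vowel : Cure2.t = Cure2.chars "aeiou"

(* A non-empty string made of vowels only. *)
let vowels_only : Re2.t = Cure2.(to_re2 (whole_string (rep1 vowel)))

let () =
  Printf.printf "%b %b\n"
    (Re2.matches vowels_only "aeio")
    (Re2.matches vowels_only "audio")
```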

val charset : ?neg:bool -> charset list -> t

charset cs matches any character that is part of cs.

charset ~neg:true cs matches any character that is not part of cs.

module Charset : sig ... end