Module Unstrctrd

Unstrctrd.

Unstrctrd (Unstructured) is a lexer/parser for the unstructured form of RFC 822. It accepts any input that conforms to the ABNF described by RFC 5322 (including the obsolete forms). To put its purpose in context: an email header, a part of the DEB format, and an HTTP/1.1 header all conform to at least one form, the unstructured form, which allows a value to be split across lines with a folding-whitespace token.

This token makes it possible to limit any value to 80 characters per line:

To: Romain Calascibetta\r\n
   <romain@calascibetta.org>
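As a rough illustration of what the folding-whitespace token means, the sketch below "unfolds" a raw string by dropping every CRLF that is immediately followed by a space or a tab, as RFC 5322 describes. The helpers `is_wsp` and `unfold` are hypothetical names that use only the OCaml standard library. They are not part of Unstrctrd and are not how the library is implemented.

```ocaml
(* Illustrative only: [is_wsp] and [unfold] are hypothetical helpers,
   not part of Unstrctrd. *)
let is_wsp c = c = ' ' || c = '\t'

(* Drop every CRLF that is immediately followed by a space or a tab. *)
let unfold s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i >= len then ()
    else if i + 2 < len && s.[i] = '\r' && s.[i + 1] = '\n' && is_wsp s.[i + 2]
    then go (i + 2) (* skip the CRLF, keep the whitespace that follows *)
    else (Buffer.add_char buf s.[i]; go (i + 1))
  in
  go 0;
  Buffer.contents buf

let () =
  let folded = "To: Romain Calascibetta\r\n   <romain@calascibetta.org>\r\n" in
  (* The terminating CRLF is kept because no whitespace follows it. *)
  assert (unfold folded = "To: Romain Calascibetta   <romain@calascibetta.org>\r\n")
```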

Other forms, such as an email address or a subject, should then be, at least, a subset of this form. The goal of this library is to delegate the complexity of this form to a small, basic library.

Unstrctrd handles UTF-8 as well (RFC 6532). Any input should always be terminated by CRLF.

type elt = [
  | `Uchar of Uchar.t
  | `WSP of wsp
  | `LF
  | `CR
  | `FWS of wsp
  | `d0
  | `OBS_NO_WS_CTL of obs
]
and wsp = private string
and obs = private char
type t = private elt list
type error = [
  | `Msg of string
]
val empty : t
val length : t -> int
val of_string : string -> (int * t, [> error ]) result

of_string raw tries to parse raw and extract the unstructured form. raw should, at least, be terminated by CRLF.

val of_list : elt list -> (t, [> error ]) result

of_list lst tries to coerce lst to t. It verifies that lst cannot produce a terminating CRLF token (e.g. [`CR; `LF]).
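The check described above can be pictured with a small stand-alone sketch. It assumes a reduced, locally defined `elt` type and two hypothetical functions, `has_crlf` and `of_list_sketch`. It only rejects a `CR immediately followed by a `LF; the real of_list may perform additional validation.

```ocaml
(* Hypothetical stand-in for a subset of Unstrctrd.elt, for illustration only. *)
type elt = [ `Uchar of Uchar.t | `CR | `LF ]

(* True when a `CR is immediately followed by a `LF somewhere in the list. *)
let rec has_crlf = function
  | `CR :: `LF :: _ -> true
  | _ :: rest -> has_crlf rest
  | [] -> false

(* Accept the list only if it cannot produce a CRLF. *)
let of_list_sketch (lst : elt list) =
  if has_crlf lst then Error (`Msg "unexpected CRLF") else Ok lst

let () =
  let a = Uchar.of_char 'a' in
  assert (of_list_sketch [ `Uchar a; `CR; `LF ] = Error (`Msg "unexpected CRLF"));
  assert (of_list_sketch [ `Uchar a; `LF; `CR ] = Ok [ `Uchar a; `LF; `CR ])
```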

val to_utf_8_string : t -> string

to_utf_8_string t returns a valid UTF-8 string of t.

val iter : f:(elt -> unit) -> t -> unit
val fold : f:('a -> elt -> 'a) -> 'a -> t -> 'a
val map : f:(elt -> elt) -> t -> t
val wsp : len:int -> elt
val tab : len:int -> elt
val fws : ?tab:bool -> int -> elt
val without_comments : t -> (t, [> error ]) result

without_comments t tries to remove every comment from t. A comment is a part that begins with '(' and ends with ')'. If an unmatched parenthesis is found, an error is returned.
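To make the rule concrete, here is a minimal sketch of comment removal on a plain string rather than on a value of type t. The function `strip_comments` is hypothetical. Tracking nested comments with a depth counter and ignoring quoted-pair escapes are assumptions of this sketch, not statements about the library's behaviour.

```ocaml
(* Illustrative only: [strip_comments] is a hypothetical helper that works on
   a plain string and does not handle quoted-pair escapes. *)
let strip_comments s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i depth =
    if i = len then
      if depth = 0 then Ok (Buffer.contents buf)
      else Error (`Msg "unclosed '('")
    else
      match s.[i] with
      | '(' -> go (i + 1) (depth + 1)
      | ')' when depth = 0 -> Error (`Msg "unexpected ')'")
      | ')' -> go (i + 1) (depth - 1)
      | c ->
        (* Keep the character only when we are outside every comment. *)
        if depth = 0 then Buffer.add_char buf c;
        go (i + 1) depth
  in
  go 0 0

let () =
  assert (strip_comments "Hello (a comment) World" = Ok "Hello  World");
  assert (strip_comments "a (b" = Error (`Msg "unclosed '('"));
  assert (strip_comments "a ) b" = Error (`Msg "unexpected ')'"))
```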

val fold_fws : t -> t
val split_at : index:int -> t -> t * t
val split_on : on:[ `WSP | `FWS | `Uchar of Uchar.t | `Char of char | `LF | `CR ] -> t -> (t * t) option

split_on ~on t is either the pair (t0, t1) of the two (possibly empty) subparts of t that are delimited by the first match of on or None if on can't be matched in t.

The invariant t0 ^ sep ^ t1 = t holds.
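The invariant can be illustrated with a string-based analogue. The function `split_on_char_sketch` below is hypothetical and operates on an OCaml string with a char separator, not on t. It only shows the shape of the contract: split at the first match, leave the separator out of both parts, and return None when nothing matches.

```ocaml
(* Illustrative only: a string-based analogue of the contract, not the
   library's code. *)
let split_on_char_sketch ~on s =
  match String.index_opt s on with
  | None -> None
  | Some i ->
    let t0 = String.sub s 0 i in
    let t1 = String.sub s (i + 1) (String.length s - i - 1) in
    Some (t0, t1)

let () =
  let t = "Subject: Hello: World" in
  match split_on_char_sketch ~on:':' t with
  | None -> assert false
  | Some (t0, t1) ->
    (* Only the first match is used as the delimiter. *)
    assert (t0 = "Subject" && t1 = " Hello: World");
    (* The invariant t0 ^ sep ^ t1 = t holds. *)
    assert (t0 ^ ":" ^ t1 = t)
```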


module type MONAD = sig ... end
module type BUFFER = sig ... end
module Make (Buffer : BUFFER) (Monad : MONAD with type buffer = Buffer.t) : sig ... end
val lexbuf_make : unit -> Lexing.lexbuf
val post_process : (t -> 'a) -> [ `FWS of string | `OBS_UTEXT of int * int * string | `VCHAR of string | `WSP of string ] list -> 'a