Module BatScanf

Formatted input functions.

Introduction
Functional input with format strings

The module Scanf provides formatted input functions or scanners.

The formatted input functions can read from any kind of input, including strings, files, or anything that can return characters. The most general source of characters is called a scanning buffer and has type Scanning.scanbuf. The most general formatted input function, bscanf, reads from any scanning buffer.

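Generally speaking, the formatted input functions have 3 arguments: a source of characters for the input, a format string that specifies the values to read, and a receiver function that is applied to the values read.

Hence, a typical call to the formatted input function Scanf.bscanf is bscanf ib fmt f, where ib is a source of characters (typically a scanning buffer of type Scanning.scanbuf), fmt is a format string, and f is a function that takes as many arguments as there are values to read from the input.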

A simple example

As suggested above, the expression bscanf ib "%d" f reads a decimal integer n from the source of characters ib and returns f n.

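For instance, if stdib is the scanning buffer that reads from the standard input and f is defined as let f x = x + 1,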

then bscanf stdib "%d" f reads an integer n from the standard input and returns f n (that is n + 1). Thus, if we evaluate bscanf stdib "%d" f, and then enter 41 at the keyboard, we get 42 as the final result.
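The same example can be tried without a keyboard by reading from a string. This is a minimal sketch that assumes the standard library's Scanf module, whose functions this module is expected to mirror; the names ib and result are chosen for illustration only.

```ocaml
(* Assumption: the standard library's Scanf stands in for this module. *)
let f x = x + 1

(* A scanning buffer built from a string plays the role of keyboard input. *)
let ib = Scanf.Scanning.from_string "41"

(* Reads the decimal integer 41 from ib and returns f 41. *)
let result = Scanf.bscanf ib "%d" f

let () = Printf.printf "%d\n" result
```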

Formatted input as a functional feature

The OCaml scanning facility is reminiscent of the corresponding C feature. However, it is also largely different, simpler, and yet more powerful. The formatted input functions are higher-order functionals, and parameters are passed by regular function application rather than by the variable-assignment mechanism typical of formatted input in imperative languages. The OCaml format strings also feature useful additions that make it easy to define complex tokens. As expected in a functional programming language, the formatted input functions support polymorphism, in particular arbitrary interaction with polymorphic user-defined scanners. Furthermore, the OCaml formatted input facility is fully type-checked at compile time.
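To make the contrast with assignment-based scanning concrete, here is a small sketch in which the values read are handed to an ordinary function. It assumes the standard library's Scanf module; read_pair is a hypothetical helper name, not part of the library.

```ocaml
(* The two integers read are passed as regular arguments to the receiver. *)
let point = Scanf.sscanf "3 4" "%d %d" (fun x y -> (x, y))

(* A scanner is a plain function, so it can be wrapped and reused.
   read_pair is a hypothetical helper, introduced for illustration. *)
let read_pair s = Scanf.sscanf s "(%d, %d)" (fun x y -> (x, y))

let pairs = List.map read_pair ["(1, 2)"; "(10,20)"]
```

Because the format string is type-checked, passing a receiver with the wrong number or type of arguments is rejected at compile time.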

module Scanning : sig ... end
Type of formatted input functions
type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.scanbuf, 'b, 'c, 'a -> 'd, 'd) format6 -> 'c
exception Scan_failure of string
The general formatted input function
val bscanf : Scanning.scanbuf -> ('a, 'b, 'c, 'd) scanner
Format string description

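The format is a character string which contains three types of objects: plain characters, which are simply matched against the characters of the input; conversion specifications, each of which causes one value to be read and converted into an argument for the function f; and scanning indications, which specify the boundaries of tokens.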

The space character in format strings

As mentioned above, a plain character in the format string is just matched with the characters of the input; however, one character is a special exception to this simple rule: the space character (ASCII code 32) does not match a single space character, but any amount of "whitespace" in the input. More precisely, a space inside the format string matches any number of tab, space, line feed and carriage return characters.

Since it matches any amount of whitespace, a space in the format string also matches no whitespace at all; hence, the call bscanf ib "Price = %d $" (fun p -> p) succeeds and returns 1 when reading an input with various whitespace in it, such as Price = 1 $, Price  =  1    $, or even Price=1$.
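The whitespace rule can be checked with a short sketch. It assumes the standard library's Scanf module and uses sscanf so that each input is an explicit string; price is a hypothetical helper name.

```ocaml
(* price is a hypothetical helper: it applies the format from the text
   to an input string. *)
let price s = Scanf.sscanf s "Price = %d $" (fun p -> p)

(* Inputs with single spaces, repeated spaces, a tab and a line feed,
   and no whitespace at all. *)
let results =
  List.map price
    ["Price = 1 $"; "Price  =  1    $"; "Price\t=\n1 $"; "Price=1$"]
```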

Conversion specifications in format strings

Conversion specifications consist of the % character, followed by an optional flag, an optional field width, and one or two conversion characters. The conversion characters and their meanings are:
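As a brief illustration, the sketch below combines four common conversions of the standard library's Scanf module, which are assumed to behave the same way here: %d reads a decimal integer, %s reads a string delimited by whitespace, %f reads a float, and %c reads a single character.

```ocaml
(* Each conversion produces one argument for the receiver function. *)
let parsed =
  Scanf.sscanf "42 hello 3.5 x" "%d %s %f %c"
    (fun n s x c -> (n, s, x, c))
```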

Following the % character that introduces a conversion, there may be the special flag _: the conversion that follows occurs as usual, but the resulting value is discarded. For instance, if f is the function fun i -> i + 1, then Scanf.sscanf "x = 1" "%_s = %i" f returns 2.
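The example above can be run as is. This sketch assumes the standard library's Scanf module; the second call is an added illustration, in which the discarded word and the name hex are made up for the example.

```ocaml
let f i = i + 1

(* "%_s" reads the token "x" and discards it; only the integer reaches f. *)
let r = Scanf.sscanf "x = 1" "%_s = %i" f

(* Illustration only: discard a leading word, then read an integer with %i,
   which also accepts the 0x prefix for hexadecimal. *)
let hex = Scanf.sscanf "skip 0x10" "%_s %i" (fun i -> i)
```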

The field width is an optional integer literal indicating the maximal width of the token to read. For instance, %6d reads an integer having at most 6 decimal digits; %4f reads a float with at most 4 characters; and %8[\000-\255] returns the next 8 characters (or all the characters still available, if fewer than 8 characters are available in the input).
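Field widths are handy for fixed-width input. The following sketch, which assumes the standard library's Scanf module and uses made-up input data, splits digit runs that contain no separators.

```ocaml
(* "%3d" stops after 3 digits; the following "%d" reads the rest. *)
let split = Scanf.sscanf "1234567" "%3d%d" (fun a b -> (a, b))

(* A date packed as YYYYMMDD, read as three bounded integers. *)
let date = Scanf.sscanf "20240131" "%4d%2d%2d" (fun y m d -> (y, m, d))
```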

Notes:

Scanning indications in format strings

Scanning indications appear just after the string conversions %s and %[ range ] to delimit the end of the token. A scanning indication is introduced by a @ character, followed by some constant character c. It means that the string token should end just before the next matching c (which is skipped). If no c character is encountered, the string token extends as far as possible. For instance, "%s@\t" reads a string up to the next tab character or to the end of input. If a scanning indication @c does not follow a string conversion, it is treated as a plain c character.
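The tab-delimited case described above can be sketched as follows, assuming the standard library's Scanf module; the input strings are made up for the example.

```ocaml
(* With the indication @\t, the first token runs up to the tab, spaces
   included; the tab itself is skipped before the second %s. *)
let fields =
  Scanf.sscanf "first field\tsecond" "%s@\t%s" (fun a b -> (a, b))

(* Without any tab in the input, the token extends to the end of input. *)
let no_tab = Scanf.sscanf "no tab here" "%s@\t" (fun s -> s)
```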

Note:

Exceptions during scanning

Scanners may raise the following exceptions when the input cannot be read according to the format string:
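As an illustration, the sketch below observes two failures with the standard library's sscanf: Scan_failure, which this module also declares, and End_of_file, which is raised when the input ends before a value is read. That this module behaves identically is an assumption; the outcome type and try_read_int are hypothetical names.

```ocaml
(* outcome and try_read_int are hypothetical, for illustration only. *)
type outcome = Ok_int of int | Bad_input | No_input

let try_read_int s =
  try Ok_int (Scanf.sscanf s "%d" (fun n -> n)) with
  | Scanf.Scan_failure _ -> Bad_input   (* input does not match the format *)
  | End_of_file -> No_input             (* input exhausted before a value *)
```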

Note:

Specialized formatted input functions
val fscanf : in_channel -> ('a, 'b, 'c, 'd) scanner
val sscanf : string -> ('a, 'b, 'c, 'd) scanner
val scanf : ('a, 'b, 'c, 'd) scanner
val kscanf : Scanning.scanbuf -> (Scanning.scanbuf -> exn -> 'd) -> ('a, 'b, 'c, 'd) scanner
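The signature of kscanf suggests that its second argument is called when scanning fails, receiving the scanning buffer and the exception. Here is a minimal sketch of that use, assuming the standard library's Scanf.kscanf, which has the same shape; read_int_or is a hypothetical helper.

```ocaml
(* read_int_or is a hypothetical helper: it returns the integer read from s,
   or the given default when scanning fails. *)
let read_int_or default s =
  let ib = Scanf.Scanning.from_string s in
  Scanf.kscanf ib (fun _ib _exn -> default) "%d" (fun n -> n)
```

This keeps error handling in the call itself rather than in a surrounding try ... with.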
Reading format strings from input
val bscanf_format : Scanning.scanbuf -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g
val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g
val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> ('a, 'b, 'c, 'd, 'e, 'f) format6
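As a sketch of how format_from_string might be used, the example below turns a string into a format string with the same type as "%d" and then prints with it. It assumes the standard library's Scanf.format_from_string, which has the same signature; render is a hypothetical name.

```ocaml
(* render is a hypothetical helper. The string "%d apples" is checked
   against the type of the format "%d" and converted to a format string. *)
let render n =
  let fmt = Scanf.format_from_string "%d apples" "%d" in
  Printf.sprintf fmt n
```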