Glyph.String
measure ~width_method ~tab_width s is the total display width of s. Control characters contribute 0.
Note. Invalid UTF-8 byte sequences are replaced with U+FFFD, each contributing width 1.
See also measure_sub.
val measure_sub :
width_method:width_method ->
tab_width:int ->
string ->
pos:int ->
len:int ->
int
measure_sub ~width_method ~tab_width s ~pos ~len is like measure but operates on the substring s.[pos] .. s.[pos + len - 1] without allocating. The result is 0 when len <= 0.
grapheme_count s is the number of user-perceived characters (grapheme clusters) in s. Uses full UAX #29 segmentation.
iter_graphemes f s calls f ~offset ~len for each grapheme cluster in s.
ignore_zwj defaults to false. When true, ZWJ does not join emoji sequences (same boundary behaviour as `No_zwj).
Note. Invalid UTF-8 byte sequences are treated as individual replacement characters (U+FFFD).
See also iter_grapheme_info.
val iter_grapheme_info :
width_method:width_method ->
tab_width:int ->
(offset:int -> len:int -> width:int -> unit) ->
string ->
unit
iter_grapheme_info ~width_method ~tab_width f s calls f ~offset ~len ~width for each grapheme cluster in s. Uses the same width calculation and ZWJ handling as Pool.encode. Graphemes whose width resolves to 0 (control and zero-width sequences) are skipped.
Note. Invalid UTF-8 byte sequences are treated as individual replacement characters (U+FFFD).
See also iter_graphemes.
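A common use of the ~offset ~len ~width callback is cutting a string to a column budget without splitting a grapheme. The following is a minimal sketch rather than part of the library: truncate_to_width and ascii_iter are hypothetical names, and the iterator is passed in as an argument so the block runs without the library. Under the assumption that the module path is Glyph.String, the real iterator could be supplied as Glyph.String.iter_grapheme_info ~width_method:`Unicode ~tab_width:8.

```ocaml
(* Hypothetical helper: keep the longest prefix of [s] whose display width
   fits in [max_width]. [iter] is any iterator with the shape that remains
   after applying ~width_method and ~tab_width to iter_grapheme_info. *)
let truncate_to_width
    (iter : (offset:int -> len:int -> width:int -> unit) -> string -> unit)
    ~max_width s =
  let used = ref 0 and cut = ref 0 and full = ref false in
  iter
    (fun ~offset ~len ~width ->
      if (not !full) && !used + width <= max_width then begin
        used := !used + width;
        (* The cut always lands on a grapheme boundary. *)
        cut := offset + len
      end
      else
        (* Once one grapheme does not fit, ignore everything after it. *)
        full := true)
    s;
  String.sub s 0 !cut

(* Stand-in iterator for the demo: treats every byte as a width-1 grapheme.
   This is only valid for printable ASCII and is NOT the library's logic. *)
let ascii_iter f s = String.iteri (fun i _ -> f ~offset:i ~len:1 ~width:1) s

let () = print_endline (truncate_to_width ascii_iter ~max_width:3 "hello")
(* prints "hel" *)
```

Because zero-width graphemes are skipped by the iterator, any that follow the last kept grapheme are dropped from the result.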
val iter_wrap_breaks :
?width_method:width_method ->
(break_byte_offset:int ->
next_byte_offset:int ->
grapheme_offset:int ->
unit) ->
string ->
unit
iter_wrap_breaks f s calls f ~break_byte_offset ~next_byte_offset ~grapheme_offset for each word-wrap break opportunity in s, in order from start to end, with:
break_byte_offset: zero-based byte position of the grapheme containing the wrap-break character.
next_byte_offset: zero-based byte position of the next grapheme after the break (the resume position).
grapheme_offset: zero-based grapheme index of the grapheme containing the wrap-break character.
Breaks occur after graphemes containing ASCII space, tab, hyphen, path separators, punctuation, brackets, and Unicode NBSP, ZWSP, soft hyphen, and typographic spaces.
width_method controls grapheme boundary detection: `Unicode (the default) treats ZWJ sequences as single graphemes; `No_zwj breaks them apart.
See also iter_line_breaks.
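To make the three offsets concrete, here is a simplified stand-in written against the OCaml standard library only (String.get_utf_8_uchar, available from OCaml 4.14). It is a sketch, not the library's implementation: it assumes every Unicode scalar value is its own grapheme (no UAX #29 segmentation), and it recognises only a small subset of the break characters listed above. The names iter_wrap_breaks_sketch, is_break_char and wrap_breaks are hypothetical.

```ocaml
(* Subset of break characters, for illustration only. *)
let is_break_char u =
  match Uchar.to_int u with
  | 0x20 (* space *) | 0x09 (* tab *) | 0x2D (* hyphen *)
  | 0xA0 (* NBSP *) | 0xAD (* soft hyphen *) | 0x200B (* ZWSP *) -> true
  | _ -> false

(* Simplification: one scalar value = one grapheme. *)
let iter_wrap_breaks_sketch f s =
  let n = String.length s in
  let rec go byte idx =
    if byte < n then begin
      let d = String.get_utf_8_uchar s byte in
      let len = Uchar.utf_decode_length d in
      if is_break_char (Uchar.utf_decode_uchar d) then
        f ~break_byte_offset:byte ~next_byte_offset:(byte + len)
          ~grapheme_offset:idx;
      go (byte + len) (idx + 1)
    end
  in
  go 0 0

(* Collect the reported triples in order. *)
let wrap_breaks s =
  let acc = ref [] in
  iter_wrap_breaks_sketch
    (fun ~break_byte_offset ~next_byte_offset ~grapheme_offset ->
      acc := (break_byte_offset, next_byte_offset, grapheme_offset) :: !acc)
    s;
  List.rev !acc

(* "h", "é" (2 bytes), " ", "x", "-", "y" *)
let demo = wrap_breaks "h\xC3\xA9 x-y"
(* demo = [(3, 4, 2); (5, 6, 4)] *)
```

The space sits at byte 3 but is the third grapheme (index 2) because "é" occupies two bytes, which is why byte offsets and grapheme offsets are reported separately.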
iter_line_breaks f s calls f ~pos ~kind for each line terminator in s, in order from start to end, with:
pos: zero-based byte position. For `CRLF this is the position of the LF byte; for `LF and `CR, the respective byte.
kind: the line_break_kind.
CRLF sequences are reported once as `CRLF, not as separate `CR and `LF breaks.
See also iter_wrap_breaks.
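The reporting rules for line terminators can be illustrated with a small self-contained scanner. This is a sketch of the documented behaviour, not the library's code: the local line_break_kind type is a stand-in that assumes only the three tags named above, and iter_line_breaks_sketch and line_breaks are hypothetical names.

```ocaml
(* Stand-in for the library's line_break_kind (assumed tags). *)
type line_break_kind = [ `LF | `CR | `CRLF ]

let iter_line_breaks_sketch
    (f : pos:int -> kind:line_break_kind -> unit) (s : string) : unit =
  let n = String.length s in
  let rec go i =
    if i < n then
      match s.[i] with
      | '\r' when i + 1 < n && s.[i + 1] = '\n' ->
          (* CRLF is reported once, at the position of the LF byte. *)
          f ~pos:(i + 1) ~kind:`CRLF;
          go (i + 2)
      | '\r' ->
          f ~pos:i ~kind:`CR;
          go (i + 1)
      | '\n' ->
          f ~pos:i ~kind:`LF;
          go (i + 1)
      | _ -> go (i + 1)
  in
  go 0

(* Collect (pos, kind) pairs in order of appearance. *)
let line_breaks s =
  let acc = ref [] in
  iter_line_breaks_sketch (fun ~pos ~kind -> acc := (pos, kind) :: !acc) s;
  List.rev !acc

let demo = line_breaks "a\nb\r\nc\rd"
(* demo = [(1, `LF); (4, `CRLF); (6, `CR)] *)
```

Since pos for `CRLF points at the LF byte, the next line starts at pos + 1 for all three kinds.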