String (p.matrix.0.1.0.doc.matrix.glyph.Glyph.String)

Measuring

val measure : width_method:width_method -> tab_width:int -> string -> int

measure ~width_method ~tab_width s is the total display width of s. Control characters contribute 0.

Note. Invalid UTF-8 byte sequences are replaced with U+FFFD, each contributing width 1.

Counting

val grapheme_count : string -> int

grapheme_count s is the number of user-perceived characters (grapheme clusters) in s. Uses full UAX #29 segmentation.

Iterating

val iter_graphemes : 
  ?ignore_zwj:bool ->
  (offset:int -> len:int -> unit) ->
  string ->
  unit

iter_graphemes f s calls f ~offset ~len for each grapheme cluster in s.

ignore_zwj defaults to false. When true, ZWJ does not join emoji sequences (same boundary behaviour as `No_zwj).

Note. Invalid UTF-8 byte sequences are treated as individual replacement characters (U+FFFD).

See also iter_graphemes.
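
As an illustration of the callback shape, the following sketch collects each reported span as a substring. It assumes that offset and len are a byte offset and a byte length into s, which the text above does not state explicitly. To stay runnable with the standard library alone (OCaml 4.14 or later), it uses a hypothetical stand-in iterator, iter_scalars, that walks UTF-8 scalar values rather than grapheme clusters; collect is likewise a name introduced here, not part of this module.

```ocaml
(* Hypothetical stand-in with the same callback shape as iter_graphemes.
   It walks UTF-8 scalar values, NOT UAX #29 grapheme clusters. *)
let iter_scalars (f : offset:int -> len:int -> unit) (s : string) : unit =
  let n = String.length s in
  let rec go i =
    if i < n then begin
      let d = String.get_utf_8_uchar s i in
      (* Number of bytes consumed by this decode (at least 1, even if invalid). *)
      let len = Uchar.utf_decode_length d in
      f ~offset:i ~len;
      go (i + len)
    end
  in
  go 0

(* Collect every span reported by [iter] as a substring, in order.
   Assumes the reported offset and len are in bytes. *)
let collect
    (iter : (offset:int -> len:int -> unit) -> string -> unit)
    (s : string) : string list =
  let acc = ref [] in
  iter (fun ~offset ~len -> acc := String.sub s offset len :: !acc) s;
  List.rev !acc

let () =
  (* "h", then U+00E9 encoded as two bytes, then "!" *)
  let parts = collect iter_scalars "h\xC3\xA9!" in
  assert (parts = ["h"; "\xC3\xA9"; "!"])
```

With the real module, the iterator argument would presumably be something like (fun f s -> iter_graphemes f s), in which case multi-code-point clusters such as emoji ZWJ sequences would come back as single substrings.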

val iter_wrap_breaks : 
  ?width_method:width_method ->
  (break_byte_offset:int ->
    next_byte_offset:int ->
    grapheme_offset:int ->
    unit) ->
  string ->
  unit

iter_wrap_breaks f s calls f ~break_byte_offset ~next_byte_offset ~grapheme_offset for each word-wrap break opportunity in s, in order from start to end, with:

break_byte_offset — zero-based byte position of the grapheme containing the wrap-break character.
next_byte_offset — zero-based byte position of the next grapheme after the break (the resume position).
grapheme_offset — zero-based grapheme index of the grapheme containing the wrap-break character.

Breaks occur after graphemes containing ASCII space, tab, hyphen, path separators, punctuation, brackets, and Unicode NBSP, ZWSP, soft hyphen, and typographic spaces.

width_method controls grapheme boundary detection: `Unicode (the default) treats ZWJ sequences as single graphemes, `No_zwj breaks them apart.
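
One way to use these offsets is to cut s into segments that each end just after a break opportunity. The sketch below is written against the callback shape only. To stay runnable without the library, it uses a hypothetical stand-in iterator, iter_ascii_breaks, that treats every byte as one grapheme and breaks only after ASCII space and hyphen, which is far narrower than the set of break characters listed above. The names break_cb and split_at_breaks are also introduced here for illustration.

```ocaml
(* Callback shape taken from the signature of iter_wrap_breaks. *)
type break_cb =
  break_byte_offset:int -> next_byte_offset:int -> grapheme_offset:int -> unit

(* Hypothetical stand-in: ASCII only, one byte per grapheme,
   breaks after ' ' and '-' only. NOT the library's break rules. *)
let iter_ascii_breaks (f : break_cb) (s : string) : unit =
  String.iteri
    (fun i c ->
      if c = ' ' || c = '-' then
        f ~break_byte_offset:i ~next_byte_offset:(i + 1) ~grapheme_offset:i)
    s

(* Split [s] so that each segment ends at a resume position
   (next_byte_offset). A non-empty remainder becomes the last segment. *)
let split_at_breaks (iter : break_cb -> string -> unit) (s : string)
    : string list =
  let start = ref 0 and acc = ref [] in
  iter
    (fun ~break_byte_offset:_ ~next_byte_offset ~grapheme_offset:_ ->
      acc := String.sub s !start (next_byte_offset - !start) :: !acc;
      start := next_byte_offset)
    s;
  let tail = String.sub s !start (String.length s - !start) in
  List.rev (if tail = "" then !acc else tail :: !acc)

let () =
  assert (split_at_breaks iter_ascii_breaks "foo bar-baz"
          = ["foo "; "bar-"; "baz"])
```

A line wrapper would typically measure each segment and start a new line at the last next_byte_offset that still fits; break_byte_offset is useful when the break character itself (for example a trailing space) should be dropped from the line.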