saga.tokenizers
Saga_tokenizers.Decoders
Decoding module for converting token IDs back to text.
type t
Main decoder type
val bpe : ?suffix:string -> unit -> t
Create a BPE decoder.
suffix: suffix to remove (default: "")
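To illustrate what removing an end-of-word suffix means, here is a minimal standalone sketch. It does not call the library. `strip_suffix` and `bpe_decode` are hypothetical names, and the choice to emit a space after every word-final token except the last is an assumption borrowed from common BPE decoders, not confirmed behaviour of `bpe`.

```ocaml
(* Hypothetical helper: return the token without [suffix] if it ends with it. *)
let strip_suffix ~suffix tok =
  let n = String.length tok and k = String.length suffix in
  if k > 0 && n >= k && String.sub tok (n - k) k = suffix
  then Some (String.sub tok 0 (n - k))
  else None

(* Assumption: a token ending in [suffix] closes a word, so a space follows
   it unless it is the final token. Other tokens are copied unchanged. *)
let bpe_decode ~suffix tokens =
  let last = List.length tokens - 1 in
  tokens
  |> List.mapi (fun i tok ->
         match strip_suffix ~suffix tok with
         | Some stem -> if i = last then stem else stem ^ " "
         | None -> tok)
  |> String.concat ""
```

With the suffix `"</w>"`, the tokens `["hel"; "lo</w>"; "wor"; "ld</w>"]` are joined into two words under this model.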
val byte_level : unit -> t
Create a byte-level decoder
val byte_fallback : unit -> t
Create a byte fallback decoder
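As a rough illustration of byte fallback, the sketch below turns tokens of the form `<0xHH>` back into raw bytes and copies every other token unchanged. It is standalone code, not the library's implementation. The `<0xHH>` spelling and the names `byte_of_token` and `byte_fallback_decode` are assumptions, and no UTF-8 validation of the resulting bytes is attempted.

```ocaml
(* Hypothetical helper: parse a token such as "<0x48>" into its byte value. *)
let byte_of_token tok =
  if String.length tok = 6
     && String.sub tok 0 3 = "<0x"
     && tok.[5] = '>'
  then int_of_string_opt ("0x" ^ String.sub tok 3 2)
  else None

(* Byte tokens become single bytes; all other tokens are appended as-is. *)
let byte_fallback_decode tokens =
  let buf = Buffer.create 64 in
  List.iter
    (fun tok ->
      match byte_of_token tok with
      | Some b -> Buffer.add_char buf (Char.chr b)
      | None -> Buffer.add_string buf tok)
    tokens;
  Buffer.contents buf
```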
val wordpiece : ?prefix:string -> ?cleanup:bool -> unit -> t
Create a WordPiece decoder.
prefix: continuation prefix to remove (default: "##")
cleanup: whether to clean up tokenization artifacts (default: true)
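The effect of the continuation prefix can be sketched as follows. This is standalone code under the usual WordPiece convention (a token starting with the prefix continues the previous word, any other token starts a new word), not the library's implementation. `wordpiece_decode` is a hypothetical name, and the cleanup step is left out because its exact rules are not described here.

```ocaml
(* Tokens that start with [prefix] are glued to the previous token;
   every other token except the first is preceded by a space. *)
let wordpiece_decode ?(prefix = "##") tokens =
  let plen = String.length prefix in
  let buf = Buffer.create 64 in
  List.iteri
    (fun i tok ->
      if plen > 0 && String.length tok >= plen && String.sub tok 0 plen = prefix
      then Buffer.add_string buf (String.sub tok plen (String.length tok - plen))
      else begin
        if i > 0 then Buffer.add_char buf ' ';
        Buffer.add_string buf tok
      end)
    tokens;
  Buffer.contents buf
```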
val metaspace : ?replacement:char -> ?add_prefix_space:bool -> unit -> t
Create a Metaspace decoder.
replacement: character that stands in for spaces (default: '▁')
add_prefix_space: whether a prefix space was added during encoding (default: true)
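A minimal sketch of the Metaspace idea: every occurrence of the marker becomes a space, and the leading space is dropped if one was added during encoding. This is standalone code, not the library's implementation. `marker`, `replace_all` and `metaspace_decode` are hypothetical names, and the marker is held in a string because U+2581 occupies three bytes in UTF-8.

```ocaml
(* U+2581 as UTF-8 bytes; a string is used because the character does not
   fit in a single-byte OCaml [char]. *)
let marker = "\xE2\x96\x81"

(* Replace every occurrence of [sub] in [s] with [by]. *)
let replace_all ~sub ~by s =
  let n = String.length s and k = String.length sub in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then ()
    else if k > 0 && i + k <= n && String.sub s i k = sub then begin
      Buffer.add_string buf by;
      go (i + k)
    end else begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

(* Join the tokens, turn markers into spaces, then drop the leading space
   when [add_prefix_space] is set. *)
let metaspace_decode ?(add_prefix_space = true) tokens =
  let text = replace_all ~sub:marker ~by:" " (String.concat "" tokens) in
  let n = String.length text in
  if add_prefix_space && n > 0 && text.[0] = ' '
  then String.sub text 1 (n - 1)
  else text
```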
val ctc : ?pad_token:string -> ?word_delimiter_token:string -> ?cleanup:bool -> unit -> t
Create a CTC decoder.
pad_token: padding token (default: "<pad>")
word_delimiter_token: word delimiter token (default: "|")
cleanup: whether to clean up tokenization artifacts (default: true)
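The roles of the padding and word delimiter tokens can be sketched with the standard greedy CTC collapse: merge runs of identical adjacent tokens, drop padding, and turn the delimiter into a space. This is standalone code under that assumption, not the library's implementation, and `ctc_decode` is a hypothetical name. The final trim stands in for cleanup.

```ocaml
let ctc_decode ?(pad_token = "<pad>") ?(word_delimiter_token = "|") tokens =
  (* Collapse runs of identical adjacent tokens into one. *)
  let rec dedup = function
    | a :: (b :: _ as rest) when a = b -> dedup rest
    | a :: rest -> a :: dedup rest
    | [] -> []
  in
  dedup tokens
  |> List.filter (fun tok -> tok <> pad_token)
  |> List.map (fun tok -> if tok = word_delimiter_token then " " else tok)
  |> String.concat ""
  |> String.trim
```

Note that padding is removed only after the collapse, so a padding token between two identical letters keeps them distinct (as in the double "l" of "hello").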
val sequence : t list -> t
Combine multiple decoders in sequence
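One way to picture sequencing is to model each decoder as a function over the token list and apply the functions left to right. The type `t` is abstract here, so the sketch below is only a model. `step`, `drop_pad` and `fuse_step` are hypothetical names, and this `sequence` is a stand-in defined over the model type, not the library function.

```ocaml
(* Model assumption: a decoding step maps a token list to a token list. *)
type step = string list -> string list

(* Apply the steps in list order, feeding each result to the next step. *)
let sequence (steps : step list) : step =
  fun tokens -> List.fold_left (fun acc step -> step acc) tokens steps

(* Two illustrative steps: drop padding, then merge everything into one token. *)
let drop_pad : step = List.filter (fun tok -> tok <> "<pad>")
let fuse_step : step = fun tokens -> [ String.concat "" tokens ]

let pipeline = sequence [ drop_pad; fuse_step ]
```

Order matters in this model: fusing first would glue the padding token into the text before it could be filtered out.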
val replace : pattern:string -> content:string -> unit -> t
Create a replace decoder.
pattern: pattern to match
content: replacement string
val strip : ?left:bool -> ?right:bool -> ?content:char -> unit -> t
Create a strip decoder.
left: strip from the left (default: false)
right: strip from the right (default: false)
content: character to strip (default: ' ')
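The three options can be illustrated on a single token. The sketch below assumes that stripping removes every leading or trailing occurrence of the character. That reading is an assumption, `strip_token` is a hypothetical name, and the code is standalone rather than the library's implementation.

```ocaml
(* Remove leading and/or trailing runs of [content] from one token. *)
let strip_token ?(left = false) ?(right = false) ?(content = ' ') tok =
  let n = String.length tok in
  let i = ref 0 and j = ref n in
  if left then while !i < n && tok.[!i] = content do incr i done;
  if right then while !j > !i && tok.[!j - 1] = content do decr j done;
  String.sub tok !i (!j - !i)
```

With both flags left at their default of false, the token is returned unchanged.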
val fuse : unit -> t
Create a fuse decoder that merges tokens
val decode : t -> string list -> string
Decode a list of tokens back to text
val to_json : t -> Yojson.Basic.t
Serialize a decoder to a JSON value
val of_json : Yojson.Basic.t -> t
Build a decoder from a JSON value