Saga_tokenizers.Encoding
Encoding module - represents the output of a tokenizer
type t
The main encoding type - abstract to users
val create :
ids:int array ->
type_ids:int array ->
tokens:string array ->
words:int option array ->
offsets:(int * int) array ->
special_tokens_mask:int array ->
attention_mask:int array ->
overflowing:t list ->
sequence_ranges:(int, int * int) Hashtbl.t ->
t
Create a new encoding - for internal use
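The labelled arguments of create suggest that an encoding holds several arrays with one entry per token. A minimal sketch of that shape follows, assuming the per-token arrays are meant to be equal in length. The record type sketch, the value example, the helper is_consistent, and the ids shown are all hypothetical stand-ins for illustration; the library's own type t is abstract and may be represented differently.

```ocaml
(* Stand-in record mirroring some labelled arguments of [create].
   This is NOT the library's type [t], which is abstract. *)
type sketch = {
  ids : int array;
  tokens : string array;
  offsets : (int * int) array;
  special_tokens_mask : int array;
  attention_mask : int array;
}

(* Hypothetical encoding of "hello world"; the ids are made up. *)
let example = {
  ids = [| 101; 7592; 2088; 102 |];
  tokens = [| "[CLS]"; "hello"; "world"; "[SEP]" |];
  offsets = [| (0, 0); (0, 5); (6, 11); (0, 0) |];
  special_tokens_mask = [| 1; 0; 0; 1 |];
  attention_mask = [| 1; 1; 1; 1 |];
}

(* Check that every per-token array has the same length as [ids]. *)
let is_consistent e =
  let n = Array.length e.ids in
  Array.length e.tokens = n
  && Array.length e.offsets = n
  && Array.length e.special_tokens_mask = n
  && Array.length e.attention_mask = n
```

Because create is documented as internal, user code would normally obtain encodings from a tokenizer rather than build them by hand.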
Create encoding from tokens
Get the sequence index containing the given token
Get the character offsets of the given token
Get the tokens corresponding to the given word
Get the character offsets of the given word
Get the token containing the given character position
Get the word containing the given character position
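The token and character lookups above can be pictured as searches over the offsets array. The sketch below assumes that each offset is a half-open (start, stop) character span and that zero-width spans belong to special tokens; neither assumption is confirmed by this page. The names char_to_token_sketch and token_to_chars_sketch are hypothetical and are not the library's functions.

```ocaml
(* Hypothetical helper: index of the first token whose half-open span
   [start, stop) contains the character position [pos], if any. *)
let char_to_token_sketch (offsets : (int * int) array) (pos : int) : int option =
  let n = Array.length offsets in
  let rec go i =
    if i >= n then None
    else
      let start, stop = offsets.(i) in
      if start <= pos && pos < stop then Some i else go (i + 1)
  in
  go 0

(* Hypothetical helper: character span of token [i], or [None] when the
   index is out of range. *)
let token_to_chars_sketch (offsets : (int * int) array) (i : int) : (int * int) option =
  if i >= 0 && i < Array.length offsets then Some offsets.(i) else None
```

Returning an option keeps out-of-range positions, and positions that fall between tokens (such as whitespace), from raising an exception.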
Truncation direction
Truncate the encoding
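To illustrate what a truncation direction means, here is a minimal sketch on a plain array. The type direction_sketch, its constructors, and the function truncate_sketch are hypothetical; the library's truncation direction type and the signature of its truncate function are not shown on this page. The sketch simply discards the removed tokens, whereas the library may keep them (the overflowing argument of create hints at this).

```ocaml
(* Hypothetical direction type; the library's constructors may differ. *)
type direction_sketch = Left | Right

(* Keep at most [max_length] elements. [Right] drops elements from the end,
   [Left] drops elements from the start. *)
let truncate_sketch (arr : 'a array) ~(max_length : int)
    ~(direction : direction_sketch) : 'a array =
  let n = Array.length arr in
  if max_length < 0 then invalid_arg "truncate_sketch: negative max_length"
  else if n <= max_length then Array.copy arr
  else
    match direction with
    | Right -> Array.sub arr 0 max_length
    | Left -> Array.sub arr (n - max_length) max_length
```

In a full encoding the same slice would have to be applied to every per-token array so that they stay aligned.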
Padding direction
val pad :
t ->
target_length:int ->
pad_id:int ->
pad_type_id:int ->
pad_token:string ->
direction:padding_direction ->
t
Pad the encoding
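The effect of pad can be sketched on two of the per-token arrays. This is an illustration under stated assumptions, not the library's implementation: the type pad_direction_sketch, its constructors, and the helpers pad_array and pad_sketch are hypothetical, and filling the attention mask with 0 at padded positions is an assumption based on common tokenizer conventions.

```ocaml
(* Hypothetical direction type; the constructors of the library's
   [padding_direction] are not shown on this page. *)
type pad_direction_sketch = Pad_left | Pad_right

(* Extend [arr] to [target_length] with [fill]. Arrays that are already
   long enough are returned as an unchanged copy. *)
let pad_array arr ~target_length ~fill ~direction =
  let n = Array.length arr in
  if n >= target_length then Array.copy arr
  else
    let padding = Array.make (target_length - n) fill in
    match direction with
    | Pad_right -> Array.append arr padding
    | Pad_left -> Array.append padding arr

(* Pad ids with [pad_id] and the attention mask with 0 (assumed). *)
let pad_sketch ~ids ~attention_mask ~target_length ~pad_id ~direction =
  ( pad_array ids ~target_length ~fill:pad_id ~direction,
    pad_array attention_mask ~target_length ~fill:0 ~direction )
```

The pad_type_id and pad_token arguments of pad would fill the type_ids and tokens arrays in the same way.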