Processors (saga 1.0.0~alpha1, saga.tokenizers)

type encoding = {

}

Type representing an encoding to be processed.

type t

Main post-processor type.

Constructors

val bert : sep:(string * int) -> cls:(string * int) -> unit -> t

Create a BERT post-processor.
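As a rough illustration of what a BERT-style post-processor conventionally does, the standalone sketch below wraps token lists with the classifier and separator tokens. It does not call Saga: `bert_wrap` and the `token` type are hypothetical names, the ids are illustrative values, and reading `(string * int)` as a (token, id) pair is an assumption drawn from the signature.

```ocaml
(* Standalone sketch; not the library's implementation.
   A token is assumed to be a (text, vocabulary id) pair. *)
type token = string * int

(* Single sequence:  [CLS] a [SEP]
   Pair of sequences: [CLS] a [SEP] b [SEP] *)
let bert_wrap ~(sep : token) ~(cls : token) ?pair (a : token list) : token list =
  match pair with
  | None -> (cls :: a) @ [ sep ]
  | Some b -> (cls :: a) @ (sep :: b) @ [ sep ]

let () =
  (* Illustrative ids only. *)
  let cls = ("[CLS]", 101) and sep = ("[SEP]", 102) in
  let out = bert_wrap ~sep ~cls [ ("hello", 7592); ("world", 2088) ] in
  List.iter (fun (s, id) -> Printf.printf "%s:%d " s id) out;
  print_newline ()
```

With the documented constructor, the equivalent setup would presumably be `bert ~sep:("[SEP]", 102) ~cls:("[CLS]", 101) ()`, which follows the signature above.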

val roberta : 
  sep:(string * int) ->
  cls:(string * int) ->
  ?trim_offsets:bool ->
  ?add_prefix_space:bool ->
  unit ->
  t

Create a RoBERTa post-processor.

val byte_level : ?trim_offsets:bool -> unit -> t

Create a byte-level post-processor.

val template : 
  single:string ->
  ?pair:string ->
  ?special_tokens:(string * int) list ->
  unit ->
  t

Create a template post-processor.
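The template syntax is not described here. The sketch below assumes a whitespace-separated template in which `$A` and `$B` stand for the first and second sequence and every other word names a special token, as in some other tokenizer libraries. That syntax, the `expand` function, and the ids are all assumptions for illustration, and the code does not call Saga.

```ocaml
(* Standalone sketch of template expansion under an assumed syntax:
   "$A" / "$B" are placeholders for the input sequences; any other word
   must appear in [special_tokens] (raises Not_found otherwise). *)
let expand ~template ~special_tokens ?(b = []) a =
  String.split_on_char ' ' template
  |> List.filter (fun w -> w <> "")
  |> List.concat_map (function
       | "$A" -> a
       | "$B" -> b
       | w -> [ (w, List.assoc w special_tokens) ])

let () =
  (* Illustrative ids only. *)
  let special_tokens = [ ("[CLS]", 101); ("[SEP]", 102) ] in
  let single =
    expand ~template:"[CLS] $A [SEP]" ~special_tokens [ ("hello", 7592) ]
  in
  List.iter (fun (s, id) -> Printf.printf "%s:%d " s id) single;
  print_newline ()
```

Under the same assumption, a matching constructor call would look like `template ~single:"[CLS] $A [SEP]" ~special_tokens:[ ("[CLS]", 101); ("[SEP]", 102) ] ()`.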

val sequence : t list -> t

Combine multiple post-processors into one that applies them in sequence.
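One way to picture sequencing is as left-to-right function composition. The sketch below models each post-processor as a plain function on token lists, which is a simplification for illustration and not how `t` is defined; `add_cls` and `add_sep` are hypothetical steps.

```ocaml
(* Standalone sketch: apply each step in list order, feeding the output
   of one step into the next. *)
let sequence steps = fun x -> List.fold_left (fun acc f -> f acc) x steps

let () =
  let add_cls toks = "[CLS]" :: toks in
  let add_sep toks = toks @ [ "[SEP]" ] in
  let p = sequence [ add_cls; add_sep ] in
  (* prints "[CLS] hello world [SEP]" *)
  print_endline (String.concat " " (p [ "hello"; "world" ]))
```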

val process : t -> encoding list -> add_special_tokens:bool -> encoding list

Process encodings with the post-processor.

val added_tokens : t -> is_pair:bool -> int

Get the number of tokens added by this post-processor.
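A count like this is typically used to reserve room for special tokens before truncating input. The standalone sketch below hard-codes the counts for the conventional BERT layout (two tokens for a single sequence, three for a pair) and does not call Saga; `bert_added_tokens` and `budget` are hypothetical helpers.

```ocaml
(* Standalone sketch: special tokens added by the conventional BERT layout.
   [CLS] a [SEP]         -> 2
   [CLS] a [SEP] b [SEP] -> 3 *)
let bert_added_tokens ~is_pair = if is_pair then 3 else 2

(* Room left for ordinary tokens under a maximum sequence length. *)
let budget ~max_len ~is_pair = max 0 (max_len - bert_added_tokens ~is_pair)

let () =
  (* prints "510 509" *)
  Printf.printf "%d %d\n"
    (budget ~max_len:512 ~is_pair:false)
    (budget ~max_len:512 ~is_pair:true)
```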

val to_json : t -> Yojson.Basic.t

Convert a post-processor to its JSON representation.

val of_json : Yojson.Basic.t -> t

Create a post-processor from its JSON representation.