Bert.Tokenizer

BERT tokenizer instance.
Create a WordPiece tokenizer for BERT. Provide either a vocab_file path or a model_id to download from Hugging Face (defaults to bert-base-uncased).
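WordPiece splits each word greedily into the longest vocabulary entries it can find, marking word-internal pieces with a ## prefix. The block below is a minimal sketch of that greedy longest-match-first step. It assumes a toy in-memory vocabulary in place of a real vocab_file; the vocab and wordpiece names are hypothetical, not this module's API. It works on bytes, so it assumes ASCII input, and it skips the pre-tokenization (lowercasing, punctuation splitting) and per-word length cap a full BERT tokenizer applies.

```ocaml
(* Hypothetical toy vocabulary standing in for the contents of a vocab_file. *)
let vocab : (string, int) Hashtbl.t =
  let tbl = Hashtbl.create 16 in
  List.iteri
    (fun i tok -> Hashtbl.replace tbl tok i)
    [ "[PAD]"; "[UNK]"; "[CLS]"; "[SEP]"; "un"; "##aff"; "##able"; "hello" ];
  tbl

(* Greedy longest-match-first WordPiece on a single (ASCII) word. *)
let wordpiece (vocab : (string, int) Hashtbl.t) (word : string) : string list =
  let n = String.length word in
  let rec go start acc =
    if start >= n then List.rev acc
    else
      (* Try the longest remaining substring first, shrinking until a hit. *)
      let rec find stop =
        if stop <= start then None
        else
          let piece = String.sub word start (stop - start) in
          (* Pieces that do not start the word carry the ## prefix. *)
          let piece = if start > 0 then "##" ^ piece else piece in
          if Hashtbl.mem vocab piece then Some (piece, stop) else find (stop - 1)
      in
      match find n with
      | None -> [ "[UNK]" ] (* no piece matches: the whole word is unknown *)
      | Some (piece, stop) -> go stop (piece :: acc)
  in
  go 0 []
```

With this vocabulary, wordpiece vocab "unaffable" yields ["un"; "##aff"; "##able"]. Mapping the whole word to [UNK] as soon as any remainder has no match follows Google's reference BERT tokenization code rather than emitting partial pieces.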
Encode text to token IDs, adding the CLS and SEP special tokens.
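In BERT, [CLS] opens every sequence and [SEP] closes it. The sketch below shows only that wrapping step. It assumes a hypothetical toy vocabulary whose IDs differ from any real vocab file, and it simplifies tokenization to lowercasing, splitting on spaces and whole-word lookup with an [UNK] fallback, where the real encoder would apply WordPiece. The vocab, id_of and encode names are illustrative only.

```ocaml
(* Hypothetical toy vocabulary; real IDs come from the tokenizer's vocab file. *)
let vocab =
  [ ("[PAD]", 0); ("[UNK]", 1); ("[CLS]", 2); ("[SEP]", 3);
    ("hello", 4); ("world", 5) ]

(* Look a token up, falling back to the [UNK] ID. *)
let id_of tok =
  match List.assoc_opt tok vocab with
  | Some id -> id
  | None -> List.assoc "[UNK]" vocab

(* Simplified encoder: lowercase, split on spaces, whole-word lookup
   (no WordPiece), then wrap the IDs in [CLS] ... [SEP]. *)
let encode (text : string) : int list =
  let words =
    String.split_on_char ' ' (String.lowercase_ascii text)
    |> List.filter (fun w -> w <> "")
  in
  (id_of "[CLS]" :: List.map id_of words) @ [ id_of "[SEP]" ]
```

For example, encode "Hello there" gives [2; 4; 1; 3]: "there" is out of vocabulary and becomes [UNK], and even empty text still produces [CLS] [SEP].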
Encode text directly into input tensors ready for a forward pass.
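BERT models commonly take three aligned inputs per sequence: token IDs, token type (segment) IDs, and an attention mask. The sketch below assumes the tensors produced here follow that convention and builds them for one already-encoded sequence with shape [1; seq_len], i.e. a batch of one. The bert_inputs record, row_tensor and to_inputs are hypothetical names, and the standard library's Bigarray (OCaml 4.07 or later) stands in for Rune tensors.

```ocaml
(* int32 matrix type standing in for a Rune int32 tensor. *)
type mat = (int32, Bigarray.int32_elt, Bigarray.c_layout) Bigarray.Array2.t

(* Hypothetical record of the three standard BERT inputs. *)
type bert_inputs = {
  input_ids : mat;
  token_type_ids : mat;
  attention_mask : mat;
}

(* Build a [1; n] int32 matrix from a list of ints. *)
let row_tensor (values : int list) : mat =
  let n = List.length values in
  let t = Bigarray.Array2.create Bigarray.int32 Bigarray.c_layout 1 n in
  List.iteri (fun j v -> t.{0, j} <- Int32.of_int v) values;
  t

(* ids: token IDs that already include [CLS] and [SEP]. *)
let to_inputs (ids : int list) : bert_inputs =
  {
    input_ids = row_tensor ids;
    (* single-segment input: every token belongs to segment 0 *)
    token_type_ids = row_tensor (List.map (fun _ -> 0) ids);
    (* no padding, so every position is attended to *)
    attention_mask = row_tensor (List.map (fun _ -> 1) ids);
  }
```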
val encode_batch :
  t ->
  ?max_length:int ->
  ?padding:bool ->
  string list ->
  (int32, Rune.int32_elt) Rune.t

Encode multiple texts with padding and special tokens.
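Because encode_batch returns a single int32 tensor, sequences of different lengths have to be brought to a common width. The sketch below assumes one plausible reading of the optional arguments: max_length truncates each sequence so that, together with [CLS] and [SEP], it fits, and padding fills shorter rows with [PAD] up to the longest row. Both semantics, the special-token IDs, and the take helper are assumptions for illustration. It starts from already-looked-up word IDs instead of strings, and Bigarray stands in for Rune.

```ocaml
(* Hypothetical special-token IDs for a toy vocabulary. *)
let pad_id = 0 and cls_id = 2 and sep_id = 3

(* Keep at most n elements of a list. *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* Assumed semantics: truncate so each row, including [CLS] and [SEP], has at
   most max_length tokens; with padding, pad rows with [PAD] to the longest row.
   Without padding, unequal rows cannot form a rectangular tensor. *)
let encode_batch ?max_length ?(padding = true) (batch : int list list) =
  let truncate ids =
    match max_length with None -> ids | Some m -> take (max 0 (m - 2)) ids
  in
  let rows = List.map (fun ids -> (cls_id :: truncate ids) @ [ sep_id ]) batch in
  let width = List.fold_left (fun m r -> max m (List.length r)) 0 rows in
  if (not padding) && List.exists (fun r -> List.length r <> width) rows then
    invalid_arg "encode_batch: rows differ in length and padding is off";
  let out =
    Bigarray.Array2.create Bigarray.int32 Bigarray.c_layout (List.length rows) width
  in
  (* Fill with [PAD] first, then overwrite the real tokens row by row. *)
  Bigarray.Array2.fill out (Int32.of_int pad_id);
  List.iteri (fun i row -> List.iteri (fun j id -> out.{i, j} <- Int32.of_int id) row) rows;
  out
```

For the batch [[4; 5]; [4]], this yields a 2 x 4 tensor with rows [2; 4; 5; 3] and [2; 4; 3; 0]. Padding to the longest row rather than always to max_length keeps short batches compact; a model would also need a matching attention mask so it ignores the [PAD] positions.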