Module Saga_models.Ngram

N-gram language models (unigram, bigram, trigram)
N-gram language models for text generation
type t
An n-gram model.

type stats
Statistics about the trained model.
Smoothing strategies:
- Add_k k: classic add-k (Laplace) smoothing.
- Stupid_backoff alpha: back off to lower orders, scaled by alpha.

create ~n ?smoothing ?cache_capacity tokens builds a model with configurable smoothing and an optional logits cache.
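For concreteness, a minimal sketch of building a model, following the call shape above. The token ids, alpha value, and cache size are made-up illustrations, and the assumption that Stupid_backoff carries a float alpha is inferred from the description rather than stated here.

  (* Toy corpus of token ids; in practice these come from a tokenizer. *)
  let tokens = [| 0; 1; 2; 1; 2; 3; 0; 1; 2 |]

  (* Trigram model with stupid backoff (alpha assumed to be a float)
     and a small logits cache. All values are illustrative. *)
  let model =
    Saga_models.Ngram.create ~n:3
      ~smoothing:(Saga_models.Ngram.Stupid_backoff 0.4)
      ~cache_capacity:1024
      tokens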
logits model ~context returns log probabilities given context. Context length should be n-1 for an n-gram model.
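Continuing the sketch above, querying the model for next-token scores. Treating the result as a float array indexed by token id is an assumption for illustration; this page does not specify the return type.

  (* For the trigram model above, the context must hold n - 1 = 2 ids. *)
  let scores = Saga_models.Ngram.logits model ~context:[| 1; 2 |]

  (* scores.(i) would then be the log probability that token i follows
     the context, under the assumed return type. *)
  let () = Printf.printf "log P(3 | 1 2) = %f\n" scores.(3)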
perplexity model tokens computes perplexity on the test tokens.
log_prob model tokens returns the sum of log-probabilities of the observed tokens under the model.
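If perplexity here follows the textbook definition, the two evaluation functions are related by ppl = exp(-log_prob / N) over N test tokens. A hedged check, reusing the model value from the sketch above; the convention is assumed, not confirmed by this page.

  let test_tokens = [| 0; 1; 2; 3; 0; 1 |]
  let ppl = Saga_models.Ngram.perplexity model test_tokens
  let lp = Saga_models.Ngram.log_prob model test_tokens

  (* Under the standard convention these two numbers should agree. *)
  let () =
    Printf.printf "ppl = %.4f, exp(-lp/N) = %.4f\n" ppl
      (exp (-. lp /. float_of_int (Array.length test_tokens)))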
val generate :
  t ->
  ?max_tokens:int ->
  ?temperature:float ->
  ?seed:int ->
  ?start:int array ->
  unit ->
  int array

generate model ?max_tokens ?temperature ?seed ?start () generates tokens from the model.
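A usage sketch for generate. The signature is as shown above, but the parameter semantics (token cap, sampling temperature, RNG seed, prompt prefix) are inferred from the names rather than stated on this page.

  (* Sample up to 50 tokens at temperature 0.8, seeded for
     reproducibility, starting from a two-token prompt. *)
  let sampled =
    Saga_models.Ngram.generate model ~max_tokens:50 ~temperature:0.8
      ~seed:42 ~start:[| 1; 2 |] ()

  let () = Printf.printf "generated %d tokens\n" (Array.length sampled)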
stats model returns statistics about the highest-order n-grams.
save_text model filename serializes the model to a text file.
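Putting the last two entry points together, a short hedged sketch; the filename is illustrative and the shape of the stats value is not specified here.

  (* Inspect the highest-order n-gram statistics and persist the model. *)
  let _stats = Saga_models.Ngram.stats model
  let () = Saga_models.Ngram.save_text model "trigram-model.txt"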