Kaun.Metrics
Performance metrics for neural network training and evaluation.
This module provides a comprehensive set of metrics for monitoring model performance during training and evaluation. Metrics are designed to be composable, efficient, and stateful for accumulation across batches while remaining layout-agnostic at the type level.
Layout-independent metric accumulator that produces host float values when computed.
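The intended lifecycle, based on the functions documented below, is: create a metric once, feed it every batch through update, then read the aggregated value with compute. A minimal sketch, where batches, predictions, and targets stand in for tensors produced by your own model and data pipeline:
let acc = Metrics.accuracy () in
List.iter
  (fun (predictions, targets) -> Metrics.update acc ~predictions ~targets ())
  batches;
let epoch_accuracy = Metrics.compute acc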
accuracy ?threshold ?top_k () creates an accuracy metric.
Example
let acc = Metrics.accuracy () in
let top5_acc = Metrics.accuracy ~top_k:5 ()
precision ?threshold ?zero_division () creates a precision metric.
Precision = True Positives / (True Positives + False Positives)
Example
let prec = Metrics.precision ()
recall ?threshold ?zero_division () creates a recall metric.
Recall = True Positives / (True Positives + False Negatives)
f1_score ?threshold ?beta () creates an F-score metric.
F-score = (1 + β²) * (Precision * Recall) / (β² * Precision + Recall)
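With beta = 1 (the usual F1) this is the harmonic mean of precision and recall; beta > 1 weights recall more heavily. A small sketch, assuming beta is a float parameter:
let f1 = Metrics.f1_score () in
let f2 = Metrics.f1_score ~beta:2.0 ()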
auc_roc () creates an AUC-ROC (Area Under the Receiver Operating Characteristic) metric that integrates true/false positive rates observed across batches.
auc_pr () creates an AUC-PR (Area Under the Precision–Recall) metric. Computes the exact precision–recall integral by sorting predictions and accumulating precision/recall scores across all seen batches.
val confusion_matrix :
num_classes:int ->
?normalize:[ `None | `True | `Pred | `All ] ->
unit ->
metric
confusion_matrix ~num_classes ?normalize () accumulates a confusion matrix for classification tasks.
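For example, a row-normalised matrix (each true class summing to one) for a hypothetical 10-class problem:
let cm = Metrics.confusion_matrix ~num_classes:10 ~normalize:`True ()
Since the aggregated value is a matrix rather than a scalar, compute_tensor (documented below) is presumably the natural way to read it back.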
mse ?reduction () creates a Mean Squared Error metric.
MSE = mean((predictions - targets)²)
rmse ?reduction () creates a Root Mean Squared Error metric.
RMSE = sqrt(mean((predictions - targets)²))
mae ?reduction () creates a Mean Absolute Error metric.
MAE = mean(|predictions - targets|)
loss () tracks the running mean of loss values. Pass batch losses through update ~loss to accumulate them.
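A sketch of tracking the running training loss, assuming batch_loss is the scalar loss tensor returned by your training step; predictions and targets are still required by update's signature:
let running_loss = Metrics.loss () in
Metrics.update running_loss ~predictions ~targets ~loss:batch_loss ();
let mean_loss = Metrics.compute running_loss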
mape ?eps () creates a Mean Absolute Percentage Error metric.
MAPE = mean(|predictions - targets| / (|targets| + eps)) * 100
r2_score ?adjusted ?num_features () creates an R² coefficient of determination metric.
R² = 1 - (SS_res / SS_tot)
explained_variance () creates an explained variance metric.
EV = 1 - Var(targets - predictions) / Var(targets)
cross_entropy ?from_logits () creates a cross-entropy metric.
binary_cross_entropy ?from_logits () creates a binary cross-entropy metric.
kl_divergence ?eps () creates a Kullback–Leibler divergence metric.
KL(P||Q) = Σ P log(P / Q)
perplexity ?base () creates a perplexity metric for language models.
Perplexity = base^(cross_entropy)
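For example, perplexity in bits rather than nats can be requested via ?base, assuming the parameter is a float (the default base is not specified here):
let ppl = Metrics.perplexity () in
let ppl_bits = Metrics.perplexity ~base:2.0 ()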
ndcg ?k () creates a Normalised Discounted Cumulative Gain metric.
map ?k () creates a Mean Average Precision metric for ranking.
mrr ?k () creates a Mean Reciprocal Rank metric.
MRR = mean(1 / rank_of_first_relevant_item)
bleu ?max_n ?weights ?smoothing () creates a BLEU score metric for pre-tokenized integer sequences.
Predictions and targets must be shaped [batch, seq_len] with integer token identifiers. Zero values are treated as padding and ignored.
rouge ~variant ?use_stemmer () creates a ROUGE score metric for pre-tokenized integer sequences.
Predictions and targets must be shaped [batch, seq_len] with integer token identifiers. Zero values are treated as padding and ignored.
meteor ?alpha ?beta ?gamma () creates a METEOR score metric for pre-tokenized integer sequences.
Predictions and targets must be shaped [batch, seq_len] with integer token identifiers. Zero values are treated as padding and ignored.
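These three text metrics share the same calling convention. A sketch, assuming pred_tokens and ref_tokens are [batch, seq_len] Rune tensors of token ids with 0 used for padding, and with an illustrative ~max_n:
let bleu = Metrics.bleu ~max_n:4 () in
Metrics.update bleu ~predictions:pred_tokens ~targets:ref_tokens ();
let score = Metrics.compute bleu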
psnr ?max_val () creates a Peak Signal-to-Noise Ratio metric.
PSNR = 10 * log10(max_val² / MSE)
ssim ?window_size ?k1 ?k2 () creates a Structural Similarity Index metric.
The implementation evaluates the global SSIM across the full prediction and target tensors using scalar statistics derived from window_size, k1, and k2.
iou ?threshold ?per_class ~num_classes () creates an Intersection over Union metric.
Inputs must contain integer class indices in [0, num_classes). When num_classes = 2, threshold binarises predictions before computing IoU. When per_class = true, the metric reports one IoU per class; otherwise it returns the mean over classes with non-zero support.
dice ?threshold ?per_class ~num_classes () creates a Sørensen-Dice coefficient metric with the same input conventions as iou.
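A sketch for segmentation masks over a hypothetical three-class problem, assuming pred_classes and true_classes hold integer class indices:
let mean_iou = Metrics.iou ~num_classes:3 () in
let class_dice = Metrics.dice ~per_class:true ~num_classes:3 () in
Metrics.update mean_iou ~predictions:pred_classes ~targets:true_classes ();
Metrics.update class_dice ~predictions:pred_classes ~targets:true_classes ()
With per_class = true the aggregated value is one score per class, so compute_tensor is presumably the way to read it back.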
val update :
metric ->
predictions:(float, 'layout) Rune.t ->
targets:(_, 'layout) Rune.t ->
?loss:(float, 'layout) Rune.t ->
?weights:(float, 'layout) Rune.t ->
unit ->
unit
update metric ~predictions ~targets ?loss ?weights () updates the metric state. All tensors must share the same (hidden) layout. When supplied, the loss tensor is treated as an auxiliary scalar for metrics that track losses.
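Several metrics can be fed from the same batch, and ?weights, assuming it carries per-sample weights, allows weighted evaluation without changing the metric configuration. A sketch:
let acc = Metrics.accuracy () in
let prec = Metrics.precision () in
Metrics.update acc ~predictions ~targets ~weights:sample_weights ();
Metrics.update prec ~predictions ~targets ~weights:sample_weights ()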
compute metric returns the aggregated metric value as a host float.
compute_tensor metric returns the aggregated metric value as a device tensor.
clone metric creates a new metric with the same configuration but fresh state.
val create_custom :
dtype:(float, 'layout) Rune.dtype ->
name:string ->
init:(unit -> (float, 'layout) Rune.t list) ->
update:
((float, 'layout) Rune.t list ->
predictions:(float, 'layout) Rune.t ->
targets:(float, 'layout) Rune.t ->
?weights:(float, 'layout) Rune.t ->
unit ->
(float, 'layout) Rune.t list) ->
compute:((float, 'layout) Rune.t list -> (float, 'layout) Rune.t) ->
reset:((float, 'layout) Rune.t list -> (float, 'layout) Rune.t list) ->
metric
create_custom ~dtype ~name ~init ~update ~compute ~reset constructs a custom metric from user-provided accumulator functions.
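A hedged sketch of a custom metric that tracks mean absolute error by hand. The Rune operations used below (Rune.float32, Rune.zeros, Rune.scalar, Rune.abs, Rune.sub, Rune.add, Rune.sum, Rune.div, Rune.numel) are assumptions based on an Nx-style tensor API and are not confirmed by this page:
(* State: [ sum of absolute errors; element count ], both scalar tensors. *)
let custom_mae =
  Metrics.create_custom
    ~dtype:Rune.float32
    ~name:"custom_mae"
    ~init:(fun () ->
      [ Rune.zeros Rune.float32 [||]; Rune.zeros Rune.float32 [||] ])
    ~update:(fun state ~predictions ~targets ?weights:_ () ->
      match state with
      | [ err_sum; count ] ->
          let abs_err = Rune.abs (Rune.sub predictions targets) in
          let n =
            Rune.scalar Rune.float32 (float_of_int (Rune.numel predictions))
          in
          [ Rune.add err_sum (Rune.sum abs_err); Rune.add count n ]
      | _ -> state)
    ~compute:(fun state ->
      match state with
      | [ err_sum; count ] -> Rune.div err_sum count
      | _ -> failwith "unexpected metric state")
    ~reset:(fun _ ->
      [ Rune.zeros Rune.float32 [||]; Rune.zeros Rune.float32 [||] ])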
is_better metric ~higher_better ~old_val ~new_val determines whether the new metric value improves upon the previous one.
format metric value pretty-prints a metric value for logging.
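A sketch of using is_better and format together to keep the best validation score seen so far, assuming val_acc is an accuracy metric (so higher_better:true) and that format returns a string:
let best = ref neg_infinity in
let current = Metrics.compute val_acc in
print_endline (Metrics.format val_acc current);
if Metrics.is_better val_acc ~higher_better:true ~old_val:!best ~new_val:current
then best := current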