Module Fehu.Policy

Lightweight helpers for constructing policies (random, deterministic, or greedy discrete).

Helper combinators for building policies.

type ('obs, 'act) t = 'obs -> 'act * float option * float option

A policy is a function from an observation to an action, returned together with an optional log-probability and an optional value estimate.

val deterministic : ('obs -> 'act) -> ('obs, 'act) t

Wrap a deterministic action function as a policy.
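As a rough illustration of the policy type and of what wrapping an action function might look like, the sketch below redefines the type locally under the hypothetical name policy so that it compiles without Fehu. It assumes that a deterministic policy leaves both optional fields as None; the documentation above does not state this, and the controller push_towards_lean is invented for the example.

```ocaml
(* Local stand-in for Fehu.Policy.t, copied from the signature above. *)
type ('obs, 'act) policy = 'obs -> 'act * float option * float option

(* Assumption: a deterministic policy reports no log-probability and no
   value estimate, so both optional fields are None. *)
let deterministic (f : 'obs -> 'act) : ('obs, 'act) policy =
 fun obs -> (f obs, None, None)

(* Hypothetical controller: choose action 1 when the angle is positive,
   action 0 otherwise. *)
let push_towards_lean : (float, int) policy =
  deterministic (fun angle -> if angle > 0.0 then 1 else 0)

let () =
  (* A caller destructures the triple returned by the policy. *)
  let action, log_prob, value = push_towards_lean 0.3 in
  Printf.printf "action=%d log_prob=%s value=%s\n" action
    (match log_prob with None -> "None" | Some p -> string_of_float p)
    (match value with None -> "None" | Some v -> string_of_float v)
```

Because a policy is an ordinary function, it can be applied directly to an observation and composed like any other OCaml closure.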

val random : ?rng:Rune.Rng.key -> ('obs, 'act, 'render) Env.t -> ('obs, 'act) t

Epsilon-free stochastic policy that samples uniformly from the action space. Reuses the environment RNG when rng is omitted.
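The idea of a uniform policy can be sketched without Fehu or Rune. The example below is not the library's implementation: it swaps Rune.Rng.key for the standard library's Random.State, models the action space with a hypothetical record discrete (n actions starting at start), and assumes the optional fields are None.

```ocaml
(* Local stand-in for Fehu.Policy.t, copied from the signature above. *)
type ('obs, 'act) policy = 'obs -> 'act * float option * float option

(* Hypothetical discrete action space: n actions numbered
   start, start + 1, ..., start + n - 1. *)
type discrete = { start : int; n : int }

(* Uniform sampling ignores the observation entirely. The default state is
   created once per policy and then advanced on every call. Stdlib's
   Random.State stands in for Rune.Rng.key here. *)
let random_discrete ?(rng = Random.State.make [| 42 |]) (space : discrete)
    : ('obs, int) policy =
 fun _obs -> (space.start + Random.State.int rng space.n, None, None)

(* Three actions numbered 1, 2, 3; observations are unit for simplicity. *)
let sample_action : (unit, int) policy = random_discrete { start = 1; n = 3 }

let () =
  let action, _, _ = sample_action () in
  Printf.printf "sampled action: %d\n" action
```

Passing an explicit ~rng gives reproducible sampling that is independent of any other random state in the program, which mirrors the role of the optional rng argument described above.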

val greedy_discrete : ('obs, Space.Discrete.element, 'render) Env.t -> score:('obs -> float array) -> ('obs, Space.Discrete.element) t

Build a greedy policy for discrete action spaces.

The score function must return one score per action (e.g., Q-values). The policy selects the highest-scoring action, respecting the space's offset.
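The selection rule can be sketched as an argmax over the score array followed by a shift by the space's offset. The code below is a minimal standalone sketch, not Fehu's implementation: actions are plain int values rather than Space.Discrete.element, the labelled argument ~start is a hypothetical stand-in for the offset that the real function would read from the environment, ties go to the lowest index, and the optional fields are assumed to be None.

```ocaml
(* Local stand-in for Fehu.Policy.t, copied from the signature above. *)
type ('obs, 'act) policy = 'obs -> 'act * float option * float option

(* Index of the largest score; ties resolve to the lowest index. *)
let argmax (scores : float array) : int =
  if Array.length scores = 0 then invalid_arg "argmax: empty score array";
  let best = ref 0 in
  Array.iteri (fun i s -> if s > scores.(!best) then best := i) scores;
  !best

(* ~start is a hypothetical stand-in for the discrete space's offset:
   index i in the score array corresponds to action start + i. *)
let greedy_discrete ~(start : int) ~(score : 'obs -> float array)
    : ('obs, int) policy =
 fun obs -> (start + argmax (score obs), None, None)

(* Hypothetical Q-function that returns the same scores for any observation. *)
let q_values (_obs : unit) : float array = [| 0.1; 0.7; 0.2 |]

let greedy : (unit, int) policy = greedy_discrete ~start:1 ~score:q_values

let () =
  (* Index 1 holds the highest score, so with start = 1 the action is 2. *)
  let action, _, _ = greedy () in
  Printf.printf "greedy action: %d\n" action
```

The offset matters whenever the space does not start at zero: without the shift, the policy would return an array index rather than a valid action.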