Fehu.Policy
Lightweight helpers for constructing policies (random, deterministic, or greedy discrete).
Helper combinators for building policies.
Policy returning an action with optional log prob and value estimate.
Wrap a deterministic action function as a policy.
Epsilon-free stochastic policy that samples uniformly from the action space. Reuses the environment RNG when rng is omitted.
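The RNG fallback described above can be illustrated with a minimal, self-contained sketch. It does not use Fehu: the `env` record, its fields, and `sample_uniform` are hypothetical stand-ins, and the standard library's `Random.State` is assumed in place of whatever generator Fehu actually uses.

```ocaml
(* Hypothetical stand-in for an environment; this is not Fehu's Env.t. *)
type env = {
  rng : Random.State.t;  (* the environment's own generator *)
  start : int;           (* smallest valid action *)
  n : int;               (* number of discrete actions *)
}

(* Sample uniformly from [start, start + n). Uses [rng] when it is given,
   otherwise falls back to the environment's generator. *)
let sample_uniform ?rng env =
  let state = match rng with Some s -> s | None -> env.rng in
  env.start + Random.State.int state env.n

let () =
  let env = { rng = Random.State.make [| 42 |]; start = 1; n = 4 } in
  (* No rng supplied: draws from env.rng. *)
  let a = sample_uniform env in
  (* Explicit rng supplied: env.rng is left untouched. *)
  let b = sample_uniform ~rng:(Random.State.make [| 7 |]) env in
  Printf.printf "sampled actions: %d %d\n" a b
```

Passing an explicit generator keeps a policy's randomness independent of the environment's, which can help when reproducing a run.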
val greedy_discrete :
('obs, Space.Discrete.element, 'render) Env.t ->
score:('obs -> float array) ->
('obs, Space.Discrete.element) t
Build a greedy policy for discrete action spaces.
The score function must return one score per action (e.g., Q-values). The policy selects the highest-scoring action, respecting the space's offset.
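As a rough illustration of greedy selection with an offset, the sketch below works on plain integers and does not depend on Fehu. The names `argmax` and `greedy_action`, the `start` parameter standing in for the space's offset, and the first-maximum tie-breaking rule are all assumptions made for this example, not the library's implementation.

```ocaml
(* Hypothetical helper: index of the first maximum in a non-empty array. *)
let argmax (scores : float array) : int =
  if Array.length scores = 0 then invalid_arg "argmax: empty score array";
  let best = ref 0 in
  (* Strict comparison keeps the earliest index when scores tie. *)
  Array.iteri (fun i s -> if s > scores.(!best) then best := i) scores;
  !best

(* Hypothetical: pick the best-scoring index, then shift it by [start]
   so the result lies in the action range [start, start + n). *)
let greedy_action ~start ~score obs = start + argmax (score obs)

let () =
  (* Three actions numbered from 1; the middle one has the highest score. *)
  let score _obs = [| 0.1; 0.9; 0.3 |] in
  Printf.printf "greedy action: %d\n" (greedy_action ~start:1 ~score ())
```

Adding the offset after the argmax keeps the score array zero-indexed while still returning a valid element of a space that does not start at 0.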