Fehu.Env
Core environment interface for reinforcement learning.
Defines the standard RL environment API: reset, step, render, and close. All environments implement this interface. See Env for lifecycle management and custom environment creation.
This module defines the standard RL environment interface inspired by OpenAI Gymnasium. Environments represent interactive tasks where agents observe states, take actions, and receive rewards.
Create an environment, reset it to get an initial observation, interact by stepping with actions, and optionally render or close resources:
  let env =
    Env.create ~rng ~observation_space ~action_space ~reset ~step ()
  in
  let obs, info = Env.reset env () in
  let transition = Env.step env action in
  Env.close env

Episodes end in two ways:

- terminated: the episode ended naturally (for example, the agent reached a goal state or failed)
- truncated: the episode was artificially cut off (for example, by a time limit)
This distinction matters for bootstrapping: terminated episodes have zero future value, while truncated episodes may continue beyond the limit.
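A minimal standalone sketch of how this distinction affects a one-step bootstrap target. The record here only mirrors the flags of Fehu's transition type; gamma and value are hypothetical names for a discount factor and a value-function estimate:

```ocaml
(* Standalone sketch: mirrors the episode-status flags of a transition. *)
type 'obs transition = {
  observation : 'obs;
  reward : float;
  terminated : bool;
  truncated : bool;
}

(* One-step TD target: a terminated episode contributes zero future
   value, while a truncated episode still bootstraps from the
   estimated value of the next observation. *)
let td_target ~gamma ~value t =
  if t.terminated then t.reward
  else t.reward +. (gamma *. value t.observation)
```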
Implement custom environments by providing reset and step functions:
  let env =
    Env.create ~rng ~observation_space ~action_space
      ~reset:(fun env ?options () ->
        (* reset logic *)
        (initial_obs, info))
      ~step:(fun env action ->
        (* transition logic *)
        { observation; reward; terminated; truncated; info })
      ()

type ('obs, 'act, 'render) transition = {
  observation : 'obs;  (* Observation after taking the action *)
  reward : float;  (* Immediate reward from the action *)
  terminated : bool;  (* Whether the episode ended naturally *)
  truncated : bool;  (* Whether the episode was artificially cut off *)
  info : Info.t;  (* Auxiliary diagnostic information *)
}

Transition resulting from taking an action in an environment.
Returned by step. Contains the next observation, reward, episode status flags, and optional metadata.
val transition :
?reward:float ->
?terminated:bool ->
?truncated:bool ->
?info:Info.t ->
observation:'obs ->
unit ->
  ('obs, 'act, 'render) transition

transition ?reward ?terminated ?truncated ?info ~observation () constructs a transition.
Defaults: reward = 0.0, terminated = false, truncated = false, info = Info.empty.
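For instance, a terminal transition can lean on those defaults; in this sketch final_obs stands in for whatever observation the environment produced:

```ocaml
(* Only [observation] is required; [truncated] and [info] fall back
   to their defaults (false and Info.empty). *)
let t =
  Env.transition ~reward:1.0 ~terminated:true ~observation:final_obs ()
```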
Environment handle.
Encapsulates environment state, observation/action spaces, and RNG. The type parameters represent:
- 'obs: observation type
- 'act: action type
- 'render: rendering output type

val create :
?id:string ->
?metadata:Metadata.t ->
rng:Rune.Rng.key ->
observation_space:'obs Space.t ->
action_space:'act Space.t ->
reset:(('obs, 'act, 'render) t -> ?options:Info.t -> unit -> 'obs * Info.t) ->
step:(('obs, 'act, 'render) t -> 'act -> ('obs, 'act, 'render) transition) ->
?render:(('obs, 'act, 'render) t -> 'render option) ->
?close:(('obs, 'act, 'render) t -> unit) ->
unit ->
  ('obs, 'act, 'render) t

create ?id ?metadata ~rng ~observation_space ~action_space ~reset ~step ?render ?close () constructs a new environment.
Parameters:
- id: optional identifier for the environment
- metadata: environment metadata (default: Metadata.default)
- rng: random number generator key for reproducibility
- observation_space: space defining valid observations
- action_space: space defining valid actions
- reset: function to reset the environment to an initial state; receives optional reset options and returns the initial observation and info
- step: function to advance the environment by one timestep; receives an action and returns a transition
- render: optional function to render the environment state
- close: optional cleanup function to release resources

id env returns the environment's identifier, if any.
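Putting the create parameters together, here is a hypothetical counter environment as a sketch: the observation is a step count and the episode terminates after ten steps. The obs_space and act_space arguments are assumed to be built elsewhere with the Space module:

```ocaml
(* Hypothetical counter environment. Observations are ints; any
   action just increments the counter. *)
let make_counter_env ~rng ~obs_space ~act_space =
  let count = ref 0 in
  Env.create ~rng
    ~observation_space:obs_space ~action_space:act_space
    ~reset:(fun _env ?options:_ () ->
      count := 0;
      (!count, Info.empty))
    ~step:(fun _env _action ->
      incr count;
      (* Terminates naturally at 10 steps; never truncated. *)
      Env.transition ~observation:!count ~reward:1.0
        ~terminated:(!count >= 10) ())
    ()
```

Keeping the mutable counter in the closure rather than in a global keeps each created environment's state independent.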
metadata env returns the environment's metadata.
set_metadata env metadata updates the environment's metadata.
rng env returns the current RNG key without consuming it.
set_rng env rng replaces the environment's RNG key.
take_rng env returns the current RNG key and generates a fresh one.
Splits the RNG internally, returning one half and keeping the other for future use. Use this to obtain independent random streams.
split_rng env ~n generates n independent RNG keys.
Splits the environment's RNG into n+1 keys: n returned in the array and one kept for the environment. Use this for parallel operations requiring independent randomness.
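For example, to seed a few parallel rollouts (a fragment, assuming env was created as above; the worker body is hypothetical):

```ocaml
(* Derive four independent keys; the environment keeps a fresh
   fifth key for its own future use. *)
let keys = Env.split_rng env ~n:4 in
Array.iteri
  (fun i key ->
    (* hand [key] to worker [i]; each stream is independent *)
    ignore (i, key))
  keys
```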
observation_space env returns the space of valid observations.
action_space env returns the space of valid actions.
reset env ?options () resets the environment to an initial state.
Returns (initial_observation, info) where info contains optional diagnostic data. Call this at the start of training and after episodes complete.
The options parameter allows passing environment-specific reset configuration.
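A small fragment contrasting the two forms, where opts is an Info.t value assumed to be built elsewhere:

```ocaml
(* Plain reset with default options, then a reset carrying
   environment-specific configuration. *)
let obs, _info = Env.reset env () in
let obs', _info' = Env.reset env ~options:opts () in
(obs, obs')
```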
step env action executes action in the environment.
Returns a transition containing the next observation, reward, termination flags, and info. The action must be valid according to action_space.
After an episode terminates or truncates, call reset before stepping again.
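The reset/step cycle above can be sketched as a single-episode rollout; choose_action is a hypothetical policy function from observations to actions:

```ocaml
(* Run one episode and return the undiscounted return. Stops on
   either natural termination or truncation, per the flags in the
   transition record. *)
let run_episode env ~choose_action =
  let obs, _info = Env.reset env () in
  let rec loop obs total =
    let t = Env.step env (choose_action obs) in
    let total = total +. t.Env.reward in
    if t.Env.terminated || t.Env.truncated then total
    else loop t.Env.observation total
  in
  loop obs 0.0
```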
render env produces a visualization of the current environment state.
Returns None if rendering is not supported or unavailable.
close env releases resources held by the environment.
Call this when done using the environment. Subsequent operations on a closed environment may fail.