Fehu_algorithms.Reinforce
Reinforce algorithm implementation.
REINFORCE (Monte Carlo Policy Gradient) is a classic policy gradient method that collects complete episodes and updates the policy using Monte Carlo return estimates. See Reinforce for detailed documentation.
Monte Carlo policy gradient (REINFORCE) training API.
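The "Monte Carlo return estimates" mentioned above can be illustrated with a minimal, library-independent sketch. It computes the discounted return G_t = r_t + gamma * G_{t+1} for every step of one finished episode. The function name `discounted_returns` is hypothetical and not part of this module. The `gamma` argument is assumed to play the same role as the `gamma` field of `config`. How the module itself applies `reward_scale` or a baseline is not shown here.

```ocaml
(* Illustrative only: discounted Monte Carlo returns for one complete episode.
   [discounted_returns] is a hypothetical helper, not a Fehu function. *)
let discounted_returns ~gamma (rewards : float array) : float array =
  let n = Array.length rewards in
  let returns = Array.make n 0.0 in
  let running = ref 0.0 in
  (* Walk backwards so each return reuses the return of the following step. *)
  for t = n - 1 downto 0 do
    running := rewards.(t) +. (gamma *. !running);
    returns.(t) <- !running
  done;
  returns

(* With gamma = 0.5 and rewards [|1.0; 1.0; 1.0|] the returns are
   [|1.75; 1.5; 1.0|]. *)
```

The backward pass keeps the computation linear in episode length, which is why complete episodes must be collected before an update.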
type config = {
  learning_rate : float;
  gamma : float;
  use_baseline : bool;
  reward_scale : float;
  entropy_coef : float;
  max_episode_steps : int;
}

type metrics = {
  episode_return : float;
  episode_length : int;
  episode_won : bool;
  stage_desc : string;
  avg_entropy : float;
  avg_log_prob : float;
  adv_mean : float;
  adv_std : float;
  value_loss : float option;
  total_steps : int;
  total_episodes : int;
}

val init :
?baseline_network:Kaun.module_ ->
env:
((float, Bigarray.float32_elt) Rune.t,
(int32, Bigarray.int32_elt) Rune.t,
'render)
Fehu.Env.t ->
policy_network:Kaun.module_ ->
rng:Rune.Rng.key ->
config:config ->
unit ->
params * state

val step :
env:
((float, Bigarray.float32_elt) Rune.t,
(int32, Bigarray.int32_elt) Rune.t,
'render)
Fehu.Env.t ->
params:params ->
state:state ->
params * state

val train :
?baseline_network:Kaun.module_ ->
env:
((float, Bigarray.float32_elt) Rune.t,
(int32, Bigarray.int32_elt) Rune.t,
'render)
Fehu.Env.t ->
policy_network:Kaun.module_ ->
rng:Rune.Rng.key ->
config:config ->
total_timesteps:int ->
?callback:(metrics -> [ `Continue | `Stop ]) ->
unit ->
params * state

val load :
path:string ->
env:
((float, Bigarray.float32_elt) Rune.t,
(int32, Bigarray.int32_elt) Rune.t,
'render)
Fehu.Env.t ->
policy_network:Kaun.module_ ->
?baseline_network:Kaun.module_ ->
config:config ->
unit ->
(params * state, string) result
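The `?callback` argument of `train` has type `metrics -> [ `Continue | `Stop ]`. As a rough sketch of how such a callback might be written, the block below stops training once the mean return over a sliding window reaches a target. To stay self-contained it declares a local, trimmed copy of the `metrics` record. In real code the record would be the one defined by this module. The name `make_early_stop` and the `window` and `target` parameters are hypothetical.

```ocaml
(* Local mirror of the documented [metrics] record, trimmed to the fields
   used here. This copy exists only so the sketch compiles on its own. *)
type metrics = {
  episode_return : float;
  total_episodes : int;
}

(* Hypothetical helper: returns a callback that answers `Stop once the mean
   return of the last [window] episodes is at least [target]. *)
let make_early_stop ~window ~target : metrics -> [ `Continue | `Stop ] =
  let recent = Queue.create () in
  fun m ->
    Queue.push m.episode_return recent;
    (* Keep at most [window] returns by dropping the oldest one. *)
    if Queue.length recent > window then ignore (Queue.pop recent);
    let sum = Queue.fold ( +. ) 0.0 recent in
    let mean = sum /. float_of_int (Queue.length recent) in
    if Queue.length recent = window && mean >= target then `Stop
    else `Continue
```

Assuming the callback is invoked with fresh metrics as training progresses, a value such as `make_early_stop ~window:10 ~target:200.0` (numbers chosen arbitrarily) could be passed as `~callback` to `train`. The closure keeps its own queue, so each call to `make_early_stop` yields an independent callback.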