Module Fehu_algorithms.Dqn

DQN algorithm implementation.
DQN (Deep Q-Network) is an off-policy value-based method that uses experience replay and target networks for stable training. It learns Q-values for discrete actions and selects actions greedily. See Dqn for detailed documentation.
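To make the training objective concrete, a standard DQN temporal-difference target can be written as follows (notation is assumed, not taken from this page): with theta the online Q-network parameters and theta-minus the target-network parameters, synced every `target_update_freq` steps,

```latex
y_t = r_t + \gamma \, \max_{a'} Q_{\theta^-}(s_{t+1}, a'),
\qquad
\mathcal{L}(\theta) = \mathbb{E}\left[ \left( y_t - Q_\theta(s_t, a_t) \right)^2 \right]
```

Transitions are sampled uniformly from the replay buffer when minimizing this loss, which decorrelates updates and stabilizes training.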
Deep Q-Network (DQN) training API.
type config = {
  learning_rate : float;
  gamma : float;
  epsilon_start : float;
  epsilon_end : float;
  epsilon_decay : float;
  batch_size : int;
  buffer_capacity : int;
  target_update_freq : int;
  warmup_steps : int;
}

type metrics = {
  loss : float;
  avg_q_value : float;
  epsilon : float;
  episode_return : float option;
  episode_length : int option;
  total_steps : int;
  total_episodes : int;
}

val init :
  env:
    ((float, Bigarray.float32_elt) Rune.t,
     (int32, Bigarray.int32_elt) Rune.t,
     'render)
    Fehu.Env.t ->
  q_network:Kaun.module_ ->
  rng:Rune.Rng.key ->
  config:config ->
  params * state

val step :
env:
((float, Bigarray.float32_elt) Rune.t,
(int32, Bigarray.int32_elt) Rune.t,
'render)
Fehu.Env.t ->
params:params ->
state:state ->
params * state

val train :
env:
((float, Bigarray.float32_elt) Rune.t,
(int32, Bigarray.int32_elt) Rune.t,
'render)
Fehu.Env.t ->
q_network:Kaun.module_ ->
rng:Rune.Rng.key ->
config:config ->
total_timesteps:int ->
?callback:(metrics -> [ `Continue | `Stop ]) ->
unit ->
params * state

val load :
path:string ->
env:
((float, Bigarray.float32_elt) Rune.t,
(int32, Bigarray.int32_elt) Rune.t,
'render)
Fehu.Env.t ->
q_network:Kaun.module_ ->
config:config ->
(params * state, string) result
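A minimal usage sketch of this API, assuming an `env` of the required Rune tensor types, a `q_network` built with Kaun, and a `Rune.Rng.key` constructor taking an integer seed are already available in scope; all concrete hyperparameter values below are illustrative, not defaults taken from the library:

```ocaml
open Fehu_algorithms

(* Illustrative configuration; field names come from [Dqn.config] above,
   the values are assumptions for the sketch. *)
let config : Dqn.config = {
  learning_rate = 1e-3;
  gamma = 0.99;
  epsilon_start = 1.0;
  epsilon_end = 0.05;
  epsilon_decay = 0.995;
  batch_size = 64;
  buffer_capacity = 100_000;
  target_update_freq = 1_000;
  warmup_steps = 1_000;
}

(* Run training end-to-end, logging occasionally via the callback.
   [env] and [q_network] are assumed to be defined elsewhere. *)
let params, state =
  Dqn.train ~env ~q_network
    ~rng:(Rune.Rng.key 0)
    ~config
    ~total_timesteps:50_000
    ~callback:(fun (m : Dqn.metrics) ->
      if m.total_steps mod 1_000 = 0 then
        Printf.printf "steps=%d epsilon=%.3f loss=%.4f\n"
          m.total_steps m.epsilon m.loss;
      (* Return [`Stop] here to end training early. *)
      `Continue)
    ()
```

Alternatively, `init` plus repeated `step` calls give finer-grained control over the same loop, and `load` restores a saved `params * state` pair (returning `Error` with a message if loading fails).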