
DQN is not converging even after 15M timesteps #214

Closed
@MilanVZinzuvadiya

Description


Question

I am training Pong-v4/PongNoFrameskip-v4 with DQN. The mean episode reward stays around -20 to -21 even after 1.5e7 timesteps. I tried various hyperparameters for DQN, but it still gives the same result. I could not find proper hyperparameters for DQN, so I think there is a problem with DQN.

Additional context

At the start of training, the agent's reward is around -20.4 to -20.2. After 3e6 timesteps it drops to -21 and then fluctuates in a range between -20.8 and -21.

I tried the following variants of DQN, experimenting with different combinations of these hyperparameters (see the sweep sketch after this list):

  • learning_starts in [50k (default), 5k, 100k]
  • gamma in [0.98, 0.99, 0.999]
  • exploration_final_eps in [0.02, 0.05]
  • learning_rate in [1e-3, 1e-4, 5e-4]
  • buffer_size in [50k, 500k, 1000k]
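
For concreteness, such a sweep could be scripted as below. This is a minimal sketch: the loop, the variable names, the env construction, and running itertools.product over the full grid are assumptions of mine; the original post does not say which subset of combinations was actually run.

import itertools

import gym
from stable_baselines3 import DQN

# Env construction is not shown in the original report; a plain
# gym.make is assumed here.
env = gym.make("PongNoFrameskip-v4")

# Full grid over the value sets listed above.
grid = itertools.product(
    [50_000, 5_000, 100_000],      # learning_starts
    [0.98, 0.99, 0.999],           # gamma
    [0.02, 0.05],                  # exploration_final_eps
    [1e-3, 1e-4, 5e-4],            # learning_rate
    [50_000, 500_000, 1_000_000],  # buffer_size
)

for learning_starts, gamma, final_eps, lr, buffer_size in grid:
    model = DQN('CnnPolicy', env, verbose=1,
                learning_starts=learning_starts, gamma=gamma,
                exploration_final_eps=final_eps, learning_rate=lr,
                buffer_size=buffer_size)
    model.learn(total_timesteps=int(1.5e7), log_interval=10)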

Each combination was applied to the code below.

from stable_baselines3 import DQN

# env construction was not included in the original report
model = DQN('CnnPolicy', env, verbose=1, learning_starts=50000, gamma=0.98,
            exploration_final_eps=0.02, learning_rate=1e-3)
model.learn(total_timesteps=int(1.5e7), log_interval=10)

Since I have already tried the combinations mentioned above, I am inclined to think there is a bug in the DQN implementation.
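
(The issue was closed with the "RTFM" label, i.e. the answer is in the documentation. The documented step that typically explains this symptom is Atari preprocessing: wrapping the env with the standard Atari wrappers and stacking frames. Below is a minimal sketch following the Stable-Baselines3 docs; the hyperparameter values are my assumptions, loosely based on the rl-baselines3-zoo Atari config, not quotes from this thread.)

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# make_atari_env applies the standard Atari wrappers
# (noop reset, frame skip, episodic life, reward clipping, ...).
env = make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=0)
# DQN's CNN policy expects stacked frames.
env = VecFrameStack(env, n_stack=4)

# Assumed hyperparameters (see lead-in above); not from this issue.
model = DQN('CnnPolicy', env, verbose=1,
            buffer_size=100_000, learning_rate=1e-4,
            learning_starts=100_000, target_update_interval=1_000,
            train_freq=4, exploration_fraction=0.1,
            exploration_final_eps=0.01)
model.learn(total_timesteps=int(1e7), log_interval=10)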

Checklist

  • I have read the documentation (required)
  • I have checked that there is no similar issue in the repo (required)


Labels

RTFM (Answer is the documentation), question (Further information is requested)
