Description
Question
I am training Pong-v4/PongNoFrameskip-v4 with DQN. The mean episode reward stays around -20 to -21 even after 1.5e7 timesteps. I have tried various hyperparameters for DQN, but it still gives the same result, and I could not find a set of hyperparameters that works. I suspect a problem in the DQN implementation.
Additional context
At the beginning of training, the agent's mean reward starts around -20.4 to -20.2. After 3e6 timesteps it drops to -21 and then fluctuates between -20.8 and -21.
I tried several variants of DQN, experimenting with different combinations of the following hyperparameters (a sketch of the sweep is shown after the list):
- learning_starts: 50k (default), 5k, 100k
- gamma: 0.98, 0.99, 0.999
- exploration_final_eps: 0.02, 0.05
- learning_rate: 1e-3, 1e-4, 5e-4
- buffer_size: 50k, 500k, 1000k
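Roughly, the combinations were formed as in the sketch below (illustrative only; I did not necessarily run the full cross product, and each run was launched individually):

from itertools import product

# Hyperparameter grid from the list above
grid = {
    "learning_starts": [50_000, 5_000, 100_000],  # 50k is the default
    "gamma": [0.98, 0.99, 0.999],
    "exploration_final_eps": [0.02, 0.05],
    "learning_rate": [1e-3, 1e-4, 5e-4],
    "buffer_size": [50_000, 500_000, 1_000_000],
}

# Enumerate combinations; each dict is passed as **kwargs to the DQN call below
for values in product(*grid.values()):
    hparams = dict(zip(grid.keys(), values))
    print(hparams)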
Each combination was applied to the code below.
model = DQN('CnnPolicy', env, verbose=1, learning_starts=50000, gamma=0.98, exploration_final_eps=0.02, learning_rate=1e-3)
model.learn(total_timesteps=int(1.5e7), log_interval=10)
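For completeness, here is a self-contained version of the snippet (assuming Stable-Baselines3; the environment construction is my assumption, since I only pasted the model lines above):

import gym
from stable_baselines3 import DQN

# Assumed env construction (not part of the original snippet)
env = gym.make("PongNoFrameskip-v4")

model = DQN(
    "CnnPolicy",
    env,
    verbose=1,
    learning_starts=50_000,
    gamma=0.98,
    exploration_final_eps=0.02,
    learning_rate=1e-3,
)
model.learn(total_timesteps=int(1.5e7), log_interval=10)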
Since I have already tried the combinations mentioned above, I am inclined to think there is a bug in the DQN implementation.
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)