Description
Question
I am training Pong-v4/PongNoFrameskip-v4 with DQN. The mean episode reward stays around -20 to -21 even after 1.5e7 timesteps. I have tried various hyperparameters for DQN, but it still gives the same result, and I could not find a set of hyperparameters that works. I suspect a problem in the DQN implementation.
Additional context
At the beginning of training, the agent's mean reward starts around -20.4 to -20.2. After 3e6 timesteps it drops to -21 and then fluctuates between -20.8 and -21.
I tried several variants of DQN, experimenting with different combinations of the following hyperparameters (a sketch of the sweep is shown after the list):
- learning_starts: 50k (default), 5k, 100k
- gamma: 0.98, 0.99, 0.999
- exploration_final_eps: 0.02, 0.05
- learning_rate: 1e-3, 1e-4, 5e-4
- buffer_size: 50k, 500k, 1000k
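Roughly, the combinations were formed as in the sketch below (illustrative only; I did not necessarily run the full cross product, and each run was launched individually):

from itertools import product

# Hyperparameter grid from the list above
grid = {
    "learning_starts": [50_000, 5_000, 100_000],  # 50k is the default
    "gamma": [0.98, 0.99, 0.999],
    "exploration_final_eps": [0.02, 0.05],
    "learning_rate": [1e-3, 1e-4, 5e-4],
    "buffer_size": [50_000, 500_000, 1_000_000],
}

# Enumerate combinations; each dict is passed as **kwargs to the DQN call below
for values in product(*grid.values()):
    hparams = dict(zip(grid.keys(), values))
    print(hparams)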
Each combination was applied to the code below.
model = DQN('CnnPolicy', env, verbose=1, learning_starts=50000, gamma=0.98, exploration_final_eps=0.02, learning_rate=1e-3)
model.learn(total_timesteps=int(1.5e7), log_interval=10)
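For completeness, here is a self-contained version of the snippet (assuming Stable-Baselines3; the environment construction is my assumption, since I only pasted the model lines above):

import gym
from stable_baselines3 import DQN

# Assumed env construction (not part of the original snippet)
env = gym.make("PongNoFrameskip-v4")

model = DQN(
    "CnnPolicy",
    env,
    verbose=1,
    learning_starts=50_000,
    gamma=0.98,
    exploration_final_eps=0.02,
    learning_rate=1e-3,
)
model.learn(total_timesteps=int(1.5e7), log_interval=10)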
Since I have already tried the combinations mentioned above, I am inclined to think there is a bug in the DQN implementation.
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)