
[bug] on-policy rollout collects current "dones" instead of last "dones" #105

Closed
@AndyShih12

Took me a really long time to debug this, so hopefully this helps others out.

Describe the bug
The on-policy rollout collects last_obs, the current reward, and the current dones. See here

In stable-baselines, the rollout collects last_obs, current reward, and last dones. See here
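To make the two conventions concrete, here is a minimal sketch of a rollout loop that records both flags side by side. It is not the code of either library: `CountdownEnv` and `collect` are hypothetical names, and the toy environment auto-resets when an episode ends, which is an assumption borrowed from how vectorized environments usually behave.

```python
class CountdownEnv:
    """Hypothetical toy environment: every episode lasts exactly `length` steps."""

    def __init__(self, length=2):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the step counter

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        reward = 1.0 if done else 0.0
        # Auto-reset on episode end (assumed behaviour for this sketch).
        obs = self.reset() if done else self.t
        return obs, reward, done


def collect(env, n_steps):
    """Store last_obs and the current reward, plus both versions of the done flag."""
    last_obs, last_done = env.reset(), False
    rows = []
    for _ in range(n_steps):
        obs, reward, done = env.step(action=0)
        rows.append({
            "obs": last_obs,          # observation the action was taken from
            "reward": reward,         # reward returned by this step
            "last_done": last_done,   # True if last_obs starts a new episode
            "current_done": done,     # True if this step ended the episode
        })
        last_obs, last_done = obs, done
    return rows


rows = collect(CountdownEnv(length=2), n_steps=4)
last_dones = [r["last_done"] for r in rows]        # [False, False, True, False]
current_dones = [r["current_done"] for r in rows]  # [False, True, False, True]
```

The two lists carry the same information shifted by one step, so whichever one is stored, the code that later computes returns has to read it with the matching convention.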

This messes up the returns and advantage calculations, because the episode boundaries they see end up one step off.
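A small worked example may help show the effect. The sketch below assumes the advantage computation reads the flag in the "last dones" sense (flag at index t+1 says whether the observation at t+1 begins a new episode); the function `gae` and all variable names are illustrative, not taken from either library. `gamma` and `lam` are set to 1 only so the arithmetic is easy to follow by hand.

```python
def gae(rewards, values, episode_starts, final_value, final_done, gamma=1.0, lam=1.0):
    """Generalized advantage estimation over one rollout.

    Expects episode_starts[t] == 1 when obs[t] begins a new episode
    (the "last dones" convention). Illustrative sketch only.
    """
    n = len(rewards)
    advantages = [0.0] * n
    running = 0.0
    for t in reversed(range(n)):
        if t == n - 1:
            next_non_terminal = 1.0 - final_done
            next_value = final_value
        else:
            next_non_terminal = 1.0 - episode_starts[t + 1]
            next_value = values[t + 1]
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        running = delta + gamma * lam * next_non_terminal * running
        advantages[t] = running
    return advantages


# Two episodes of two steps each; reward 1 arrives on the final step of each episode.
rewards = [0.0, 1.0, 0.0, 1.0]
values = [0.5, 0.5, 0.5, 0.5]
current_dones = [0.0, 1.0, 0.0, 1.0]      # step t ended the episode
last_dones = [0.0] + current_dones[:-1]   # obs[t] starts a new episode

correct = gae(rewards, values, last_dones, final_value=0.5, final_done=current_dones[-1])
shifted = gae(rewards, values, current_dones, final_value=0.5, final_done=current_dones[-1])
# correct → [0.5, 0.5, 0.5, 0.5]
# shifted → [-0.5, 0.5, -0.5, 0.5]
```

With the shifted flags, the step just before each terminal step is cut off from the terminal reward, and the terminal step bootstraps from the first value of the next episode. In this toy case that flips the sign of the advantage on half of the steps.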

I fixed this locally, and PPO improved dramatically on my custom environment (red is before fix, green is after fix).
[Image: plot hosted on Imgur]

Labels: bug, documentation
