[Question] replay_buffer.sample #2124
Labels
check the checklist
You have checked the required items in the checklist but you didn't do what is written...
more information needed
Please fill the issue template completely
question
Further information is requested
❓ Question
I am confused about how replay_buffer.sample is used in the Soft Actor-Critic (SAC) implementation in Stable-Baselines3. While debugging, I ran with 4 parallel environments and a total buffer size of 100,000, so each environment is allocated a buffer length of 100,000 // 4 = 25,000. In the for loop that runs gradient_steps times, every iteration samples from the same buffer and only draws from its first self.pos entries. In my run self.pos is about 208, which suggests that the remaining 25,000 - 208 = 24,792 entries are never used for training.
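To make the numbers concrete, here is a minimal NumPy sketch of the sampling as I understand it. It is not the Stable-Baselines3 code: `sample_indices` and its parameters are hypothetical names. It assumes that indices are drawn uniformly from the slots written so far (0 to pos - 1 while the buffer is not yet full) and that an environment index is drawn independently for each sample.

```python
import numpy as np


def sample_indices(pos, full, buffer_size, n_envs, batch_size, rng):
    """Draw (slot index, env index) pairs from the filled part of the buffer.

    Hypothetical helper for illustration only, not the library's API.
    """
    # Assumption: only slots that already hold data can be sampled.
    upper_bound = buffer_size if full else pos
    batch_inds = rng.integers(0, upper_bound, size=batch_size)
    # Assumption: each sample picks one of the parallel environments at random.
    env_inds = rng.integers(0, n_envs, size=batch_size)
    return batch_inds, env_inds


total_size, n_envs = 100_000, 4
buffer_size = total_size // n_envs  # 25_000 slots per environment
rng = np.random.default_rng(0)

batch_inds, env_inds = sample_indices(
    pos=208, full=False, buffer_size=buffer_size,
    n_envs=n_envs, batch_size=256, rng=rng,
)
print(buffer_size, batch_inds.min(), batch_inds.max())
```

Under these assumptions, every sampled slot index is below 208, and slots 208 through 24,999 would hold no transitions yet rather than transitions that are being skipped.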
I am uncertain whether my understanding is correct, as I have not been able to find detailed explanations online regarding the usage of replay_buffer.sample in Stable-Baselines3. I would greatly appreciate your assistance in clarifying this matter. Thank you! @davidsblom @cool-RR @hughperkins @chunky @Gregwar