
[Question] replay_buffer.sample #2124


Closed
4 tasks done
TwiceMao opened this issue Apr 24, 2025 · 2 comments
Labels
- check the checklist — You have checked the required items in the checklist but you didn't do what is written...
- more information needed — Please fill the issue template completely
- question — Further information is requested

Comments


TwiceMao commented Apr 24, 2025

❓ Question

I have some confusion about the use of replay_buffer.sample in the Soft Actor-Critic (SAC) implementation in Stable-Baselines3. While debugging, I observed that the number of parallel environments is 4, so each environment is allocated a buffer length of 100,000 // 4 = 25,000. In the for loop that iterates gradient_steps times, training always appears to sample from the same region of the buffer, specifically the first self.pos entries. Since self.pos is only about 208, this implies that most of the buffer (25,000 - 208 entries per environment) is never sampled for training.

I am uncertain whether my understanding is correct, as I have not been able to find detailed explanations online regarding the usage of replay_buffer.sample in Stable-Baselines3. I would greatly appreciate your assistance in clarifying this matter. Thank you! @davidsblom @cool-RR @hughperkins @chunky @Gregwar

train_procs: 4 
batch_size=256,  # 1024, 400
ent_coef="auto_0.2",
gamma=0.98,
train_freq=16,  # 4, 64
gradient_steps=16,  # 2, 4
buffer_size=100000,
learning_starts=800,
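
The per-environment allocation described above can be checked with a small sketch (an assumption based on how SB3's ReplayBuffer divides buffer_size across n_envs; the variable names here are illustrative):

```python
# Sketch: buffer_size is split across the parallel envs, so each env
# gets buffer_size // n_envs rows (assumed to mirror SB3's ReplayBuffer).
buffer_size = 100_000
n_envs = 4  # train_procs above

rows_per_env = buffer_size // n_envs
print(rows_per_env)  # 25000
```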

from stable_baselines3.common.buffers import ReplayBuffer

def train(self, gradient_steps: int, batch_size: int = 64) -> None:
    # (excerpt) each gradient step draws a fresh random batch:
    replay_data = self.replay_buffer.sample(batch_size, env=self._vec_normalize_env)
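
To see why each gradient step sees different data, here is a sketch of the loop's shape with a hypothetical stand-in buffer (TinyBuffer is not an SB3 class, just an illustration of independent random batches):

```python
import numpy as np

# Hypothetical stand-in for the replay buffer: each of the gradient_steps
# iterations draws an independent random batch, so batches differ per step.
class TinyBuffer:
    def __init__(self, data, rng):
        self.data, self.rng = data, rng

    def sample(self, batch_size):
        inds = self.rng.integers(0, len(self.data), size=batch_size)
        return self.data[inds]

rng = np.random.default_rng(0)
buf = TinyBuffer(np.arange(1000), rng)
batches = [buf.sample(256) for _ in range(16)]  # gradient_steps=16
assert len(batches) == 16 and all(len(b) == 256 for b in batches)
```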

Checklist

@TwiceMao TwiceMao added the question Further information is requested label Apr 24, 2025
@araffin araffin added more information needed Please fill the issue template completely check the checklist You have checked the required items in the checklist but you didn't do what is written... labels Apr 24, 2025
araffin (Member) commented Apr 24, 2025

set to 4, and each environment is allocated a buffer length of 100,000 // 4 = 25,000.

Correct.

the training process consistently samples data from the same buffer, specifically using the first self.pos entries for training.

def sample(self, batch_size: int, env: Optional[VecNormalize] = None):
    """
    :param batch_size: Number of element to sample
    :param env: associated gym VecEnv
        to normalize the observations/rewards when sampling
    :return:
    """
    upper_bound = self.buffer_size if self.full else self.pos
    batch_inds = np.random.randint(0, upper_bound, size=batch_size)
    return self._get_samples(batch_inds, env=env)

and

def _get_samples(self, batch_inds: np.ndarray, env: Optional[VecNormalize] = None) -> ReplayBufferSamples:
    # Sample randomly the env idx
    env_indices = np.random.randint(0, high=self.n_envs, size=(len(batch_inds),))

It should sample the complete buffer unless the buffer is not full yet.
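
The two snippets quoted above can be combined into a minimal sketch (a simplification with hypothetical names, not the actual SB3 code): before the buffer is full, only the first pos rows are eligible, but each row index is paired with a random env index, so the effective pool is pos * n_envs transitions.

```python
import numpy as np

# Sketch of the quoted sampling logic: indices are drawn from [0, pos)
# until the buffer is full, and each row is paired with a random env index.
def sample_inds(buffer_size, pos, full, n_envs, batch_size, rng):
    upper_bound = buffer_size if full else pos
    batch_inds = rng.integers(0, upper_bound, size=batch_size)
    env_indices = rng.integers(0, n_envs, size=len(batch_inds))
    return batch_inds, env_indices

rng = np.random.default_rng(0)
batch_inds, env_indices = sample_inds(25_000, 208, False, 4, 256, rng)
assert batch_inds.max() < 208  # only the filled prefix is sampled...
assert env_indices.max() < 4   # ...but across all 4 envs (208 * 4 rows)
```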

the training process consistently samples data from the same buffer, specifically using the first self.pos entries for training.

Could you provide a minimal and working example that reproduces this behavior?

TwiceMao (Author) commented Apr 27, 2025

the training process consistently samples data from the same buffer, specifically using the first self.pos entries for training.

Because I only ran the first part of the program while debugging, but set buffer_size relatively large, I saw that the first few elements of the buffer never changed. I assumed training was always using the same buffer contents, but I now understand this was simply because the buffer was not full yet.
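
This observation is consistent with a circular buffer: entries are written at self.pos, which only wraps around once the buffer fills, so the earliest slots are written once and then stay unchanged for a long time. A minimal sketch of that behavior (illustrative, not SB3's actual implementation):

```python
import numpy as np

# Sketch of a circular write: values are appended at `pos`, which wraps
# when the buffer fills, so early slots only change after a full cycle.
buffer_size = 8
buf = np.full(buffer_size, -1)
pos, full = 0, False
for step in range(5):  # fewer steps than buffer_size: buffer not full yet
    buf[pos] = step
    pos = (pos + 1) % buffer_size
    full = full or pos == 0
assert not full and buf[0] == 0  # prefix written once, unchanged since
```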
