
[Question] replay_buffer.sample #2124


Closed
4 tasks done
TwiceMao opened this issue Apr 24, 2025 · 2 comments
Labels
- check the checklist — You have checked the required items in the checklist but you didn't do what is written...
- more information needed — Please fill the issue template completely
- question — Further information is requested

Comments


TwiceMao commented Apr 24, 2025

❓ Question

I have some confusion about the use of replay_buffer.sample in the Soft Actor-Critic (SAC) implementation in Stable-Baselines3. While debugging, I observed that the number of parallel environments is 4, so each environment is allocated a buffer length of 100,000 // 4 = 25,000. In the for loop that iterates gradient_steps times, training always appears to sample from the same region of the buffer, specifically the first self.pos entries. Since self.pos is only about 208, this implies that most of the buffer (25,000 - 208 entries per environment) is never sampled for training.

I am uncertain whether my understanding is correct, as I have not been able to find detailed explanations online regarding the usage of replay_buffer.sample in Stable-Baselines3. I would greatly appreciate your assistance in clarifying this matter. Thank you! @davidsblom @cool-RR @hughperkins @chunky @Gregwar

train_procs: 4 
batch_size=256,  # 1024, 400
ent_coef="auto_0.2",
gamma=0.98,
train_freq=16,  # 4, 64
gradient_steps=16,  # 2, 4
buffer_size=100000,
learning_starts=800,
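
The per-environment allocation described above can be checked with a small sketch (an assumption based on how SB3's ReplayBuffer divides buffer_size across n_envs; the variable names here are illustrative):

```python
# Sketch: buffer_size is split across the parallel envs, so each env
# gets buffer_size // n_envs rows (assumed to mirror SB3's ReplayBuffer).
buffer_size = 100_000
n_envs = 4  # train_procs above

rows_per_env = buffer_size // n_envs
print(rows_per_env)  # 25000
```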

from stable_baselines3.common.buffers import ReplayBuffer

def train(self, gradient_steps: int, batch_size: int = 64) -> None:
    # (excerpt) each gradient step draws a fresh random batch:
    replay_data = self.replay_buffer.sample(batch_size, env=self._vec_normalize_env)
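
To see why each gradient step sees different data, here is a sketch of the loop's shape with a hypothetical stand-in buffer (TinyBuffer is not an SB3 class, just an illustration of independent random batches):

```python
import numpy as np

# Hypothetical stand-in for the replay buffer: each of the gradient_steps
# iterations draws an independent random batch, so batches differ per step.
class TinyBuffer:
    def __init__(self, data, rng):
        self.data, self.rng = data, rng

    def sample(self, batch_size):
        inds = self.rng.integers(0, len(self.data), size=batch_size)
        return self.data[inds]

rng = np.random.default_rng(0)
buf = TinyBuffer(np.arange(1000), rng)
batches = [buf.sample(256) for _ in range(16)]  # gradient_steps=16
assert len(batches) == 16 and all(len(b) == 256 for b in batches)
```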

Checklist

@TwiceMao TwiceMao added the question Further information is requested label Apr 24, 2025
@araffin araffin added more information needed Please fill the issue template completely check the checklist You have checked the required items in the checklist but you didn't do what is written... labels Apr 24, 2025
araffin (Member) commented Apr 24, 2025

set to 4, and each environment is allocated a buffer length of 100,000 // 4 = 25,000.

Correct.

the training process consistently samples data from the same buffer, specifically using the first self.pos entries for training.

def sample(self, batch_size: int, env: Optional[VecNormalize] = None):
    """
    :param batch_size: Number of element to sample
    :param env: associated gym VecEnv
        to normalize the observations/rewards when sampling
    :return:
    """
    upper_bound = self.buffer_size if self.full else self.pos
    batch_inds = np.random.randint(0, upper_bound, size=batch_size)
    return self._get_samples(batch_inds, env=env)

and

def _get_samples(self, batch_inds: np.ndarray, env: Optional[VecNormalize] = None) -> ReplayBufferSamples:
    # Sample randomly the env idx
    env_indices = np.random.randint(0, high=self.n_envs, size=(len(batch_inds),))

It should sample the complete buffer unless the buffer is not full yet.
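
The two snippets quoted above can be combined into a minimal sketch (a simplification with hypothetical names, not the actual SB3 code): before the buffer is full, only the first pos rows are eligible, but each row index is paired with a random env index, so the effective pool is pos * n_envs transitions.

```python
import numpy as np

# Sketch of the quoted sampling logic: indices are drawn from [0, pos)
# until the buffer is full, and each row is paired with a random env index.
def sample_inds(buffer_size, pos, full, n_envs, batch_size, rng):
    upper_bound = buffer_size if full else pos
    batch_inds = rng.integers(0, upper_bound, size=batch_size)
    env_indices = rng.integers(0, n_envs, size=len(batch_inds))
    return batch_inds, env_indices

rng = np.random.default_rng(0)
batch_inds, env_indices = sample_inds(25_000, 208, False, 4, 256, rng)
assert batch_inds.max() < 208  # only the filled prefix is sampled...
assert env_indices.max() < 4   # ...but across all 4 envs (208 * 4 rows)
```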

the training process consistently samples data from the same buffer, specifically using the first self.pos entries for training.

Could you provide a minimal and working example that reproduces this behavior?

TwiceMao (Author) commented Apr 27, 2025

the training process consistently samples data from the same buffer, specifically using the first self.pos entries for training.

Because I only ran the first part of the program while debugging, but set buffer_size relatively large, I saw that the first few elements of the buffer never changed. I assumed training was always using the same buffer contents, but I now understand this was simply because the buffer was not full yet.
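
This observation is consistent with a circular buffer: entries are written at self.pos, which only wraps around once the buffer fills, so the earliest slots are written once and then stay unchanged for a long time. A minimal sketch of that behavior (illustrative, not SB3's actual implementation):

```python
import numpy as np

# Sketch of a circular write: values are appended at `pos`, which wraps
# when the buffer fills, so early slots only change after a full cycle.
buffer_size = 8
buf = np.full(buffer_size, -1)
pos, full = 0, False
for step in range(5):  # fewer steps than buffer_size: buffer not full yet
    buf[pos] = step
    pos = (pos + 1) % buffer_size
    full = full or pos == 0
assert not full and buf[0] == 0  # prefix written once, unchanged since
```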
