🐛 Bug
Description
When using `SubprocVecEnv` with multiple environments, all subprocesses ignore the GPU device selected in the main process and default to GPU 0, regardless of which GPU was set with `torch.cuda.set_device()`.
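A possible explanation, offered as an assumption rather than a confirmed diagnosis: `torch.cuda.set_device()` changes state that is local to the calling process, and a worker started with `start_method="spawn"` is a fresh interpreter that begins on the default device. Environment variables, by contrast, are passed on to child processes, so setting `CUDA_VISIBLE_DEVICES` before CUDA is initialized is one way to steer the workers. The sketch below only demonstrates the inheritance part and needs no GPU; the value `"3"` is chosen for illustration.

```python
import os
import subprocess
import sys

# Illustrative choice: expose only physical GPU 3 to the child process.
child_env = dict(os.environ, CUDA_VISIBLE_DEVICES="3")

# A fresh interpreter shares no in-memory state with its parent, but it does
# receive the environment it was launched with.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
child_value = result.stdout.strip()
print(child_value)
```

Note that a process restricted this way sees the selected GPU as device index 0, so printed indices alone would no longer distinguish the physical devices.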
Reproduction
I created a minimal reproduction script that clearly shows the issue:
import os
import torch
import numpy as np
import gymnasium as gym
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv


class GPUTestEnv(gym.Env):
    """Minimal env that reports which CUDA device the current process uses."""

    def __init__(self, env_id=0):
        super().__init__()
        self.observation_space = gym.spaces.Box(low=-1, high=1, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Box(low=-1, high=1, shape=(2,), dtype=np.float32)
        self.env_id = env_id
        if torch.cuda.is_available():
            device = torch.cuda.current_device()
            device_name = torch.cuda.get_device_name(device)
            print(f"Env {self.env_id} created on GPU {device} ({device_name}) - PID: {os.getpid()}")

    def reset(self, seed=None, options=None):
        if torch.cuda.is_available():
            # device="cuda" resolves to the current device of this process
            test_tensor = torch.ones(1, device="cuda")
            device_id = test_tensor.device.index
            print(f"Env {self.env_id} - reset() using GPU: {device_id}")
        return np.zeros(4, dtype=np.float32), {}

    def step(self, action):
        if torch.cuda.is_available():
            test_tensor = torch.ones(1, device="cuda")
            device_id = test_tensor.device.index
            print(f"Env {self.env_id} - step() using GPU: {device_id}")
        return np.zeros(4, dtype=np.float32), 0.0, False, False, {}


# The guard is needed because start_method="spawn" re-imports this module in each worker
if __name__ == "__main__":
    # Specify GPU 3
    torch.cuda.set_device(3)
    print(f"Main process current device: {torch.cuda.current_device()}")

    # Create environments
    env_fns = [lambda idx=i: GPUTestEnv(idx) for i in range(3)]

    # DummyVecEnv correctly uses GPU 3
    print("\n----- Testing DummyVecEnv -----")
    dummy_env = DummyVecEnv(env_fns)
    dummy_env.reset()
    dummy_env.step(np.zeros((3, 2)))
    dummy_env.close()

    # SubprocVecEnv incorrectly uses GPU 0
    print("\n----- Testing SubprocVecEnv -----")
    subproc_env = SubprocVecEnv(env_fns, start_method="spawn")
    subproc_env.reset()
    subproc_env.step(np.zeros((3, 2)))
    subproc_env.close()
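A possible workaround, sketched here under the assumption that `SubprocVecEnv` calls each env factory inside its worker process: wrap the factory so the worker selects its GPU before the environment is built. The helper name `make_env_on_device` is hypothetical and not part of Stable Baselines3, and I have not verified this on the setup described in this report.

```python
from typing import Any, Callable


def make_env_on_device(env_fn: Callable[[], Any], device_index: int) -> Callable[[], Any]:
    """Wrap an env factory so the calling process selects a GPU before building the env."""

    def _thunk() -> Any:
        # This body runs in whichever process calls the factory; for
        # SubprocVecEnv that is assumed to be the worker subprocess.
        import torch

        if torch.cuda.is_available():
            torch.cuda.set_device(device_index)
        return env_fn()

    return _thunk
```

With the script above, the factories would become `env_fns = [make_env_on_device(lambda idx=i: GPUTestEnv(idx), 3) for i in range(3)]`. This keeps the device choice next to the env construction instead of relying on state set in the main process.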
Output
Main process current device: 3
----- Testing DummyVecEnv -----
Env 0 created on GPU 3 (NVIDIA A100 80GB PCIe) - PID: 1795905
Env 1 created on GPU 3 (NVIDIA A100 80GB PCIe) - PID: 1795905
Env 2 created on GPU 3 (NVIDIA A100 80GB PCIe) - PID: 1795905
Env 0 - reset() using GPU: 3
Env 1 - reset() using GPU: 3
Env 2 - reset() using GPU: 3
Env 0 - step() using GPU: 3
Env 1 - step() using GPU: 3
Env 2 - step() using GPU: 3
----- Testing SubprocVecEnv -----
Env 0 created on GPU 0 (NVIDIA A100 80GB PCIe) - PID: 1796000
Env 0 - reset() using GPU: 0
Env 0 - step() using GPU: 0
Env 2 created on GPU 0 (NVIDIA A100 80GB PCIe) - PID: 1796002
Env 2 - reset() using GPU: 0
Env 2 - step() using GPU: 0
Env 1 created on GPU 0 (NVIDIA A100 80GB PCIe) - PID: 1796001
Env 1 - reset() using GPU: 0
Env 1 - step() using GPU: 0
Environment
- Stable Baselines 3 version: 2.5.0
- PyTorch: 2.6.0+cu124
- CUDA: 12.4
- GPUs: 4x NVIDIA A100 80GB PCIe
- Gymnasium: 1.0.0
Checklist
- My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
- I have checked that there is no similar issue in the repo
- I have read the documentation
- I have provided a minimal and working example to reproduce the bug
- I've used the markdown code blocks for both code and stack traces.