Replies: 6 comments 4 replies
-
Ultimately I think we need access to the underlying `cudaStream_t`. This comment in torch suggests that you can access it via the stream's `cuda_stream` attribute.
-
@Matt711

```python
import cupy
import torch
import rmm
import rmm.pylibrmm.stream

device = torch.device("cuda")
torch_stream = torch.cuda.Stream(device=device)
# PyTorch exposes the raw cudaStream_t as torch_stream.cuda_stream;
# wrap it in a CuPy ExternalStream, which RMM's Stream constructor accepts.
cupy_stream = cupy.cuda.ExternalStream(torch_stream.cuda_stream)
rmm_stream = rmm.pylibrmm.stream.Stream(cupy_stream)
print(rmm_stream)
print(rmm_stream.is_default())
rmm_stream.synchronize()
d_buffer = rmm.DeviceBuffer(size=10, stream=rmm_stream)
```
-
One small but useful feature that I propose is to replicate the behavior of CuPy and PyTorch in accepting an integer stream pointer when constructing a stream (as `cupy.cuda.ExternalStream` and `torch.cuda.ExternalStream` do). Another feature that I believe is essential, and should mimic the behavior of CuPy and PyTorch, is the ability to export the stream pointer as an `int` (CuPy exposes `ptr`; PyTorch exposes `cuda_stream`).
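For concreteness, here is the round trip that CuPy and PyTorch already support through plain integer pointers; the proposal is for RMM streams to participate in the same way (the RMM side is exactly what's missing, so it doesn't appear below):

```python
import cupy
import torch

torch_stream = torch.cuda.Stream()
ptr = torch_stream.cuda_stream               # export: raw cudaStream_t as an int
cupy_stream = cupy.cuda.ExternalStream(ptr)  # import: wrap an existing stream by pointer
assert cupy_stream.ptr == ptr                # CuPy exports the pointer as an int too
torch_again = torch.cuda.ExternalStream(cupy_stream.ptr)  # and back into PyTorch
```

Note that an `ExternalStream` in both libraries wraps a stream it does not own, so whichever library created the stream must keep it alive while the wrappers are in use.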
-
@leofang can you comment on how cuda.core is aiming to standardize cross-library stream references in Python?
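For context, my understanding (hedged, since cuda.core is still experimental and details may change) is that cuda.core standardizes this via a `__cuda_stream__` protocol: any object returning a `(version, handle)` pair can be consumed as a stream by a library that speaks the protocol. A minimal sketch of a foreign stream wrapper:

```python
class ForeignStream:
    """Sketch of an object exposing the cuda.core __cuda_stream__ protocol."""

    def __init__(self, handle: int):
        self._handle = handle  # raw cudaStream_t address as an int

    def __cuda_stream__(self):
        # (protocol version, stream handle as an int)
        return (0, self._handle)
```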
-
The relevant pieces from
I am planning to overhaul RMM's Python/Cython interface to improve interoperability for vocabulary types like CUDA streams that should be usable across libraries. I want to consolidate
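To illustrate one possible shape of that interoperability (all names here are hypothetical sketches, not the planned RMM API), a vocabulary stream type could normalize whatever it is given at construction time:

```python
class CudaStream:
    """Hypothetical vocabulary type holding a raw cudaStream_t handle."""

    def __init__(self, obj):
        if isinstance(obj, int):               # raw pointer
            self._handle = obj
        elif hasattr(obj, "__cuda_stream__"):  # cuda.core-style protocol
            _, self._handle = obj.__cuda_stream__()
        elif hasattr(obj, "cuda_stream"):      # torch.cuda.Stream
            self._handle = obj.cuda_stream
        elif hasattr(obj, "ptr"):              # cupy.cuda.Stream
            self._handle = obj.ptr
        else:
            raise TypeError(f"cannot interpret {type(obj)!r} as a CUDA stream")

    def __cuda_stream__(self):
        # Export in protocol form so other libraries can consume it.
        return (0, self._handle)
```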
-
As an end-user of both RMM and PyTorch, I posed the question regarding interoperable streams between PyTorch and RMM as #1829. The reason is that both the RMM and PyTorch APIs have their own stream implementations, and I'm interested in exploring the possibility of converting between PyTorch streams and RMM streams. I'm aware that, currently, the conversion has to go through an intermediary such as CuPy, as shown above.

I'll keep this discussion open until I receive word from NVIDIA indicating that it can be closed. Initially, my question was from the perspective of a high-level user of PyTorch and RMM, but it has since delved into the underlying implementation details.
-
This is also potentially a PyTorch-related question.

I'm aware that we can use RMM with PyTorch for efficient memory allocation. I also know that it's possible to create a stream in Python via `rmm.pylibrmm.stream`. Moreover, in C++ RMM there's even `rmm::cuda_stream_pool` for the efficient utilization of streams.

This leads me to wonder if it's possible to create an RMM stream (which is essentially a `cudaStream_t` under the hood) and then convert it to a PyTorch stream. And furthermore, in the Python world, is there something planned for the future similar to `rmm::cuda_stream_pool` that PyTorch users could also benefit from, in the form of a stream pool? See the sketches below for what I have in mind.

I did check around inside this repo but only found PyTorch with RMM for memory allocations: https://github.com/rapidsai/rmm/blob/branch-25.04/python/rmm/rmm/tests/test_rmm_pytorch.py
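For the RMM-to-PyTorch direction, one workaround (a sketch, not an official API; it relies on CuPy owning the stream and on keeping `cupy_stream` alive as long as the other views are in use) is to create the stream where the pointer is easy to read, then hand the same `cudaStream_t` to all three libraries:

```python
import cupy
import torch
import rmm.pylibrmm.stream

# One cudaStream_t, viewed by three libraries.
cupy_stream = cupy.cuda.Stream(non_blocking=True)
rmm_stream = rmm.pylibrmm.stream.Stream(cupy_stream)       # RMM view
torch_stream = torch.cuda.ExternalStream(cupy_stream.ptr)  # PyTorch view

with torch.cuda.stream(torch_stream):
    x = torch.ones(1024, device="cuda")  # work enqueued on the shared stream
rmm_stream.synchronize()                 # synchronizes the same underlying stream
```

On the stream-pool question: I'm not aware of a Python counterpart to `rmm::cuda_stream_pool` today, but a minimal round-robin pool is easy to sketch (a hypothetical helper, not an RMM API; it assumes `rmm.pylibrmm.stream.Stream()` with no arguments creates a new stream):

```python
import itertools
import rmm.pylibrmm.stream

class StreamPool:
    """Hypothetical round-robin pool mimicking rmm::cuda_stream_pool."""

    def __init__(self, size: int = 16):
        self._streams = [rmm.pylibrmm.stream.Stream() for _ in range(size)]
        self._cycle = itertools.cycle(self._streams)

    def get_stream(self) -> rmm.pylibrmm.stream.Stream:
        return next(self._cycle)
```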