🐛 [Bug] Loading Torch-TensorRT models (.ts) on multiple GPUs (in TorchServe) #1888
@gs-olive can you try to replicate this?
Hello - I tried the following minimal example to reproduce the error:
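(The snippet itself is not preserved in this capture; below is a rough sketch of that kind of minimal reproduction, assuming a small torchvision model and default compile settings.)

# Rough sketch, not the original snippet: compile on GPU 0, save the TorchScript
# module, then try to run it on every visible GPU index.
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet18().eval().cuda()  # lands on GPU 0
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.float},
)
torch.jit.save(trt_model, "model.ts")

for i in range(torch.cuda.device_count()):
    loaded = torch.jit.load("model.ts", map_location=f"cuda:{i}")
    x = torch.randn(1, 3, 224, 224, device=f"cuda:{i}")
    out = loaded(x)
    print(f"GPU {i}: output on {out.device}")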
While I was unable to reproduce the exact error as described, I did notice that the compiled model would only return results stored on GPU 0 (the GPU index it was compiled with), and not on other GPUs of the same type with other indices. This is an issue on our end, which I am looking into. Based on this, it might make sense to try recompiling the model for each unique GPU ID and saving the models as "model_gpu0.ts", "model_gpu1.ts", ..., as a temporary workaround, to see if this resolves the issue. I will also continue trying to reproduce the original error.
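A rough sketch of that per-GPU recompilation workaround (illustrative only; it assumes the original nn.Module is available and uses torch_tensorrt.Device to pin the target GPU):

import torch
import torch_tensorrt
import torchvision.models as models

base = models.resnet18().eval()
for gpu_id in range(torch.cuda.device_count()):
    model = base.to(f"cuda:{gpu_id}")
    trt_model = torch_tensorrt.compile(
        model,
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
        enabled_precisions={torch.float},
        device=torch_tensorrt.Device(gpu_id=gpu_id),
    )
    # One serialized engine per GPU index, loaded by the matching worker.
    torch.jit.save(trt_model, f"model_gpu{gpu_id}.ts")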
Very much appreciate you looking into this and thanks for the suggested workaround! 🙌
I'm hitting the same issue! Both the A100 and the 3090 have the same Ampere architecture.
For further context, I used the same Docker image (nvcr.io/nvidia/pytorch:22.12-py3) to compile and run the model, but it was compiled on an Ampere RTX A6000 and run on an A10. As mentioned earlier, it worked well with one GPU, but not with a multi-GPU configuration.
Thank you both for the follow-up. After corresponding with @narendasan on this, the reason compiling the model on an A100 and instantiating it on a 3090 is an issue is the difference in compute capability (the A100 has Compute Capability 8.0, while the 3090 has Compute Capability 8.6, source). As of TensorRT 8.6, there is newly added support for Hardware Compatibility, which should resolve this issue once we add support for the feature in Torch-TensorRT. There is a feature request already filed for this: #1929.
Thanks for your reply.
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
Hello - as an update on this issue, we recently added #2325 to |
Excellent, thanks for the hard work and update! |
Hello - we recently added #2445 which enables the |
Thanks @gs-olive!! I'm currently low on bandwidth, but I'll give this a spin for my next model! |
Bug Description
Everything works well when I'm using 1 GPU, but as soon as I try to load a model on 4 separate GPUs, I get this error:
MODEL_LOG - RuntimeError: [Error thrown at core/runtime/TRTEngine.cpp:42] Expected most_compatible_device to be true but got false
MODEL_LOG - No compatible device was found for instantiating TensorRT engine
To Reproduce
Steps to reproduce the behavior:
Create a Torch-TensorRT (.ts) model and load it on 4 different GPUs. I don't know if this is specific to TorchServe or a more general issue.
Here's the simple version (TorchServe Handler):
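(The handler source is not preserved in this capture; the following is a reconstruction of what a minimal BaseHandler-style handler might look like, not the exact code from this report.)

# Reconstruction sketch only -- not the original handler.
import os
import torch
from ts.torch_handler.base_handler import BaseHandler

class TRTHandler(BaseHandler):
    def initialize(self, context):
        self.manifest = context.manifest
        properties = context.system_properties
        # TorchServe assigns each worker its GPU index via system_properties.
        gpu_id = properties.get("gpu_id")
        self.device = torch.device(
            f"cuda:{gpu_id}" if torch.cuda.is_available() and gpu_id is not None else "cpu"
        )
        model_dir = properties.get("model_dir")
        model_path = os.path.join(model_dir, self.manifest["model"]["serializedFile"])
        # This is the load that fails with "No compatible device was found ..."
        # on workers bound to GPUs other than the one the model was compiled on.
        self.model = torch.jit.load(model_path, map_location=self.device)
        self.model.eval()
        self.initialized = True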
I'm not sure if it relates to this issue. From what I can tell, it seems like I need to restrict the CUDA context; however, the GPU is already assigned in the handler. I tried these things, but it's still giving me the same problem.
I also tried mapping the model straight to the GPU on load, but hit the same problem.
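(The exact attempts aren't shown in this capture; they would typically look something like the sketch below, and both variants still produced the same error here.)

# Illustrative sketch of the attempts described above (gpu_id is the index
# TorchServe hands to this worker; the value here is hypothetical).
import torch

gpu_id = 1

# Attempt 1: pin the CUDA context to the assigned GPU before loading.
torch.cuda.set_device(gpu_id)
model = torch.jit.load("model.ts")

# Attempt 2: map the serialized module straight onto the target GPU on load.
model = torch.jit.load("model.ts", map_location=torch.device(f"cuda:{gpu_id}"))

# Both variants raised the same "No compatible device was found" error in this setup.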
Expected behavior
Load a .ts model on a specified GPU ID without any issues.
Environment
Official PyTorch image: nvcr.io/nvidia/pytorch:22.12-py3
GPUs: 4x NVIDIA A10G
PyTorch: 1.14.0a0+410ce96
Ubuntu 20.04 including Python 3.8
Container components:
NVIDIA CUDA® 11.8.0
NVIDIA cuBLAS 11.11.3.6
NVIDIA cuDNN 8.7.0.84
NVIDIA NCCL 2.15.5 (optimized for NVIDIA NVLink®)
NVIDIA RAPIDS™ 22.10.01 (for x86, only these libraries are included: cudf, xgboost, rmm, cuml, and cugraph)
Apex
rdma-core 36.0
NVIDIA HPC-X 2.13
OpenMPI 4.1.4+
GDRCopy 2.3
TensorBoard 2.9.0
Nsight Compute 2022.3.0.0
Nsight Systems 2022.4.2.1
NVIDIA TensorRT™ 8.5.1
Torch-TensorRT 1.1.0a0
NVIDIA DALI® 1.20.0
MAGMA 2.6.2
JupyterLab 2.3.2 including Jupyter-TensorBoard
TransformerEngine 0.3.0
Additional context