
🐛 [Bug] Loading Torch-TensorRT models (.ts) on multiple GPUs (in TorchServe) #1888


Open
emilwallner opened this issue May 5, 2023 · 12 comments
Labels
bug Something isn't working

@emilwallner

Bug Description

Everything works well when I'm using 1 GPU, but as soon as I try to load a model on 4 separate GPUs, I get this error:

MODEL_LOG - RuntimeError: [Error thrown at core/runtime/TRTEngine.cpp:42] Expected most_compatible_device to be true but got false
MODEL_LOG - No compatible device was found for instantiating TensorRT engine

To Reproduce

Steps to reproduce the behavior:

Create a (.ts) model and load it on 4 different GPUs. I don't know whether this is specific to TorchServe or a general issue.

Here's the simple version (TorchServe Handler):

def initialize(self, ctx):
    properties = ctx.system_properties
    # TorchServe passes the worker's assigned GPU index via system_properties
    self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
    self.model = torch.jit.load('model.ts')

I'm not sure whether it relates to this issue. From what I can tell, it seems I need to restrict the CUDA context; however, the GPU is already assigned in the handler. I tried the following, but it still gives me the same problem.

def initialize(self, ctx):
    properties = ctx.system_properties
    self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
    # Pin both the PyTorch and Torch-TensorRT runtimes to this worker's GPU
    torch.cuda.set_device(self.device)
    torch_tensorrt.set_device(int(properties.get("gpu_id")))

    # Load the compiled model inside that GPU's CUDA context
    with torch.cuda.device(int(properties.get("gpu_id"))):
        self.model = torch.jit.load('model.ts')
        self.model.to(self.device)
        self.model.eval()

I also tried mapping the model straight to the GPU on load, but with the same problem.
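For reference, this is roughly what I mean by mapping on load (a minimal sketch; map_location is the standard torch.jit.load argument, the rest matches the handler above):

def initialize(self, ctx):
    properties = ctx.system_properties
    self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
    # Map the serialized TorchScript module directly onto this worker's GPU
    self.model = torch.jit.load('model.ts', map_location=self.device)
    self.model.eval()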

Expected behavior

The .ts model should load onto the specified GPU ID without any issues.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

Official PyTorch image: nvcr.io/nvidia/pytorch:22.12-py3
GPUs: 4x NVIDIA A10G
Ubuntu 20.04 including Python 3.8
PyTorch 1.14.0a0+410ce96
NVIDIA CUDA® 11.8.0
NVIDIA cuBLAS 11.11.3.6
NVIDIA cuDNN 8.7.0.84
NVIDIA NCCL 2.15.5 (optimized for NVIDIA NVLink®)
NVIDIA RAPIDS™ 22.10.01 (for x86, only these libraries are included: cudf, xgboost, rmm, cuml, and cugraph)
Apex
rdma-core 36.0
NVIDIA HPC-X 2.13
OpenMPI 4.1.4+
GDRCopy 2.3
TensorBoard 2.9.0
Nsight Compute 2022.3.0.0
Nsight Systems 2022.4.2.1
NVIDIA TensorRT™ 8.5.1
Torch-TensorRT 1.1.0a0
NVIDIA DALI® 1.20.0
MAGMA 2.6.2
JupyterLab 2.3.2 including Jupyter-TensorBoard
TransformerEngine 0.3.0

Additional context

@emilwallner emilwallner added the bug Something isn't working label May 5, 2023
@narendasan
Collaborator

@gs-olive can you try to replicate this?

@gs-olive
Collaborator

gs-olive commented May 9, 2023

Hello - I tried the following minimal example to reproduce the error:

  • Compile resnet18 on GPU 0
  • Load two instances of the same saved model (one on GPU 0, another on GPU 1 which is the same type)
  • Run inference with both

While I was unable to reproduce the exact error as described, I did notice that the compiled model would only return results stored on GPU 0 (the GPU index it was compiled with), and not on other GPUs of the same type at other indices. This is an issue on our end, which I am looking into. As a temporary workaround, it might make sense to recompile the model for each unique GPU ID, saving the results as "model_gpu0.ts", "model_gpu1.ts", ..., and to see if this resolves the issue.
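A rough sketch of that workaround (assuming the TorchScript frontend and one torch_tensorrt.compile call per GPU; the input shape, precision, and file names are placeholders):

import torch
import torch_tensorrt

# Original (non-TRT) TorchScript model; the filename here is hypothetical
model = torch.jit.load("model_fp32.ts").eval()

for gpu_id in range(torch.cuda.device_count()):
    with torch.cuda.device(gpu_id):
        model_gpu = model.to(f"cuda:{gpu_id}")
        trt_model = torch_tensorrt.compile(
            model_gpu,
            inputs=[torch_tensorrt.Input((1, 3, 224, 224))],  # placeholder shape
            enabled_precisions={torch.float},
            device=torch_tensorrt.Device(gpu_id=gpu_id),
        )
        torch.jit.save(trt_model, f"model_gpu{gpu_id}.ts")

In the TorchServe handler, each worker would then load the file matching its properties.get("gpu_id").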

I will also continue trying to reproduce the Expected most_compatible_device to be true but got false error.

@emilwallner
Author

Very much appreciate you looking into this and thanks for the suggested workaround! 🙌

@NothingToSay99

NothingToSay99 commented May 17, 2023

I ran into the same issue!
With the same environment, I created the model on an NVIDIA A100 and then loaded it on an NVIDIA 3090, and the error
"RuntimeError: [Error thrown at core/runtime/TRTEngine.cpp:42] Expected most_compatible_device to be true but got false
No compatible device was found for instantiating TensorRT engine"
came up.

Both the A100 and the 3090 are based on the Ampere architecture.

@emilwallner
Author

For further context, I used the same Docker image (nvcr.io/nvidia/pytorch:22.12-py3) to compile and run the model, but it was compiled on an Ampere RTX A6000 and run on an A10. As mentioned earlier, it worked well with one GPU, but not in a multi-GPU configuration.

@gs-olive
Collaborator

Thank you both for the follow-up. After corresponding with @narendasan on this, compiling the model on an A100 and instantiating it on a 3090 is an issue because of the difference in compute capability (the A100 has Compute Capability 8.0 and the 3090 has Compute Capability 8.6, source).

As of TensorRT 8.6, there is newly added support for Hardware Compatibility, which should resolve this issue once we add support for the feature in Torch-TensorRT. There is a feature request already filed for this: #1929.

@NothingToSay99

NothingToSay99 commented May 18, 2023

Thanks for your reply,
looking forward to your work!

@github-actions

This issue has not seen activity for 90 days. Remove the stale label or add a comment, or this issue will be closed in 10 days.

@gs-olive
Collaborator

Hello - as an update on this issue, we recently added #2325 to main, which addresses compiling the model on one GPU and loading it on a different GPU (or multiple GPUs) of the same kind. This PR was intended to fix cases where the model would always load to GPU 0. The feature adding hardware compatibility support (build on one GPU, run on a variety) is still planned for implementation in #1929.

@emilwallner
Author

Excellent, thanks for the hard work and update!

@gs-olive
Collaborator

Hello - we recently added #2445, which enables the hardware_compatibility feature for TRT engines generated with ir="torch_compile" or ir="dynamo". If you are able to test multi-GPU usage with hardware_compatible=True and ir="dynamo" (which also allows serialization via TorchScript), it would be much appreciated.
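A minimal sketch of what such a test might look like (the model, input shape, and file name are placeholders, and the exact serialization step may differ by release):

import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet18().eval().cuda()
example_input = torch.randn(1, 3, 224, 224).cuda()  # placeholder input

# Compile through the dynamo path with hardware compatibility enabled
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[example_input],
    hardware_compatible=True,
)

# Serialize via TorchScript so the result can be loaded with torch.jit.load on other GPUs
trt_ts = torch.jit.trace(trt_model, example_input)
torch.jit.save(trt_ts, "model_hc.ts")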

@emilwallner
Author

Thanks @gs-olive!! I'm currently low on bandwidth, but I'll give this a spin for my next model!
