Is there a way to pass arguments to a backend? (vLLM specifically) #4313
Unanswered · Jordanb716 asked this question in Q&A
I'm trying to run a model through vLLM, and I'm getting:
```
err=ValueError('Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA P102-100 GPU has compute capability 6.1. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.')
```
But I can't for the life of me figure out how to pass that flag to vLLM. Is there something I could add to the model config file, an environment variable, or something like that? I'm running v2.23.0-cublas-cuda12-ffmpeg through Kubernetes.
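For illustration, this is the kind of model config I'm imagining. The `name`, `backend`, and `parameters.model` fields are standard LocalAI config; the `dtype` line is a guess on my part, mirroring vLLM's `--dtype` CLI flag, and I haven't found it documented for the vLLM backend:

```yaml
# Guessed LocalAI model config for the vLLM backend.
# name/backend/parameters.model are standard fields;
# the dtype line is an assumption, not confirmed to be read by the backend.
name: my-model
backend: vllm
parameters:
  model: "facebook/opt-125m"   # placeholder model name
# Assumed field: hopefully forwarded to vLLM as --dtype (unverified)
dtype: float16
```

If there's no such field, any other mechanism that gets `--dtype=half` (or its equivalent engine argument) through to vLLM would work just as well.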