See vllm-project/vllm#1002, vllm-project/vllm#5191.
It should be possible to set `gguf` as the `QUANTIZATION` envar, but we also need a way to specify the exact quant. I'm thinking of a `MODEL_FILENAME` envar containing the exact filename in the model's repository. The model download logic will need to change; see https://github.com/Isotr0py/vllm/blob/main/examples/gguf_inference.py.
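A rough sketch of the proposed resolution logic follows. It assumes a hypothetical `MODEL_NAME` envar for the repo id and a hypothetical optional `TOKENIZER_NAME` envar, because a GGUF repo may not ship a Hugging Face tokenizer. For GGUF, it downloads only the single `MODEL_FILENAME` file and passes its local path on as the model. The downloader is injected so the logic can be tested offline. `hf_download` wraps `huggingface_hub.hf_hub_download`, which the linked example also uses.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class ModelSource:
    model: str                   # repo id, or local .gguf path for GGUF
    tokenizer: Optional[str]     # separate tokenizer repo, if any
    quantization: Optional[str]  # forwarded as the quantization setting


def resolve_model_source(
    env: Mapping[str, str],
    download: Callable[[str, str], str],
) -> ModelSource:
    # MODEL_NAME and TOKENIZER_NAME are hypothetical envar names for this sketch.
    repo = env["MODEL_NAME"]
    quant = env.get("QUANTIZATION")
    tokenizer = env.get("TOKENIZER_NAME")

    if quant != "gguf":
        # Non-GGUF: keep the existing behaviour and let the engine fetch the repo.
        return ModelSource(model=repo, tokenizer=tokenizer, quantization=quant)

    # GGUF repos usually hold many quants, so the exact file must be named.
    filename = env.get("MODEL_FILENAME")
    if not filename:
        raise ValueError("QUANTIZATION=gguf requires MODEL_FILENAME to be set")

    # Download just that one file and point the model at its local path.
    local_path = download(repo, filename)
    return ModelSource(model=local_path, tokenizer=tokenizer, quantization="gguf")


def hf_download(repo_id: str, filename: str) -> str:
    # Imported lazily so the resolution logic works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)
```

In the worker this would be called as `resolve_model_source(os.environ, hf_download)`. The resulting fields would then feed the engine arguments.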