Support GGUF models

See https://github.com/vllm-project/vllm/issues/1002 and https://github.com/vllm-project/vllm/pull/5191.

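It should be possible to set the `QUANTIZATION` environment variable to `gguf`, but we also need to specify the exact quantization variant, since a GGUF repository typically ships one file per variant. I'm thinking of a `MODEL_FILENAME` environment variable containing the exact filename in the model's repository. The model download logic will need to change so that it fetches that single file; see https://github.com/Isotr0py/vllm/blob/main/examples/gguf_inference.py.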
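
A rough sketch of what the changed download logic could look like. This is only an illustration under assumptions: the helper names `resolve_gguf_file` and `download_model` and the `repo_id` parameter are hypothetical and not taken from this repository, and the `.gguf` suffix check is my own addition. The only external calls are `hf_hub_download` and `snapshot_download` from `huggingface_hub`.

```python
import os
from typing import Mapping, Optional


def resolve_gguf_file(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the GGUF filename to fetch, or None when GGUF is not requested.

    Hypothetical helper: reads the QUANTIZATION and MODEL_FILENAME
    environment variables proposed in this issue.
    """
    quantization = env.get("QUANTIZATION", "").strip().lower()
    if quantization != "gguf":
        return None

    filename = env.get("MODEL_FILENAME", "").strip()
    if not filename:
        raise ValueError("QUANTIZATION=gguf requires MODEL_FILENAME to be set")
    # Assumption: the file in the repository carries a .gguf extension.
    if not filename.lower().endswith(".gguf"):
        raise ValueError(f"MODEL_FILENAME must point to a .gguf file: {filename!r}")
    return filename


def download_model(repo_id: str, env: Mapping[str, str] = os.environ) -> str:
    """Download either a single GGUF file or the whole repository.

    Hypothetical helper: returns the local path of the downloaded
    file (GGUF) or of the snapshot directory (everything else).
    """
    # Imported lazily so the env parsing above works without the dependency.
    from huggingface_hub import hf_hub_download, snapshot_download

    filename = resolve_gguf_file(env)
    if filename is None:
        # Non-GGUF models: fetch the full repository snapshot.
        return snapshot_download(repo_id=repo_id)
    # GGUF models: fetch only the requested quantization file.
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

With `QUANTIZATION=gguf` and `MODEL_FILENAME` set, `download_model` would return the local path of that one file, which could then be passed on as the model path. Failing early when `MODEL_FILENAME` is missing seems preferable to guessing a default variant.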

Support GGUF models #98