Hi,
Thanks to the great work of the AWQ authors, the TGI maintainers, and the open-source community, AWQ is now supported in TGI (link).
@TheBloke has released many AWQ-quantized models on Hugging Face; all of them can be run with TGI as follows:
```shell
text-generation-launcher \
    --model-id TheBloke/Llama-2-7b-Chat-AWQ \
    --trust-remote-code --port 8080 \
    --max-input-length 3072 --max-total-tokens 4096 --max-batch-prefill-tokens 4096 \
    --quantize awq
```
Note that this PR uses the older GEMM kernels from AWQ (commit f084f40).
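Once the launcher above is serving, the model can be queried over TGI's HTTP API. A minimal sketch, assuming the server is on port 8080 and exposes TGI's standard `/generate` endpoint (the prompt and generation parameters here are made-up examples):

```shell
# Hypothetical request body for TGI's /generate endpoint; the prompt
# and parameter values are illustrative, not from the original issue.
PAYLOAD='{"inputs": "What is AWQ quantization?", "parameters": {"max_new_tokens": 64}}'
echo "$PAYLOAD"

# Send it to the running server with, for example:
#   curl http://localhost:8080/generate \
#        -X POST -H "Content-Type: application/json" -d "$PAYLOAD"
```

The response is a JSON object whose `generated_text` field holds the completion.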
Thanks!