[Announcement] AWQ is now supported in text-generation-inference #92

@abhinavkulkarni

Hi,

Thanks to the great work of the authors of AWQ, maintainers at TGI, and the open-source community, AWQ is now supported in TGI (link).

@TheBloke has released many AWQ-quantized models on HuggingFace; all of these can be run using TGI as follows:

text-generation-launcher \
--model-id TheBloke/Llama-2-7b-Chat-AWQ \
--trust-remote-code --port 8080 \
--max-input-length 3072 --max-total-tokens 4096 --max-batch-prefill-tokens 4096 \
--quantize awq
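Once the launcher is up, the server exposes TGI's `/generate` endpoint on the chosen port. A minimal sketch of a client request is below, using only the Python standard library; the prompt text and generation parameters are illustrative, not part of the original announcement.

```python
import json
import urllib.request

# Request payload for TGI's /generate endpoint. "inputs" is the prompt;
# "parameters" controls sampling. Values here are example choices.
payload = {
    "inputs": "[INST] What is AWQ quantization? [/INST]",
    "parameters": {"max_new_tokens": 256, "temperature": 0.7},
}

body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8080/generate",  # port 8080 matches the launcher command above
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["generated_text"])

print(body.decode("utf-8"))
```

The server responds with a JSON object whose `generated_text` field holds the completion.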

Note that this PR uses the older GEMM kernels from AWQ (commit f084f40).

CC: @tonylins, @Sakits

Thanks!
