Description
Hi! I was attempting to see whether llama.cpp could be supported in LLMLingua (prompt compression) via llama-cpp-python, but LLMLingua requires per-token attention masks, which llama-cpp-python does not currently expose. Attention masks are supported in transformers, and adding them would seem to enable more projects to work with llama.cpp.
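To make the gap concrete, here is a minimal sketch (gpt2 is used here purely for illustration; LLMLingua loads its own scoring model): the transformers forward call accepts a per-token `attention_mask`, while llama-cpp-python's `Llama.eval()` takes only a flat token list, with no mask parameter to forward.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# transformers path: a per-token attention mask is a first-class argument,
# which is what LLMLingua relies on when scoring tokens for compression.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer(
    ["short prompt", "a somewhat longer prompt"],
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    out = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],  # padded positions are ignored
    )
logits = out.logits

# llama-cpp-python path: Llama.eval() accepts only a flat token sequence,
# so there is no equivalent mask to pass today.
# from llama_cpp import Llama
# llm = Llama(model_path="model.gguf")
# llm.eval(llm.tokenize(b"short prompt"))
```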
I think this might be worth pursuing in order to use LLMLingua in downstream projects, since running prompt compression on CPU or with only partial GPU offload is quite slow, and that cost adds up for longer passages. Additionally, perhaps implementing LLMLingua's methods directly in llama.cpp is worth considering?