Support for LLMLingua #4823

Closed
@TechnotechGit

Description

Hi! I was trying to get llama.cpp working as a backend for LLMLingua (prompt compression) via llama-cpp-python, but it looks like LLMLingua requires attention masks, which llama.cpp does not expose. Attention masks are already supported in transformers, and supporting them in llama.cpp would likely enable more projects to work with it.

I think this is worth pursuing so that LLMLingua can be used in downstream projects: prompt processing on CPU or with partial GPU offload is quite slow, and the cost adds up for longer passages. Additionally, implementing LLMLingua's compression methods directly in llama.cpp may be worth considering.
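For context, the attention mask LLMLingua expects is the one the transformers API uses for batched inference: a 0/1 array alongside the token IDs, where 1 marks a real token and 0 marks padding that attention should ignore. A minimal dependency-free sketch of that convention (the token IDs and `PAD_ID` below are made up for illustration; this is not llama.cpp or LLMLingua code):

```python
PAD_ID = 0  # hypothetical padding token ID

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad variable-length token sequences to a common length and
    build the matching attention mask (1 = real token, 0 = padding)."""
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

ids, mask = pad_batch([[5, 17, 42], [9, 3]])
# ids  -> [[5, 17, 42], [9, 3, 0]]
# mask -> [[1, 1, 1], [1, 1, 0]]
```

This is roughly the `input_ids` / `attention_mask` pair that transformers models accept; the feature request is essentially for llama.cpp (and llama-cpp-python) to accept an equivalent mask so callers like LLMLingua can batch padded prompts.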
