
Bug: Difficulties Using LLaMa.cpp Server and --prompt-cache [FNAME] (not supported?) #9135

Closed
@darien-schettler

Description


What happened?

As seen here:

https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md

According to that README, the llama.cpp server should support --prompt-cache FNAME.

I have not been able to get this feature to work. I have also tried workarounds such as using llama-cli to generate the prompt cache and then specifying that file for llama-server.

Is there a minimal reproducible snippet that shows this feature working? Is it actually implemented?

Thanks in advance.

Name and Version

CLI call used to generate the prompt cache:

version: 3613 (fc54ef0)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

$ ./llama-cli -m "/.../Meta-Llama-3.1-8B-Instruct-Q6_K.gguf" -c 4096 --verbose-prompt -co --mlock -t $(nproc) --prompt-cache "/.../prompt_cache/prompt_cache.bin" --prompt-cache-all --file "/.../prompt_files/pirate_prompt.txt"
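
For completeness, the sanity check I would expect to work with the CLI (a sketch only, reusing the same placeholder paths as above): re-running llama-cli against the cache file in read-only mode should pick up the saved session, so the cached prompt tokens should not need to be re-evaluated on the second run.

# Re-use the cache read-only; prompt evaluation should be near-instant if the saved session is loaded.
$ ./llama-cli -m "/.../Meta-Llama-3.1-8B-Instruct-Q6_K.gguf" -c 4096 -t $(nproc) --prompt-cache "/.../prompt_cache/prompt_cache.bin" --prompt-cache-ro --file "/.../prompt_files/pirate_prompt.txt" -n 32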

Server call (after generating prompt_cache.bin with llama-cli). The prompt file here is the same as the one above, minus the final user input, which is sent via the request instead:

version: 3613 (fc54ef0)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

$ ./llama-server -m "/.../Meta-Llama-3.1-8B-Instruct-Q6_K.gguf" --host 0.0.0.0 --port 8080 -c 4096 --verbose-prompt -co --mlock -t $(nproc) --prompt-cache "/.../prompt_cache/prompt_cache.bin" --prompt-cache-ro --keep -1 -f "/.../prompt_files/pirate_prompt_server.txt"
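
The follow-up request then carries only the final user turn, roughly like this (a sketch; the prompt text is a placeholder, and cache_prompt is the /completion field documented in the server README for re-using a previously cached prompt):

# Send only the new user input; "cache_prompt": true asks the server to re-use a matching cached prefix if possible.
$ curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Tell me about treasure maps.", "n_predict": 128, "cache_prompt": true}'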

What operating system are you seeing the problem on?

Linux

Relevant log output

No response


Labels

bug-unconfirmed, high severity, stale
