How to run a llama server with a fixed prompt cache without caching each of my upcoming queries? #14282
Asked by NIKHILDUGAR in Q&A
I am currently running my server as such:

and pass my prompt as:

Now the problem I am facing with this is that every upcoming query also ends up in the prompt cache, while I only want the fixed prompt to stay cached.

Appreciate any and all help and advice. Thanks.
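As an illustration of this kind of setup, here is a minimal sketch that sends a fixed prefix plus the current query to a running llama-server instance. The host/port, the prefix text, and the parameter values are assumptions for the example, not the asker's actual command or prompt; the /completion endpoint with its prompt, n_predict, and cache_prompt fields is part of the llama-server HTTP API.

```python
import requests

SERVER_URL = "http://localhost:8080"  # assumed llama-server address
FIXED_PROMPT = "<long fixed prompt that should stay cached>\n"  # placeholder

def ask(query: str) -> str:
    """Send the fixed prefix plus the current query to /completion."""
    resp = requests.post(
        f"{SERVER_URL}/completion",
        json={
            "prompt": FIXED_PROMPT + query,
            "n_predict": 128,
            # Ask the server to reuse the KV cache for the prefix shared
            # with the previous request.
            "cache_prompt": True,
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["content"]

print(ask("What is the capital of France?"))
```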
Answered by ggerganov on Jun 23, 2025:
After each request, send a dummy request with the original "fixed" prompt and …
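A minimal sketch of that idea, assuming the /completion endpoint and an n_predict of 0 so the dummy request only re-processes (and therefore re-caches) the fixed prompt without generating text; the endpoint and parameter values here are assumptions, not a reconstruction of the full answer.

```python
import requests

SERVER_URL = "http://localhost:8080"          # assumed llama-server address
FIXED_PROMPT = "<the original fixed prompt>"  # placeholder for the fixed prefix

def restore_fixed_prompt_cache() -> None:
    """Re-send the fixed prompt so it, not the last query, is what stays cached."""
    resp = requests.post(
        f"{SERVER_URL}/completion",
        json={
            "prompt": FIXED_PROMPT,
            # n_predict = 0 is an assumption: the goal is only to have the
            # server process and cache the fixed prompt, not to generate text.
            "n_predict": 0,
            "cache_prompt": True,
        },
        timeout=600,
    )
    resp.raise_for_status()

# Call restore_fixed_prompt_cache() after every real request, so the next
# request that starts with FIXED_PROMPT hits the cached prefix again.
```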
From the 14 follow-up replies:
You must be doing something wrong. Here is how you can test it:
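One way to test it, sketched under the same assumptions as above: compare how many prompt tokens the server reports evaluating for two consecutive requests that share the fixed prefix. The timings/prompt_n field used below is assumed to report the number of prompt tokens actually evaluated for the request; if the prefix cache is working, the second number should be much smaller than the first.

```python
import requests

SERVER_URL = "http://localhost:8080"          # assumed llama-server address
FIXED_PROMPT = "<the original fixed prompt>"  # placeholder for the fixed prefix

def prompt_tokens_evaluated(prompt: str) -> int:
    """Send a request and return how many prompt tokens the server evaluated."""
    resp = requests.post(
        f"{SERVER_URL}/completion",
        json={"prompt": prompt, "n_predict": 16, "cache_prompt": True},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["timings"]["prompt_n"]

first = prompt_tokens_evaluated(FIXED_PROMPT + " First question.")
second = prompt_tokens_evaluated(FIXED_PROMPT + " Second question.")
# With the prefix cached, `second` should cover roughly the new suffix only,
# while `first` includes the whole fixed prompt.
print(f"prompt tokens evaluated: first={first}, second={second}")
```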