
Make prompt cache saving and restoring MLA aware #497


Merged: 6 commits into main on Jun 6, 2025

Conversation

@saood06 (Collaborator) commented on Jun 6, 2025

Tested with both a long (3.5K-token) prompt and a short prompt; in both cases the saved prompt cache matched the expected size. The long prompt was also restored on a fresh launch of the server to verify that the output was consistent with the information in the prompt.

Closes #436
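
For context, here is a minimal, self-contained sketch of why the prompt-cache writer has to know about MLA. This is not ik_llama.cpp's actual code: every name in it (`CacheParams`, `cache_bytes_per_token`, `kv_lora_rank`, `n_rope`) is hypothetical, and fp16 storage is assumed. The point is that with MLA the per-layer cache holds one compressed latent vector (plus RoPE key dimensions) per token instead of full K and V tensors, so the number of bytes to serialize per token differs, and a saver that assumes the standard layout computes the wrong size, which is the kind of mismatch that can segfault on save (#436).

```cpp
// Hypothetical sketch, not ik_llama.cpp's API: shows how the per-token
// cache size differs between standard attention and MLA.
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct CacheParams {
    bool     mla;           // hypothetical flag: model uses Multi-head Latent Attention
    uint32_t n_layer;       // number of transformer layers
    uint32_t n_head_kv;     // KV heads (standard attention path)
    uint32_t n_embd_head;   // per-head embedding size
    uint32_t kv_lora_rank;  // MLA: width of the compressed latent KV vector
    uint32_t n_rope;        // MLA: extra per-token RoPE key dimensions
};

// Bytes of KV cache per token, assuming fp16 storage (2 bytes per element).
// With MLA the cache stores one compressed latent vector plus RoPE keys per
// layer; standard attention stores full K and V tensors per layer. Using the
// standard formula on an MLA cache over- or under-reads the buffers.
static size_t cache_bytes_per_token(const CacheParams &p) {
    if (p.mla) {
        return size_t(p.n_layer) * (p.kv_lora_rank + p.n_rope) * 2;
    }
    // standard attention: K and V, each n_head_kv * n_embd_head wide
    return size_t(p.n_layer) * 2 * p.n_head_kv * p.n_embd_head * 2;
}

int main() {
    const CacheParams std_attn { false, 32, 8, 128, 0, 0 };   // illustrative numbers
    const CacheParams mla_attn { true, 32, 0, 0, 512, 64 };   // illustrative numbers
    std::printf("standard: %zu bytes/token\n", cache_bytes_per_token(std_attn));
    std::printf("MLA:      %zu bytes/token\n", cache_bytes_per_token(mla_attn));
    return 0;
}
```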

@saood06 saood06 requested a review from ikawrakow June 6, 2025 08:33
@ikawrakow ikawrakow merged commit ffd87f2 into main Jun 6, 2025
Thireus added a commit to Thireus/ik_llama.cpp that referenced this pull request Jun 6, 2025
Make prompt cache saving and restoring MLA aware (ikawrakow#497)
Successfully merging this pull request may close these issues:

Bug: Saving the prompt cache causes Segfault (#436)