Description
Name and Version
version: 147 (ec9e030)
built with Apple clang version 15.0.0 (clang-1500.0.40.1) for arm64-apple-darwin24.5.0
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
llama-server -m qwen2.5-vl-7b.gguf --mmproj mmproj-qwen2.5-vl-7b.gguf --host 0.0.0.0 -b 32
The batch size is set to 32 with -b 32.
Problem description & steps to reproduce
main: server is listening on http://0.0.0.0:8080 - starting the main loop
srv update_slots: all slots are idle
srv params_from_: Chat format: Content-only
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 99
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 4, n_tokens = 4, progress = 0.040404
slot update_slots: id 0 | task 0 | kv cache rm [4, end)
encoding image slice...
srv process_chun: processing image...
image slice encoded in 24194 ms
decoding image batch 1/10, n_tokens_batch = 64
image decoded (batch 1/10) in 456 ms
decoding image batch 2/10, n_tokens_batch = 64
image decoded (batch 2/10) in 335 ms
decoding image batch 3/10, n_tokens_batch = 64
image decoded (batch 3/10) in 336 ms
decoding image batch 4/10, n_tokens_batch = 64
image decoded (batch 4/10) in 338 ms
decoding image batch 5/10, n_tokens_batch = 64
image decoded (batch 5/10) in 339 ms
decoding image batch 6/10, n_tokens_batch = 64
image decoded (batch 6/10) in 338 ms
decoding image batch 7/10, n_tokens_batch = 64
image decoded (batch 7/10) in 340 ms
decoding image batch 8/10, n_tokens_batch = 64
image decoded (batch 8/10) in 340 ms
decoding image batch 9/10, n_tokens_batch = 64
image decoded (batch 9/10) in 342 ms
decoding image batch 10/10, n_tokens_batch = 53
image decoded (batch 10/10) in 339 ms
srv process_chun: image processed in 27698 ms
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 69, n_tokens = 64, progress = 0.696970
slot update_slots: id 0 | task 0 | kv cache rm [69, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 99, n_tokens = 30, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 99, n_tokens = 30
slot release: id 0 | task 0 | stop processing: n_past = 205, truncated = 0
slot print_timing: id 0 | task 0 |
prompt eval time = 28943.24 ms / 99 tokens ( 292.36 ms per token, 3.42 tokens per second)
eval time = 9509.81 ms / 107 tokens ( 88.88 ms per token, 11.25 tokens per second)
total time = 38453.05 ms / 206 tokens
srv update_slots: all slots are idle
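Even though the batch size was set to 32 with -b 32, the image is still decoded in batches of n_tokens_batch = 64. The only smaller batch is the last one (53 tokens), which just holds the remaining image tokens. I expected the image batches to be at most 32 tokens.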
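For comparison, the batch sizes in the log can be reproduced with a short sketch. It assumes the image tokens are decoded in consecutive chunks of at most n_batch tokens. split_into_batches is a hypothetical helper for illustration, not a llama.cpp function, and the total of 629 image tokens is inferred from the log (9 × 64 + 53).

```python
def split_into_batches(n_tokens: int, n_batch: int) -> list[int]:
    """Hypothetical helper: sizes of the consecutive batches used when
    n_tokens are decoded at most n_batch tokens at a time."""
    if n_batch <= 0:
        raise ValueError("n_batch must be positive")
    sizes = []
    remaining = n_tokens
    while remaining > 0:
        size = min(n_batch, remaining)  # last batch may be smaller
        sizes.append(size)
        remaining -= size
    return sizes


# Total image tokens implied by the log: nine full batches of 64 plus one of 53.
N_IMAGE_TOKENS = 9 * 64 + 53  # 629

for n_batch in (64, 32):
    sizes = split_into_batches(N_IMAGE_TOKENS, n_batch)
    print(f"n_batch={n_batch}: {len(sizes)} batches, last = {sizes[-1]}")
# n_batch=64: 10 batches, last = 53
# n_batch=32: 20 batches, last = 21
```

With n_batch = 64 this matches the ten batches in the log. Under the same assumption, the requested -b 32 would give twenty batches, the last one holding 21 tokens.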
First Bad Commit
No response