### Name and Version

b5595 (commit 3a07714) and b5600 (commit d17a809), both built with CUDA.
### Operating systems
Windows
### Which llama.cpp modules do you know to be affected?
llama-server
### Command line

```shell
llama-server -m Qwen2.5-14B-Instruct-Q8_0.gguf -ngl 99 --temp 0 -fa -cb -c 44200 -np 17
llama-server -m Qwen2.5-1.5B-Instruct-Q8_0.gguf -ngl 99 --temp 0 -fa -cb -c 166400 -np 64
```
### Problem description & steps to reproduce

This assertion fails sporadically: `GGML_ASSERT(nf == nh && "KV defrag bug: nf != nh")`

The server often completes 2,000 or even 50,000 parallel inference tasks without issue, then fails at random. Prompt sizes range roughly from 100 to 600 tokens, and the number of generated tokens from about 8 to 2,000.

I added debug output; perhaps these numbers give a clue about what went wrong:
```
nf != nh (1681 != 1704)
i0: 1194, nh: 1704, nf: 1681, is: 1194, n_used: 2898, n_kv: 12260
```

The tracked `n_used` is 2898, but recounting occupied cells (checking `cells.is_empty` for indices 0 to `n_kv`) gives 2875. `is_empty` is true for cells 1194 to 2897 (other cells were not checked).
### First Bad Commit
No response