Description
What happened?
I'm not sure whether this is a problem on my end or in ik_llama, but I get the following crash as soon as prompt processing starts (model: ubergarm's deepseek-v3):
May 15 08:57:29 red llama-swap[80783]: INFO [ launch_slot_with_task] slot is processing task | tid="139638925832192" timestamp=1747317449 id_slot=0 id_task=3
May 15 08:57:29 red llama-swap[80783]: INFO [ update_slots] kv cache rm [p0, end) | tid="139638925832192" timestamp=1747317449 id_slot=0 id_task=3 p0=0
May 15 08:57:36 red kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=80798, name=llama-server, Ch 00000008, intr 00000000. MMU Fault: ENGINE GRAPHICS GPC1 GPCCLIENT_T1_3 faulted @ 0x7e9f_4f200000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
May 15 08:57:36 red llama-swap[80783]: CUDA error: an illegal memory access was encountered
May 15 08:57:36 red llama-swap[80783]: current device: 0, in function ggml_backend_cuda_synchronize at /home/nux/dev/ik_llama.cpp/ggml/src/ggml-cuda.cu:3067
May 15 08:57:36 red llama-swap[80783]: cudaStreamSynchronize(cuda_ctx->stream())
May 15 08:57:36 red llama-swap[80783]: /home/nux/dev/ik_llama.cpp/ggml/src/ggml-cuda.cu:110: CUDA error
May 15 08:57:36 red kernel: llama-server[80906]: segfault at 204803fe0 ip 00007f00399189d7 sp 00007ffc4a6104f0 error 4 in libcuda.so.575.51.03[7f00395c5000+e97000] likely on CPU 11 (core 11, socket 0)
May 15 08:57:36 red kernel: Code: ef e8 9d c9 ca ff 83 3d 7e 57 2f 05 01 49 8b 1c 24 76 0a 8b 05 86 57 2f 05 85 c0 74 56 49 8b 44 24 10 41 8b 4c 24 24 48 8b 13 <8b> 00 41 39 c6 74 52 8b b3 40 40 00 00 48 89 f0 89 8c b3 44 40 00
Name and Version
./build/bin/llama-server --version
version: 3697 (34ae71c)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
No response