Closed
Description
The server crashes as soon as generation starts. This only happens with Qwen3-30B-A3B, and I checked several different quants. Regular dense models work, including other dense Qwen3 models.
What could be the problem? I liked the speedup on dense models, so I expected the MoE model to fly.
But it doesn't work. It crashes without any error message; it just returns to the command prompt when generation starts.
Windows 10, Microsoft Visual Studio 2022, main branch
cmake -B ./build -DGGML_CUDA=OFF -DGGML_BLAS=OFF
cmake --build ./build --config Release -j 16
./llama-server.exe -t 7 -c 4096 -m F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf