Drop at the start of generation #380

Closed
@intulint

Description

After generation starts, the server crashes. This only happens with Qwen3-30B-A3B, and I tried several quantizations. Regular dense models work, including other dense Qwen3 models.
What could be the problem? I liked the speedup on dense models and expected the MoE model to fly.
But it doesn't work. It crashes without any error message and just drops back to the command line when generation starts.

Windows 10, Microsoft Visual Studio 2022, main branch

cmake -B ./build -DGGML_CUDA=OFF -DGGML_BLAS=OFF
cmake --build ./build --config Release -j 16

./llama-server.exe -t 7 -c 4096 -m F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf
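Since the process exits without printing anything, its exit code may help narrow down the kind of crash. Below is a minimal sketch of a wrapper that runs a command and prints the exit code in readable form. The helper names `describe_exit_code` and `run_and_report` are hypothetical and not part of this project; the three listed codes are standard Windows NTSTATUS values, and this table is only a small, non-exhaustive sample.

```python
import subprocess
import sys

# A few well-known Windows NTSTATUS values a crashed process can exit with.
# Non-exhaustive; chosen for illustration only.
KNOWN_STATUS = {
    0xC0000005: "access violation",
    0xC000001D: "illegal instruction",
    0xC00000FD: "stack overflow",
}


def describe_exit_code(code: int) -> str:
    """Format a process exit code, naming common Windows crash codes."""
    if code < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        return f"killed by signal {-code}"
    code &= 0xFFFFFFFF  # view the value as an unsigned 32-bit number
    label = KNOWN_STATUS.get(code)
    if label is not None:
        return f"0x{code:08X} ({label})"
    return str(code)


def run_and_report(cmd: list[str]) -> int:
    """Run cmd, wait for it to exit, then print and return its exit code."""
    result = subprocess.run(cmd)
    print("exit code:", describe_exit_code(result.returncode))
    return result.returncode
```

For example, calling `run_and_report` with the `llama-server.exe` command line above, split into a list of arguments, would print the exit code once the server goes back to the prompt.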
