Fix bug in MMVQ kernel #446

ikawrakow · 2025-05-23T09:17:19Z

After a very long bug hunt, this PR should hopefully fix #389, #398, #425.

Thanks to everybody who tested my previous bug fix attempts!
Huge kudos to @ciprianveg who was instrumental in finding the bug!

The bug was in the CUDA matrix-vector multiplication kernel (a.k.a. MMVQ). It only shows up when the kernel processes 2 or 3 tokens. Hence, it was not observed during token generation (TG), and only showed up during prompt processing (PP) when an expert in a MoE model ended up having to process just 2 or 3 tokens from the batch (which is rare).
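To make the trigger condition concrete: in a MoE layer, each expert multiplies only the tokens routed to it, so the number of tokens a kernel call sees is the per-expert count, not the batch size. The following is a minimal Python sketch of that bookkeeping. The routing table and both helper names are hypothetical and are not taken from ik_llama.cpp; it only illustrates how a per-expert count of 2 or 3 can arise inside a larger batch.

```python
from collections import Counter


def tokens_per_expert(routing):
    """Count how many tokens of a batch are routed to each expert.

    `routing` holds, for every token, the list of expert ids selected for it.
    """
    counts = Counter()
    for experts in routing:
        counts.update(experts)
    return counts


def experts_in_affected_range(routing, affected=(2, 3)):
    """Return the expert ids whose per-expert token count is in `affected`."""
    counts = tokens_per_expert(routing)
    return sorted(e for e, n in counts.items() if n in affected)


# Hypothetical routing for a 6-token batch with top-2 expert selection.
routing = [
    [0, 1],
    [0, 4],
    [0, 2],
    [0, 1],
    [0, 3],
    [0, 1],
]

counts = tokens_per_expert(routing)
affected = experts_in_affected_range(routing)
print(dict(counts))  # per-expert token counts
print(affected)      # experts that would process exactly 2 or 3 tokens
```

In this made-up batch only one expert lands in the 2-or-3-token range, while the others see either a single token or the whole batch.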

I believe all other changes I made in #442 are not necessary, but please test this PR to confirm.

Closes #389
Closes #398
Closes #425

ciprianveg · 2025-05-23T11:29:36Z

Thank you for the fix!🍻

schynce · 2025-05-23T11:40:44Z

I can happily confirm that this PR seems to have fixed the issues on my end! Thank you!

ikawrakow · 2025-05-23T15:25:05Z

I think I'll merge this now. It fixes a real bug, so it should be merged irrespective of whether it fixes #389, #398, #425.

Panchovix · 2025-05-23T16:00:18Z

Amazing, thanks for all your work!

p4s2wd · 2025-05-24T05:12:04Z

Thank you!

pt13762104 · 2025-05-24T09:31:08Z

It's working fine now, thank you for your patience

Fix bug in MMVQ kernel

193a15b

ikawrakow merged commit 9fb82af into main May 23, 2025
