Fix bug in MMVQ kernel #446

Merged: ikawrakow merged 1 commit into main on May 23, 2025
Conversation

ikawrakow
Owner

After a very long bug hunt, this PR should hopefully fix #389, #398, #425.

Thanks to everybody who tested my previous bug fix attempts!
Huge kudos to @ciprianveg who was instrumental in finding the bug!

The bug was in the CUDA matrix-vector multiplication kernel (MMVQ). It only shows up when the kernel processes 2 or 3 tokens. Hence, it was not observed during token generation (TG), and only showed up during prompt processing (PP) when an expert in a MoE model ended up with just 2 or 3 tokens from the batch, which is rare.
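
To make the trigger condition concrete, here is a small illustrative sketch (not code from this PR or from the kernel itself). It counts how many tokens each expert receives for a given routing and checks whether any expert lands on the 2- or 3-token case described above. The function names, the routing layout, and the example expert ids are all hypothetical, and the sketch assumes each token selects distinct experts.

```python
from collections import Counter

def tokens_per_expert(routing):
    """Count tokens per expert.

    routing[t] is the list of expert ids chosen for token t
    (hypothetical layout, assumed distinct ids per token).
    """
    counts = Counter()
    for experts in routing:
        counts.update(experts)
    return counts

def hits_affected_path(routing, affected=(2, 3)):
    """Return True if any expert receives a token count in the affected range."""
    return any(n in affected for n in tokens_per_expert(routing).values())

# TG: one token per step, so every active expert sees exactly 1 token.
tg_step = [[0, 5]]

# PP: a batch of tokens; in this made-up routing expert 5 ends up with 2 tokens.
pp_batch = [[0, 5], [1, 5], [2, 3], [4, 6]]

print(hits_affected_path(tg_step))   # single-token case
print(hits_affected_path(pp_batch))  # batch where one expert gets 2 tokens
```

With a single token per step, no expert can ever receive 2 or 3 tokens, which is consistent with the bug never appearing during TG.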

I believe all other changes I made in #442 are not necessary, but please test this PR to confirm.

Closes #389
Closes #398
Closes #425

@ciprianveg

ciprianveg commented May 23, 2025 via email

@schynce

schynce commented May 23, 2025

I can happily confirm that this PR seems to have fixed the issues on my end! Thank you!

@ikawrakow
Owner Author

I think I'll merge this now. It fixes a real bug, so it should be merged regardless of whether it also fixes #389, #398, #425.

@ikawrakow ikawrakow merged commit 9fb82af into main May 23, 2025
@Panchovix

Amazing, thanks for all your work!

@p4s2wd

p4s2wd commented May 24, 2025

Thank you!

@pt13762104

It's working fine now, thank you for your patience

7 participants