fix: fix cublas_scaled_mm #3600
Conversation
/bot run --stage-list "H100_PCIe-PyTorch-1"
PR_Github #2412 [ run ] triggered by Bot
PR_Github #2412 [ run ] completed with state
/bot run --stage-list "H100_PCIe-PyTorch-1"
PR_Github #2420 [ run ] triggered by Bot
PR_Github #2420 [ run ] completed with state
Force-pushed from 865c7b3 to 1f2dc49
/bot run --stage-list "H100_PCIe-PyTorch-1"
PR_Github #2430 [ run ] triggered by Bot
PR_Github #2430 [ run ] completed with state
Force-pushed from 1f2dc49 to fa2f21c
/bot run --disable-fail-fast --stage-list "H100_PCIe-PyTorch-1"
PR_Github #2451 [ run ] triggered by Bot
PR_Github #3 [ run ] triggered by Bot
PR_Github #3 [ run ] completed with state
PR_Github #2451 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "H100_PCIe-PyTorch-1"
PR_Github #2453 [ run ] triggered by Bot
PR_Github #6 [ run ] triggered by Bot
PR_Github #6 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "H100_PCIe-PyTorch-1"
PR_Github #2459 [ run ] triggered by Bot
PR_Github #2453 [ run ] completed with state
PR_Github #2459 [ run ] completed with state
Force-pushed from fa2f21c to 37b5b67
/bot run --disable-fail-fast --stage-list "H100_PCIe-PyTorch-1"
PR_Github #2595 [ run ] triggered by Bot
Force-pushed from 37b5b67 to 1da6010
/bot run --disable-fail-fast --stage-list "H100_PCIe-PyTorch-1"
PR_Github #2595 [ run ] completed with state
PR_Github #2615 [ run ] triggered by Bot
PR_Github #2615 [ run ] completed with state
Force-pushed from 1da6010 to 567aee8
/bot run --disable-fail-fast --stage-list "H100_PCIe-PyTorch-1"
PR_Github #2703 [ run ] triggered by Bot
PR_Github #2703 [ run ] completed with state
Force-pushed from 567aee8 to e0922aa
/bot run --disable-fail-fast
PR_Github #2821 [ run ] triggered by Bot
PR_Github #2821 [ run ] completed with state
Signed-off-by: Zhenhuan Chen <[email protected]>
Force-pushed from e0922aa to 35452ee
/bot reuse-pipeline
PR_Github #2910 [ reuse-pipeline ] triggered by Bot
PR_Github #2910 [ reuse-pipeline ] completed with state
In PyTorch 2.7, cuBLAS's workspace was changed from 32MB to 1MB on Hopper: the code now calls `_getWorkspaceSize()` rather than `_getWorkspace()` (pytorch/pytorch@203a27e#diff-74fcb26047c1df4024105d36ce22a36b77cf8cc93c28631d743e639b3d6066aeL1624-R1599). `_getWorkspace()` would use cuBLAS's workspace size of 32MB on Hopper: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cuda/CublasHandlePool.cpp#L133. This leads cublasLt to choose different algorithms (especially when one run chooses split-K and another does not) on some GEMM shapes, so we need to align the workspace size in our tests.
In the E2E test, other tests also call torch._scaled_mm, and the workspace-size variable is static in the C++ code, so the modification in test_scaled_mm won't take effect there; we need to add the same setup in test_linear_fp8.py as well.
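The alignment described above can be sketched as a module-level override at the top of the test file. This is a hedged sketch, not the PR's actual diff: it assumes PyTorch's `CUBLASLT_WORKSPACE_SIZE` environment variable (interpreted in KiB) controls the cuBLASLt workspace, and that 32 MiB is the pre-2.7 Hopper value being restored.

```python
import os

# Assumed mechanism: PyTorch parses CUBLASLT_WORKSPACE_SIZE (value in KiB) once
# and caches the result in a C++ static, so the variable must be exported
# before the first cuBLASLt call in the process -- i.e. at the very top of the
# test module, before `import torch` or any torch._scaled_mm usage.
os.environ["CUBLASLT_WORKSPACE_SIZE"] = str(32 * 1024)  # 32 MiB (hypothetical target value)

print(os.environ["CUBLASLT_WORKSPACE_SIZE"])
```

Because the size is latched into a static on first use, setting it inside one test function after another test has already run a GEMM has no effect, which is why both test_scaled_mm and test_linear_fp8.py need the setup.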