feat: use cudaMalloc to allocate kvCache #3303
/bot run --add-multi-gpu-test |
PR_Github #1218 [ run ] triggered by Bot |
PR_Github #1218 [ run ] completed with state |
/bot run --add-multi-gpu-test --disable-fail-fast |
PR_Github #1226 [ run ] triggered by Bot |
PR_Github #1226 [ run ] completed with state |
fec79bd to c691394
/bot run --add-multi-gpu-test --disable-fail-fast |
PR_Github #1247 [ run ] triggered by Bot |
PR_Github #1247 [ run ] completed with state |
6a0dde7 to 7e00b7a
/bot run --add-multi-gpu-test --disable-fail-fast |
PR_Github #1290 [ run ] triggered by Bot |
PR_Github #1290 [ run ] completed with state |
Signed-off-by: Chuang Zhu <[email protected]>
Signed-off-by: Chuang Zhu <[email protected]>
Signed-off-by: Chuang Zhu <[email protected]>
7e00b7a to 70f5745
/bot reuse-pipeline |
PR_Github #1389 [ reuse-pipeline ] triggered by Bot |
PR_Github #1389 [ reuse-pipeline ] completed with state |
Signed-off-by: sarattha <[email protected]>
@chuangz0 this change seems to be pretty general, and is not limited to disaggregated serving. Did we assess that using |
Use cudaMalloc to allocate the kvCache, so that the kvCache pool can be registered by the transfer engine.
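For readers unfamiliar with the pattern, here is a minimal host-only sketch of "allocate the pool as one explicit region, then register that region". It is not the code from this PR. `TransferRegistry`, `KvCachePool`, `hostAlloc`, and `hostFree` are hypothetical names made up for illustration, and `std::malloc` stands in for `cudaMalloc` so the sketch runs without a GPU. In a CUDA build, thin wrappers around `cudaMalloc`/`cudaFree` would presumably be passed instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <new>

// Hypothetical stand-in for a transfer engine's table of registered regions.
class TransferRegistry {
public:
    // Records [base, base + bytes) as registered; rejects empty regions.
    bool registerRegion(void* base, std::size_t bytes) {
        if (base == nullptr || bytes == 0) return false;
        regions_[reinterpret_cast<std::uintptr_t>(base)] = bytes;
        return true;
    }
    // True if [p, p + bytes) lies entirely inside one registered region.
    bool contains(const void* p, std::size_t bytes) const {
        auto addr = reinterpret_cast<std::uintptr_t>(p);
        auto it = regions_.upper_bound(addr); // first region starting after addr
        if (it == regions_.begin()) return false;
        --it;
        return addr + bytes <= it->first + it->second;
    }
private:
    std::map<std::uintptr_t, std::size_t> regions_;
};

// Allocator hooks shaped like cudaMalloc/cudaFree (0 means success).
using AllocFn = int (*)(void** ptr, std::size_t bytes);
using FreeFn = int (*)(void* ptr);

// Host stand-ins (assumption: replaced by cudaMalloc/cudaFree wrappers on GPU).
int hostAlloc(void** ptr, std::size_t bytes) {
    *ptr = std::malloc(bytes);
    return *ptr != nullptr ? 0 : 1;
}
int hostFree(void* ptr) {
    std::free(ptr);
    return 0;
}

// Hypothetical pool: one contiguous allocation carved into fixed-size blocks,
// so a single registration covers every block.
class KvCachePool {
public:
    KvCachePool(std::size_t numBlocks, std::size_t blockBytes, AllocFn alloc, FreeFn release)
        : numBlocks_(numBlocks), blockBytes_(blockBytes), release_(release) {
        if (alloc(&base_, numBlocks * blockBytes) != 0) throw std::bad_alloc();
    }
    ~KvCachePool() { release_(base_); }
    KvCachePool(const KvCachePool&) = delete;
    KvCachePool& operator=(const KvCachePool&) = delete;

    void* base() const { return base_; }
    std::size_t bytes() const { return numBlocks_ * blockBytes_; }
    void* block(std::size_t i) const { return static_cast<char*>(base_) + i * blockBytes_; }

private:
    void* base_ = nullptr;
    std::size_t numBlocks_;
    std::size_t blockBytes_;
    FreeFn release_;
};
```

Usage would look like constructing `KvCachePool pool(4, 256, hostAlloc, hostFree)`, calling `registry.registerRegion(pool.base(), pool.bytes())` once, and then handing any `pool.block(i)` to the transfer path. The point of the sketch is only that the pool owns a single, stable base address and size that can be registered up front.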