feat: use cudaMalloc to allocate kvCache #3303

chuangz0 · 2025-04-06T05:34:20Z

Use cudaMalloc to allocate the kvCache so that the kvCache pool can be registered by the transfer engine.

chuangz0 · 2025-04-06T05:35:04Z

/bot run --add-multi-gpu-test

tensorrt-cicd · 2025-04-06T05:40:52Z

PR_Github #1218 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-06T07:18:57Z

PR_Github #1218 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #913 completed with status: 'FAILURE'

chuangz0 · 2025-04-06T11:54:34Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-04-06T11:59:30Z

PR_Github #1226 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-06T18:05:25Z

PR_Github #1226 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #919 completed with status: 'FAILURE'

chuangz0 · 2025-04-07T01:21:07Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-04-07T01:26:26Z

PR_Github #1247 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-07T07:17:55Z

PR_Github #1247 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #939 completed with status: 'FAILURE'

chuangz0 · 2025-04-07T07:38:58Z

/bot run --add-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-04-07T07:44:27Z

PR_Github #1290 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-07T13:19:25Z

PR_Github #1290 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #972 completed with status: 'SUCCESS'

Shixiaowei02 · 2025-04-08T02:47:04Z

/bot reuse-pipeline

tensorrt-cicd · 2025-04-08T02:52:27Z

PR_Github #1389 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd · 2025-04-08T02:59:12Z

PR_Github #1389 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #1290 for commit 70f5745

schetlur-nv · 2025-04-09T18:13:38Z

@chuangz0 this change seems to be pretty general, and is not limited to disaggregated serving. Did we assess that using cudaMalloc does not affect other use-cases?

chuangz0 requested a review from Shixiaowei02 April 6, 2025 05:35

juney-nvidia changed the title from "use cudaMalloc to allocate kvCache" to "feat: use cudaMalloc to allocate kvCache" Apr 6, 2025

chuangz0 force-pushed the use_cudaMalloc_allocate_kvCache branch from fec79bd to c691394 April 7, 2025 01:18

chuangz0 force-pushed the use_cudaMalloc_allocate_kvCache branch from 6a0dde7 to 7e00b7a April 7, 2025 07:38

Shixiaowei02 approved these changes Apr 8, 2025

chuangz0 added 3 commits April 8, 2025 10:46

use cudaMalloc to allocate kvCache

01d12f0

Signed-off-by: Chuang Zhu <[email protected]>

disable pool used test

e598aa2

Signed-off-by: Chuang Zhu <[email protected]>

llm shutdown

70f5745

Signed-off-by: Chuang Zhu <[email protected]>

Shixiaowei02 force-pushed the use_cudaMalloc_allocate_kvCache branch from 7e00b7a to 70f5745 April 8, 2025 02:46

Shixiaowei02 enabled auto-merge (squash) April 8, 2025 02:47

Shixiaowei02 merged commit 1c88af1 into NVIDIA:main Apr 8, 2025
2 checks passed

sarattha pushed a commit to sarattha/TensorRT-LLM that referenced this pull request Apr 9, 2025

feat: use cudaMalloc to allocate kvCache (NVIDIA#3303)

7687eac

Signed-off-by: sarattha <[email protected]>

tomeras91 pushed a commit to tomeras91/TensorRT-LLM that referenced this pull request Apr 9, 2025

feat: use cudaMalloc to allocate kvCache (NVIDIA#3303)

22fe1d6

tomeras91 pushed a commit to tomeras91/TensorRT-LLM that referenced this pull request Apr 9, 2025

feat: use cudaMalloc to allocate kvCache (NVIDIA#3303)

cefd239
