
feat: Support CUDA graphs for EAGLE3 #3176


Merged: 2 commits into NVIDIA:main from eagle3-graphs on Apr 16, 2025

Conversation

@mikeiovine (Collaborator) commented Mar 31, 2025:

Support CUDA graphs for EAGLE-3 speculative decoding. Also contains fixes for loading EAGLE-3 weights for Llama 3 70B models (previously only tested with 8B).

The graphs significantly improve performance. However, we still have a lot of work to do to eliminate the host overheads.
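
For readers unfamiliar with where the win comes from, here is a minimal PyTorch sketch of the eager-vs-replay difference (a toy stack of linears stands in for the drafter; this is not the PR's code): with many small kernels, eager execution pays a host-side launch cost per kernel, while a captured graph replays the whole sequence with a single call.

```python
import time

import torch

# Toy stand-in for the small draft model; the real drafter is a transformer stack.
layers = torch.nn.Sequential(*[torch.nn.Linear(1024, 1024, device="cuda") for _ in range(32)])
x = torch.zeros(8, 1024, device="cuda")

# Warm up on a side stream before capture, as recommended for torch.cuda.graph.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        layers(x)
torch.cuda.current_stream().wait_stream(s)

# Capture the whole forward pass into one graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    y = layers(x)

def bench(fn, iters=100):
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print("eager :", bench(lambda: layers(x)))  # one kernel launch per layer
print("graph :", bench(g.replay))           # one replay call for all 32 layers
```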

@juney-nvidia changed the title Support CUDA graphs for EAGLE3 → feat: Support CUDA graphs for EAGLE3 on Apr 1, 2025
@mikeiovine force-pushed the eagle3-graphs branch 8 times, most recently from 1b6ec07 to 0a61d1c on April 11, 2025 20:27
@mikeiovine requested review from hlu1, lfr-0531 and QiJune on April 11, 2025 21:13
@mikeiovine marked this pull request as ready for review on April 11, 2025 21:15
@mikeiovine (Collaborator, Author) commented:

Putting this out to get early feedback and ideas. I'm not really happy with the design right now. Specifically, passing the states between the target and draft models gets pretty complicated when CUDA graphs enter the picture.
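
To illustrate the constraint being described, a minimal sketch with toy modules standing in for the actual target/drafter classes (none of this is the PR's code): once the drafter is captured, it only ever reads the fixed buffer it was captured against, so the target's hidden states must be copied into that exact tensor each step instead of being handed over as a freshly allocated output.

```python
import torch

# Toy stand-ins; the real models are full transformers.
target = torch.nn.Linear(4096, 4096, device="cuda")
drafter = torch.nn.Linear(4096, 4096, device="cuda")

# The buffer the draft graph is captured against; the hand-off must go through it.
draft_hidden_in = torch.zeros(8, 4096, device="cuda")

# Warm up on a side stream before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        drafter(draft_hidden_in)
torch.cuda.current_stream().wait_stream(s)

draft_graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(draft_graph):
    draft_logits = drafter(draft_hidden_in)

# One decode step: run the target (eagerly or via its own graph), then copy its
# hidden states into the captured buffer before replaying the draft graph.
target_hidden = target(torch.randn(8, 4096, device="cuda"))
draft_hidden_in.copy_(target_hidden)
draft_graph.replay()
```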

@hlu1 (Collaborator) left a comment:

Mostly nits.

@mikeiovine (Collaborator, Author) commented Apr 12, 2025:

Had some discussion offline with @hlu1 about how to make it cleaner:

  1. extra_model_inputs is hard to extend. A single ModelInput class that all models consume would let us add new inputs to other models easily.
  2. At the same time, spec_decode_extra_input_info can be generalized: rename it and have it return a dict mapping each extra input's name to its (shape, dtype), with sensible defaults so standard models don't have to implement it. (Originally proposed as get_input_shapes; we decided to just call it get_warmup_extra_inputs. See the sketch below.)

I think (2) should definitely be done now. (1) is a pretty big refactor and would probably have to land in follow-ups.
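
A hypothetical sketch of what (2) could look like; the class and function names, signature, and return type here are illustrative assumptions, not the code that actually landed:

```python
from typing import Dict, Tuple

import torch

class DecoderModel:
    def get_warmup_extra_inputs(self, batch_size: int) -> Dict[str, Tuple[Tuple[int, ...], torch.dtype]]:
        # Sensible default: standard models need nothing beyond the usual inputs.
        return {}

class Eagle3DraftModel(DecoderModel):
    hidden_size = 8192  # e.g. Llama 3 70B

    def get_warmup_extra_inputs(self, batch_size: int) -> Dict[str, Tuple[Tuple[int, ...], torch.dtype]]:
        # The drafter additionally consumes hidden states captured from the target model.
        return {"hidden_states": ((batch_size, self.hidden_size), torch.bfloat16)}

def allocate_warmup_inputs(model: DecoderModel, batch_size: int) -> Dict[str, torch.Tensor]:
    # The warmup/capture path allocates dummy buffers from the reported specs.
    return {
        name: torch.zeros(*shape, dtype=dtype, device="cuda")
        for name, (shape, dtype) in model.get_warmup_extra_inputs(batch_size).items()
    }
```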

@mikeiovine (Collaborator, Author):

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #2196 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #2196 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1586 completed with status: 'FAILURE'

@mikeiovine (Collaborator, Author):

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #2212 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #2212 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1601 completed with status: 'FAILURE'

@mikeiovine (Collaborator, Author):

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #2348 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #2348 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1688 completed with status: 'FAILURE'

@mikeiovine (Collaborator, Author):

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #2357 [ run ] triggered by Bot

@mikeiovine (Collaborator, Author):

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #2363 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #2357 [ run ] completed with state ABORTED

@hlu1 (Collaborator) left a comment:

Approve to unblock.

@tensorrt-cicd (Collaborator):

PR_Github #2363 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1699 completed with status: 'SUCCESS'

@mikeiovine (Collaborator, Author):

/bot run --disable-fail-fast

@mikeiovine (Collaborator, Author):

Running CI one more time before merging, since it has been a while since my last rebase.

@tensorrt-cicd (Collaborator):

PR_Github #2497 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #2497 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1791 completed with status: 'SUCCESS'

@mikeiovine (Collaborator, Author):

/bot skip --comment "Pipeline passed before last rebase"

@tensorrt-cicd (Collaborator):

PR_Github #2510 [ skip ] triggered by Bot

@mikeiovine enabled auto-merge (squash) on April 16, 2025 20:51
@tensorrt-cicd (Collaborator):

PR_Github #2510 [ skip ] completed with state SUCCESS
Skipping testing for commit efba97d

@mikeiovine merged commit 41a6c98 into NVIDIA:main on Apr 16, 2025
3 checks passed
@mikeiovine deleted the eagle3-graphs branch on April 16, 2025 20:53