feat: [AutoDeploy] generalizing cudagraph to multiple dynamic inputs #3589

lucaslie · 2025-04-16T00:50:29Z

Previously, our cudagraph implementation assumed that only the first input is batch-size dependent. This update generalizes the implementation to handle multiple inputs to the cudagraph that are batch-size dependent.
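To illustrate the idea, here is a minimal, framework-free sketch of the buffer bookkeeping behind a captured graph with several batch-size-dependent inputs. It is not the code from this PR: `StaticBufferRunner`, `pad_value`, and the use of plain Python lists in place of tensors are all assumptions made for illustration, and the "replay" is just a function call on the fixed-size buffers.

```python
from typing import Any, Callable, List, Sequence


class StaticBufferRunner:
    """Hypothetical stand-in for a captured graph with fixed-size input buffers."""

    def __init__(
        self,
        fn: Callable[..., List[Any]],
        max_batch_size: int,
        num_batched_inputs: int,
        pad_value: Any = 0,
    ) -> None:
        self.fn = fn
        self.max_batch_size = max_batch_size
        self.pad_value = pad_value
        # One preallocated buffer per batched input (not only for the first input).
        self.buffers = [
            [pad_value] * max_batch_size for _ in range(num_batched_inputs)
        ]

    def __call__(self, *inputs: Sequence[Any]) -> List[Any]:
        if len(inputs) != len(self.buffers):
            raise ValueError("unexpected number of batched inputs")
        batch_size = len(inputs[0])
        if batch_size > self.max_batch_size:
            raise ValueError("batch size exceeds the captured buffer size")
        for buf, x in zip(self.buffers, inputs):
            # Copy the live rows into the static buffer and pad the remainder.
            buf[:batch_size] = list(x)
            buf[batch_size:] = [self.pad_value] * (self.max_batch_size - batch_size)
        # "Replay" on the full-size buffers, then drop the padded rows.
        out = self.fn(*self.buffers)
        return out[:batch_size]
```

With a single buffer this reduces to the old behavior; the only change is that every batched input gets its own buffer and the same slice-and-pad treatment.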

Unit tests and integration tests are also updated for improved coverage.

Copilot Summary

This pull request includes multiple changes to improve the functionality of the auto-deploy system and enhance the testing framework. The key changes include modifications to the benchmarking logic, updates to the CompiledGraph class, and enhancements to unit tests for better coverage and performance.

Improvements to Benchmarking Logic:

Updated main function in examples/auto_deploy/build_and_run_ad.py to add a condition that prevents benchmarking when the runtime is "demollm" and provides a warning message.
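A guard of this kind might look roughly like the sketch below. The helper name `should_benchmark` and its parameters are hypothetical and do not come from `build_and_run_ad.py`; only the `"demollm"` runtime string is taken from the summary above.

```python
import warnings


def should_benchmark(runtime: str, benchmark_requested: bool) -> bool:
    """Return True only if benchmarking was requested and the runtime supports it.

    Hypothetical helper: the real script may structure this check differently.
    """
    if not benchmark_requested:
        return False
    if runtime == "demollm":
        # Warn instead of failing so the rest of the run can proceed.
        warnings.warn(
            "Benchmarking is skipped for the 'demollm' runtime.",
            stacklevel=2,
        )
        return False
    return True
```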

Enhancements to `CompiledGraph` Class:

Added support for multiple batched inputs in CompiledGraph by introducing a num_batched_inputs parameter and updating related methods to handle multiple input buffers.
Modified TorchOptCompiler to pass num_batched_inputs to CompiledGraph.
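One way to read `num_batched_inputs` is as a split point in the flattened argument list. The sketch below assumes that the batched inputs come first and that all of them share the same leading (batch) dimension; both points are assumptions for illustration, and `split_batched_args` is a hypothetical helper rather than a method of `CompiledGraph`.

```python
from typing import Any, List, Sequence, Tuple


def split_batched_args(
    flat_args: Sequence[Any], num_batched_inputs: int
) -> Tuple[List[Any], List[Any], int]:
    """Split flattened args into (batched, static, batch_size).

    Assumption: the first `num_batched_inputs` entries are batch-size dependent.
    """
    if not 1 <= num_batched_inputs <= len(flat_args):
        raise ValueError("num_batched_inputs is out of range")
    batched = list(flat_args[:num_batched_inputs])
    static = list(flat_args[num_batched_inputs:])
    # Every batched input must agree on the batch size (here: its length).
    batch_sizes = {len(x) for x in batched}
    if len(batch_sizes) != 1:
        raise ValueError(f"inconsistent batch sizes: {sorted(batch_sizes)}")
    return batched, static, batch_sizes.pop()
```

Setting `num_batched_inputs=1` recovers the previous single-input assumption.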

Testing Framework Enhancements:

Added a new model ModelWithMultipleInputs and updated tests to handle multiple input scenarios, ensuring comprehensive coverage for the new functionality.
Skipped a slow unit test in test_deepseek_patches.py to improve test suite performance.

Codebase Simplification:

Simplified the _flatten_args function in compiler.py by removing unnecessary assertions and returning a list directly.
Updated the initialization logic in compiler.py to handle dynamic shapes more efficiently.

These changes collectively enhance the robustness and performance of the auto-deploy system, while also improving the maintainability and efficiency of the codebase.

lucaslie · 2025-04-16T00:51:03Z

/bot run

Copilot

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

tensorrt_llm/_torch/auto_deploy/compile/compiler.py:21

The previous assertion that ensured the main input tensor was at least 2D has been removed. To prevent potential runtime issues with inputs of insufficient dimensions, consider adding an alternative check or a clear error message.

return tree_flatten_spec(all_args, in_spec)
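If such a check were reinstated, it could live outside `_flatten_args` and cover every batched input rather than only the first. The following is a sketch under assumptions: `check_batched_input_dims` is a hypothetical helper, it relies only on an `ndim` attribute (as tensors expose), and the 2D minimum mirrors the removed assertion described above.

```python
from typing import Any, Sequence


def check_batched_input_dims(
    flat_args: Sequence[Any], num_batched_inputs: int, min_ndim: int = 2
) -> None:
    """Raise a descriptive error if a batched input has too few dimensions.

    Hypothetical helper; assumes batched inputs are the first entries.
    """
    for i, arg in enumerate(flat_args[:num_batched_inputs]):
        ndim = getattr(arg, "ndim", None)
        if ndim is None or ndim < min_ndim:
            raise ValueError(
                f"batched input {i} must have at least {min_ndim} dimensions, "
                f"got ndim={ndim}"
            )
```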

tensorrt-cicd · 2025-04-16T00:56:57Z

PR_Github #2377 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-16T02:29:56Z

PR_Github #2377 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1711 completed with status: 'FAILURE'

lucaslie · 2025-04-16T16:25:28Z

/bot run

tensorrt-cicd · 2025-04-16T16:31:08Z

PR_Github #2490 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-16T18:32:29Z

PR_Github #2490 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1786 completed with status: 'FAILURE'

Fridah-nv · 2025-04-16T19:22:01Z

This update is also important to support models like Llama4ForConditionalGeneration that take both text inputs and image inputs. Should we add a similar test model in test_torch_opt.py?

tensorrt_llm/_torch/auto_deploy/compile/backends/torch_opt.py

lucaslie · 2025-04-17T20:49:09Z

> This update is also important to support models like Llama4ForConditionalGeneration that take both text inputs and image inputs. Should we add a similar test model in test_torch_opt.py?

@Fridah-nv, I have a general test that covers a variable number of inputs with batch_dim. I think that's sufficient for now. If there is a different way we need to handle Llama-4, let's do that later.

Fridah-nv · 2025-04-17T20:57:45Z

> > This update is also important to support models like Llama4ForConditionalGeneration that take both text inputs and image inputs. Should we add a similar test model in test_torch_opt.py?
>
> @Fridah-nv, I have a general test that covers a variable number of inputs with batch_dim. I think that's sufficient for now. If there is a different way we need to handle Llama-4, let's do that later.

I see, thanks for the reply

Fridah-nv · 2025-04-17T21:20:30Z

The changes LGTM. @suyoggupta, could you approve it? I don't seem to have rights to approve the PR.

lucaslie · 2025-04-21T23:56:07Z

/bot run

tensorrt-cicd · 2025-04-22T00:03:39Z

PR_Github #2972 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-22T06:59:07Z

PR_Github #2972 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2084 completed with status: 'FAILURE'

Signed-off-by: Lucas Liebenwein <[email protected]>

lucaslie · 2025-04-22T14:54:41Z

/bot run

tensorrt-cicd · 2025-04-22T15:00:45Z

PR_Github #3081 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-22T18:00:00Z

PR_Github #3081 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #2150 completed with status: 'FAILURE'

lucaslie · 2025-04-22T19:27:05Z

/bot skip --comment "Unrelated test failure; all other tests pass"

tensorrt-cicd · 2025-04-22T19:32:46Z

PR_Github #3091 [ skip ] triggered by Bot

tensorrt-cicd · 2025-04-22T19:38:48Z

PR_Github #3091 [ skip ] completed with state SUCCESS
Skipping testing for commit a59456d

lucaslie requested review from suyoggupta and Fridah-nv April 16, 2025 00:50

lucaslie self-assigned this Apr 16, 2025

lucaslie changed the title ~~feat: [AutoDeploy generalizing cudagraph to multiple dynamic inputs~~ feat: [AutoDeploy] generalizing cudagraph to multiple dynamic inputs Apr 16, 2025

lucaslie requested a review from Copilot April 16, 2025 00:51

Copilot AI reviewed Apr 16, 2025


lucaslie force-pushed the ll/cudagraph_batch_sizes branch from f1748fd to 639793d Compare April 16, 2025 16:23

Fridah-nv reviewed Apr 16, 2025

tensorrt_llm/_torch/auto_deploy/compile/backends/torch_opt.py

suyoggupta approved these changes Apr 18, 2025


lucaslie force-pushed the ll/cudagraph_batch_sizes branch from 639793d to fb6a4df Compare April 21, 2025 23:55

lucaslie enabled auto-merge (squash) April 21, 2025 23:56

lucaslie added 2 commits April 22, 2025 07:53

generalizing cudagraph to multiple dynamic inputs

67fda3b

Signed-off-by: Lucas Liebenwein <[email protected]>

fix for failing test

a59456d

Signed-off-by: Lucas Liebenwein <[email protected]>

lucaslie force-pushed the ll/cudagraph_batch_sizes branch from fb6a4df to a59456d Compare April 22, 2025 14:54

lucaslie merged commit 06b914e into NVIDIA:main Apr 22, 2025
3 checks passed
