feat: [AutoDeploy] generalizing cudagraph to multiple dynamic inputs #3589

Merged
merged 2 commits into NVIDIA:main from ll/cudagraph_batch_sizes on Apr 22, 2025

Conversation

lucaslie
Member

Previously, our cudagraph implementation assumed that only the first input was batch-size dependent. This update generalizes the implementation to handle multiple batch-size-dependent inputs to the cudagraph.

Unit tests and integration tests are also updated for improved coverage.
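
To illustrate the general pattern described above (this is not the actual CompiledGraph code), the sketch below captures a CUDA graph with several batch-size-dependent inputs: each such input gets its own static buffer, and replay copies the live tensors into every buffer before launching. The class and helper names are hypothetical, and a real implementation would capture one graph per supported batch size.

```python
import torch


class MultiInputCudaGraph:
    """Hypothetical sketch: capture once, replay with several batch-dependent inputs."""

    def __init__(self, fn, example_inputs, num_batched_inputs: int):
        # The first `num_batched_inputs` arguments are batch-size dependent and
        # each gets a static buffer; the remaining arguments are treated as constant.
        self.fn = fn
        self.static_inputs = [t.clone() for t in example_inputs[:num_batched_inputs]]
        self.const_inputs = list(example_inputs[num_batched_inputs:])
        self.graph = torch.cuda.CUDAGraph()

        # Warm up on a side stream, then capture the graph against the static buffers.
        side_stream = torch.cuda.Stream()
        side_stream.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(side_stream):
            fn(*self.static_inputs, *self.const_inputs)
        torch.cuda.current_stream().wait_stream(side_stream)
        with torch.cuda.graph(self.graph):
            self.static_output = fn(*self.static_inputs, *self.const_inputs)

    def __call__(self, *batched_inputs):
        # Copy *every* batch-dependent input into its captured buffer, then replay.
        for buf, t in zip(self.static_inputs, batched_inputs):
            buf.copy_(t)
        self.graph.replay()
        return self.static_output
```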


Copilot Summary

This pull request includes multiple changes to improve the functionality of the auto-deploy system and enhance the testing framework. The key changes include modifications to the benchmarking logic, updates to the CompiledGraph class, and enhancements to unit tests for better coverage and performance.

Improvements to Benchmarking Logic:

  • Updated main function in examples/auto_deploy/build_and_run_ad.py to add a condition that prevents benchmarking when the runtime is "demollm" and provides a warning message.
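
A minimal sketch of what such a guard could look like; the flag names and message here are assumptions, not taken verbatim from the PR:

```python
from types import SimpleNamespace


def maybe_benchmark(args) -> None:
    # Hypothetical guard: skip benchmarking for the "demollm" runtime and warn.
    if args.runtime == "demollm":
        print("WARNING: benchmarking is not supported with the 'demollm' runtime; skipping.")
        return
    print(f"Running benchmark with runtime={args.runtime!r} ...")


maybe_benchmark(SimpleNamespace(runtime="demollm"))  # warns and skips
maybe_benchmark(SimpleNamespace(runtime="trtllm"))   # proceeds with benchmarking
```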

Enhancements to CompiledGraph Class:

  • Added support for multiple batched inputs in CompiledGraph by introducing a num_batched_inputs parameter and updating the related methods to handle multiple input buffers (see the sketch after this list).
  • Modified TorchOptCompiler to pass num_batched_inputs to CompiledGraph.
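
Building on the earlier sketch, a hypothetical backend wrapper could thread the new parameter through as follows; this does not reproduce TorchOptCompiler's actual interface:

```python
class ExampleBackend:
    """Hypothetical compiler backend; illustrates forwarding num_batched_inputs only."""

    def __init__(self, num_batched_inputs: int = 1):
        # How many leading inputs are batch-size dependent and therefore need
        # their own static buffer in the captured graph.
        self.num_batched_inputs = num_batched_inputs

    def compile(self, graph_module, example_inputs):
        # Forward the count so the compiled graph allocates one buffer per
        # batch-dependent input (see the MultiInputCudaGraph sketch above).
        return MultiInputCudaGraph(
            graph_module, example_inputs, num_batched_inputs=self.num_batched_inputs
        )
```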

Testing Framework Enhancements:

  • Added a new model, ModelWithMultipleInputs, and updated tests to handle multiple-input scenarios, ensuring comprehensive coverage of the new functionality (an illustrative sketch follows this list).
  • Skipped a slow unit test in test_deepseek_patches.py to improve test suite performance.
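
The test model itself is not shown in this thread; a plausible shape for such a model, purely illustrative, is a module whose positional inputs all share the batch dimension:

```python
import torch
from torch import nn


class ModelWithMultipleInputs(nn.Module):
    """Illustrative stand-in: every positional input carries the batch dimension."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, x0, x1, x2):
        # Combine all batched inputs so the output depends on each of them.
        return self.proj(x0 + x1 + x2)


model = ModelWithMultipleInputs()
xs = [torch.randn(4, 16, 32) for _ in range(3)]  # three inputs sharing batch dim 4
out = model(*xs)
print(out.shape)  # torch.Size([4, 16, 32])
```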

Codebase Simplification:

  • Simplified the _flatten_args function in compiler.py by removing unnecessary assertions and returning a list directly (see the sketch after this list).
  • Updated the initialization logic in compiler.py to handle dynamic shapes more efficiently.
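
Judging from the single line visible in the Copilot comment below (return tree_flatten_spec(all_args, in_spec)), the simplified helper presumably reduces to one call into torch's pytree utilities. A sketch under that assumption, with the exact argument packing guessed:

```python
import torch
import torch.utils._pytree as pytree
from torch.fx._pytree import tree_flatten_spec


def _flatten_args(in_spec, *args, **kwargs):
    """Flatten positional and keyword args into a flat list according to in_spec."""
    # Hypothetical reconstruction: pack args/kwargs the same way the input spec
    # was built and return the flat leaf list directly, with no extra assertions.
    all_args = (args, kwargs)
    return tree_flatten_spec(all_args, in_spec)


# Example usage with a spec built from a matching (args, kwargs) structure:
_, in_spec = pytree.tree_flatten(((torch.zeros(2, 3), torch.zeros(2, 4)), {}))
flat = _flatten_args(in_spec, torch.ones(2, 3), torch.ones(2, 4))
print([t.shape for t in flat])  # [torch.Size([2, 3]), torch.Size([2, 4])]
```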

These changes collectively enhance the robustness and performance of the auto-deploy system, while also improving the maintainability and efficiency of the codebase.

@lucaslie lucaslie self-assigned this Apr 16, 2025
@lucaslie
Member Author

/bot run

@lucaslie lucaslie changed the title feat: [AutoDeploy generalizing cudagraph to multiple dynamic inputs feat: [AutoDeploy] generalizing cudagraph to multiple dynamic inputs Apr 16, 2025
@lucaslie lucaslie requested a review from Copilot April 16, 2025 00:51

@Copilot Copilot AI left a comment

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

tensorrt_llm/_torch/auto_deploy/compile/compiler.py:21

  • The previous assertion that ensured the main input tensor was at least 2D has been removed. To prevent potential runtime issues with inputs of insufficient dimensions, consider adding an alternative check or a clear error message.
return tree_flatten_spec(all_args, in_spec)
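
As an illustration only (not code from this PR), a lightweight guard along the lines Copilot suggests could validate the batch-dependent inputs right after flattening:

```python
def _check_batched_inputs(flat_args, num_batched_inputs):
    # Hypothetical safeguard: fail early with a clear message if a batch-dependent
    # input does not have at least a batch dimension plus one data dimension.
    for idx, t in enumerate(flat_args[:num_batched_inputs]):
        if t.ndim < 2:
            raise ValueError(
                f"Batched input {idx} is expected to be at least 2D, "
                f"got shape {tuple(t.shape)}"
            )
```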

@tensorrt-cicd
Collaborator

PR_Github #2377 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #2377 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1711 completed with status: 'FAILURE'

@lucaslie lucaslie force-pushed the ll/cudagraph_batch_sizes branch from f1748fd to 639793d Compare April 16, 2025 16:23
@lucaslie
Member Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #2490 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #2490 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1786 completed with status: 'FAILURE'

@Fridah-nv
Collaborator

This update is also important to support models like Llama4ForConditionalGeneration that take both text and image inputs. Should we add a similar test model in test_torch_opt.py?

@lucaslie
Member Author

> This update is also important to support models like Llama4ForConditionalGeneration that take both text and image inputs. Should we add a similar test model in test_torch_opt.py?

@Fridah-nv, I have a general test that covers a variable number of inputs with a batch_dim. I think that's sufficient for now. If there is a different way we need to handle Llama-4, let's do that later.

@Fridah-nv
Collaborator

> This update is also important to support models like Llama4ForConditionalGeneration that take both text and image inputs. Should we add a similar test model in test_torch_opt.py?

> @Fridah-nv, I have a general test that covers a variable number of inputs with a batch_dim. I think that's sufficient for now. If there is a different way we need to handle Llama-4, let's do that later.

I see, thanks for the reply

@Fridah-nv
Collaborator

The changes LGTM. @suyoggupta, could you approve it, since I don't seem to have the rights to approve this PR?

@lucaslie lucaslie force-pushed the ll/cudagraph_batch_sizes branch from 639793d to fb6a4df Compare April 21, 2025 23:55
@lucaslie
Member Author

/bot run

@lucaslie lucaslie enabled auto-merge (squash) April 21, 2025 23:56
@tensorrt-cicd
Collaborator

PR_Github #2972 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #2972 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2084 completed with status: 'FAILURE'

@lucaslie lucaslie force-pushed the ll/cudagraph_batch_sizes branch from fb6a4df to a59456d Compare April 22, 2025 14:54
@lucaslie
Member Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #3081 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #3081 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #2150 completed with status: 'FAILURE'

@lucaslie
Member Author

/bot skip --comment "Unrelated test failure; all other tests pass"

@tensorrt-cicd
Collaborator

PR_Github #3091 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #3091 [ skip ] completed with state SUCCESS
Skipping testing for commit a59456d

@lucaslie lucaslie merged commit 06b914e into NVIDIA:main Apr 22, 2025
3 checks passed