feat: [AutoDeploy] generalizing cudagraph to multiple dynamic inputs #3589
/bot run |
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
tensorrt_llm/_torch/auto_deploy/compile/compiler.py:21
- The previous assertion that ensured the main input tensor was at least 2D has been removed. To prevent potential runtime issues with inputs of insufficient dimensions, consider adding an alternative check or a clear error message.
return tree_flatten_spec(all_args, in_spec)
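One way to act on the suggestion above is an explicit shape check that raises a descriptive error. The following is only a sketch under assumptions, not code from this PR: `check_batched_shapes` is a hypothetical helper, it operates on plain shape tuples rather than tensors, and it assumes the first `num_batched_inputs` arguments are the batch-size dependent ones and that each of them must be at least 2D.

```python
def check_batched_shapes(shapes, num_batched_inputs):
    """Validate the shapes of the batch-dependent inputs (hypothetical helper).

    Assumption: the first `num_batched_inputs` entries of `shapes` belong to
    the inputs whose leading dimension is the batch size.
    Returns the shared batch size on success.
    """
    if not 1 <= num_batched_inputs <= len(shapes):
        raise ValueError(
            f"num_batched_inputs={num_batched_inputs} is out of range "
            f"for {len(shapes)} input(s)"
        )

    batched = [tuple(s) for s in shapes[:num_batched_inputs]]

    # Replacement for the removed assertion: a clear message per offending input.
    for i, shape in enumerate(batched):
        if len(shape) < 2:
            raise ValueError(
                f"batched input {i} must be at least 2D, got shape {shape}"
            )

    # All batched inputs are expected to agree on the leading (batch) dimension.
    batch_size = batched[0][0]
    for i, shape in enumerate(batched):
        if shape[0] != batch_size:
            raise ValueError(
                f"batched input {i} has batch size {shape[0]}, expected {batch_size}"
            )
    return batch_size
```

With real tensors, the same helper could be fed `tuple(t.shape)` for each argument before the arguments are flattened.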
PR_Github #2377 [ run ] triggered by Bot |
PR_Github #2377 [ run ] completed with state |
f1748fd to 639793d
/bot run |
PR_Github #2490 [ run ] triggered by Bot |
PR_Github #2490 [ run ] completed with state |
This update is also important to support models like |
@Fridah-nv, I have a general test that covers a variable number of inputs with batch_dim. I think that's sufficient for now. If we need to handle Llama-4 differently, let's do that later.
I see, thanks for the reply |
The changes LGTM. @suyoggupta, could you approve it? I don't seem to have the rights to approve PRs.
639793d to fb6a4df
/bot run |
PR_Github #2972 [ run ] triggered by Bot |
PR_Github #2972 [ run ] completed with state |
Signed-off-by: Lucas Liebenwein <[email protected]>
fb6a4df to a59456d
/bot run |
PR_Github #3081 [ run ] triggered by Bot |
PR_Github #3081 [ run ] completed with state |
/bot skip --comment "Unrelated test failure; all other tests pass" |
PR_Github #3091 [ skip ] triggered by Bot |
PR_Github #3091 [ skip ] completed with state |
Previously, our cudagraph implementation assumed that only the first input is batch-size dependent. This update generalizes the implementation to handle multiple inputs to the cudagraph that are batch-size dependent.
Unit tests and integration tests are also updated for improved coverage.
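The idea in the description can be sketched without a GPU. The block below is a torch-free illustration of the bookkeeping only, not the PR's implementation: nested Python lists stand in for tensors, calling `fn` stands in for replaying a captured graph, and `BucketedRunner`, `add_rows`, and `last_bucket` are hypothetical names. It also assumes that `num_batched_inputs` means "the first N positional arguments share the batch dimension", which the source does not spell out.

```python
from bisect import bisect_left


class BucketedRunner:
    """Hypothetical stand-in for a CUDA-graph wrapper with several batched inputs."""

    def __init__(self, fn, batch_sizes, num_batched_inputs):
        if num_batched_inputs < 1:
            raise ValueError("need at least one batched input")
        self.fn = fn
        self.batch_sizes = sorted(batch_sizes)
        self.num_batched_inputs = num_batched_inputs
        max_bs = self.batch_sizes[-1]
        # One static buffer per batched input, sized for the largest bucket.
        self.buffers = [[None] * max_bs for _ in range(num_batched_inputs)]
        self.last_bucket = None  # bucket used by the most recent call, if any

    def __call__(self, *args):
        n = self.num_batched_inputs
        batched, rest = args[:n], args[n:]
        bs = len(batched[0])
        if any(len(x) != bs for x in batched):
            raise ValueError("all batched inputs must share the same batch size")

        # Pick the smallest pre-sized bucket that fits; otherwise run eagerly.
        idx = bisect_left(self.batch_sizes, bs)
        if bs == 0 or idx == len(self.batch_sizes):
            self.last_bucket = None
            return self.fn(*args)
        bucket = self.batch_sizes[idx]

        # Copy every batched input into its own buffer and zero-pad the tail.
        for buf, x in zip(self.buffers, batched):
            width = len(x[0])
            for i in range(bucket):
                buf[i] = list(x[i]) if i < bs else [0] * width

        self.last_bucket = bucket
        out = self.fn(*(buf[:bucket] for buf in self.buffers), *rest)
        return out[:bs]  # drop the padded rows


def add_rows(a, b, scale=1):
    """Toy model: row-wise sum of two batched inputs, times a static scale."""
    return [[scale * (p + q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


runner = BucketedRunner(add_rows, batch_sizes=[1, 2, 4], num_batched_inputs=2)
result = runner([[1, 2], [3, 4], [5, 6]], [[10, 20], [30, 40], [50, 60]], 2)
```

Here a batch of 3 is padded into the size-4 bucket for both inputs, while the third argument is passed through unchanged. A batch larger than the largest bucket falls back to calling `fn` directly.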
Copilot Summary
This pull request includes multiple changes to improve the functionality of the auto-deploy system and enhance the testing framework. The key changes include modifications to the benchmarking logic, updates to the CompiledGraph class, and enhancements to unit tests for better coverage and performance.
Improvements to Benchmarking Logic:
- main function in examples/auto_deploy/build_and_run_ad.py: adds a condition that prevents benchmarking when the runtime is "demollm" and provides a warning message.
Enhancements to CompiledGraph Class:
- CompiledGraph: introduces the num_batched_inputs parameter and updates related methods to handle multiple input buffers.
- TorchOptCompiler: passes num_batched_inputs to CompiledGraph.
Testing Framework Enhancements:
- ModelWithMultipleInputs: added, with tests updated to handle multiple input scenarios, ensuring comprehensive coverage for the new functionality.
- test_deepseek_patches.py: changed to improve test suite performance.
Codebase Simplification:
- _flatten_args function in compiler.py: simplified by removing unnecessary assertions and returning a list directly.
- compiler.py: changed to handle dynamic shapes more efficiently.
These changes collectively enhance the robustness and performance of the auto-deploy system, while also improving the maintainability and efficiency of the codebase.