
chore: Unify Python NVTX call #3450


Merged: 5 commits, Apr 15, 2025

Conversation

@kaiyux (Member) commented Apr 10, 2025

  • Move and unify NVTX implementation to tensorrt_llm/_utils.py
  • Replace the original LLM API NVTX with nvtx_range_debug call
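
The helper names `nvtx_range` and `nvtx_range_debug` come from the PR description; the sketch below shows one plausible shape for such a unified wrapper. The actual implementation in `tensorrt_llm/_utils.py` may differ, and the `TLLM_NVTX_DEBUG` environment switch is a hypothetical example of how debug-level markers could be gated.

```python
# Minimal sketch of a unified NVTX helper (assumptions: names from the PR
# description; the TLLM_NVTX_DEBUG switch is hypothetical, not from the PR).
import os
from contextlib import contextmanager

try:
    import torch
    # Host-side NVTX markers via PyTorch; disabled on CPU-only builds.
    _nvtx = torch.cuda.nvtx if torch.cuda.is_available() else None
except ImportError:
    # Profiling becomes a no-op when torch is unavailable.
    _nvtx = None


@contextmanager
def nvtx_range(name: str):
    """Always-on NVTX range around a code region."""
    if _nvtx is not None:
        _nvtx.range_push(name)
    try:
        yield
    finally:
        if _nvtx is not None:
            _nvtx.range_pop()


# Debug-level markers are cheap no-ops unless explicitly enabled.
_NVTX_DEBUG = os.environ.get("TLLM_NVTX_DEBUG", "0") == "1"


@contextmanager
def nvtx_range_debug(name: str):
    """Fine-grained NVTX range, emitted only when debug profiling is on."""
    if _NVTX_DEBUG and _nvtx is not None:
        _nvtx.range_push(name)
    try:
        yield
    finally:
        if _NVTX_DEBUG and _nvtx is not None:
            _nvtx.range_pop()
```

With this split, hot-path annotations can use `nvtx_range_debug` so they cost nothing in normal runs, while coarse ranges stay always-on for nsys traces.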

@Copilot (Copilot AI) left a comment


Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

@kaiyux (Member, Author) commented Apr 10, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #1775 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #1775 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1316 completed with status: 'FAILURE'

@Superjomn (Collaborator) commented Apr 11, 2025

We mentioned having NVTX levels in an earlier discussion about the tradeoff between detail and performance with nsys. I noticed this MR introduces two levels, nvtx_range and nvtx_range_debug. Are these two levels sufficient, or will we introduce more levels or per-module switching in the future? @kaiyux @FrankD412

I also noticed this PR moves nvtx_range to ._utils.py. Will you unify the torch.cuda.nvtx_range usages in other places, such as pyexecutor? @kaiyux

@kaiyux (Member, Author) commented Apr 11, 2025

@Superjomn Thanks for the comments.

> are these two levels sufficient? Or will we introduce more or per-module switching in the future?

I can't think of a scenario where we would need more levels; we already have layerwise NVTX markers as a feature.

> And I noticed this PR move nvtx_range to ._utils.py, will you unify the torch.cuda.nvtx_range usages in other places like pyexecutor?

It should already be unified in this PR.

@Superjomn (Collaborator)


Got it; that is clear. Thanks for your reply.

@kaiyux (Member, Author) commented Apr 11, 2025

/bot run

@kaiyux force-pushed the user/kaiyu/unify_nvtx branch from eaa4ea3 to 955ddf7 on April 11, 2025 at 02:55
@kaiyux (Member, Author) commented Apr 11, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #1851 [ run ] triggered by Bot

@FrankD412 (Collaborator)

> We mentioned having nvtx levels in an earlier discussion for tradeoff between details and performance with nsys, I noticed this MR introduces two levels, nvtx_range and nvtx_range_debug, are these two levels sufficient? Or will we introduce more or per-module switching in the future?

I think two levels is fine. When we profile, we want as much information as possible, so as long as it's documented how to enable that, we're happy on our end.

@tensorrt-cicd (Collaborator)

PR_Github #1851 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1370 completed with status: 'SUCCESS'

@Superjomn (Collaborator) left a comment


LGTM

kaiyux added 5 commits April 15, 2025 23:12
Signed-off-by: Kaiyu Xie <[email protected]>
Signed-off-by: Kaiyu Xie <[email protected]>
Signed-off-by: Kaiyu Xie <[email protected]>
@kaiyux force-pushed the user/kaiyu/unify_nvtx branch from 9eb5c50 to 6379c3e on April 15, 2025 at 15:12
@kaiyux (Member, Author) commented Apr 15, 2025

/bot reuse-pipeline

@kaiyux kaiyux enabled auto-merge (squash) April 15, 2025 15:13
@tensorrt-cicd (Collaborator)

PR_Github #2347 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #2347 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #1851 for commit 6379c3e

@kaiyux kaiyux merged commit e037d3e into NVIDIA:main Apr 15, 2025
3 checks passed
vegaluisjose pushed a commit to vegaluisjose/TensorRT-LLM that referenced this pull request Apr 15, 2025
Signed-off-by: Kaiyu Xie <[email protected]>
Signed-off-by: Luis Vega <[email protected]>
4 participants