Clean up modeling_deepseek.py #3640

hlu1 · 2025-04-17T00:18:10Z

Rename some variables to make the code more readable
Remove unnecessary if/else branches
Simplify some of the logic in allreduce fusion

The modeling_deepseek.py file looks mostly clean after this round of cleanup. I kept the allreduce part the same because we need the allreduce op unification PR to land first.

For accuracy testing, I ran the examples/pytorch/quickstart_advanced.py script with the R1 model under different configs.

hlu1 · 2025-04-17T00:23:01Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-04-17T00:28:35Z

PR_Github #2533 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-17T02:43:42Z

PR_Github #2533 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1815 completed with status: 'FAILURE'

hlu1 · 2025-04-17T06:42:57Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-04-17T06:48:31Z

PR_Github #2596 [ run ] triggered by Bot

hlu1 · 2025-04-17T06:48:45Z

/bot kill

tensorrt-cicd · 2025-04-17T06:54:13Z

PR_Github #2601 [ kill ] triggered by Bot

tensorrt-cicd · 2025-04-17T06:55:02Z

PR_Github #2596 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-04-17T06:55:32Z

PR_Github #2601 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit e192b6e

hlu1 · 2025-04-17T07:12:35Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-04-17T07:18:24Z

PR_Github #2605 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-17T09:46:33Z

PR_Github #2605 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1857 completed with status: 'FAILURE'

hlu1 · 2025-04-17T18:42:39Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-04-17T18:48:14Z

PR_Github #2672 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-17T22:31:14Z

PR_Github #2672 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1903 completed with status: 'SUCCESS'

QiJune

LGTM

Signed-off-by: Hao Lu <[email protected]>

hlu1 force-pushed the deepseek_cleanup branch from 1ec9790 to 57f9a2e Compare April 17, 2025 00:18

hlu1 requested review from QiJune, hyukn, SimengLiu-nv and HuiGao-NV April 17, 2025 00:20

hlu1 force-pushed the deepseek_cleanup branch from 57f9a2e to 892d4ff Compare April 17, 2025 06:42

hlu1 force-pushed the deepseek_cleanup branch from 892d4ff to e192b6e Compare April 17, 2025 06:48

hlu1 force-pushed the deepseek_cleanup branch from e192b6e to 693a21b Compare April 17, 2025 07:12

hlu1 force-pushed the deepseek_cleanup branch from 693a21b to 593cb9f Compare April 17, 2025 18:42

QiJune reviewed Apr 17, 2025

tensorrt_llm/_torch/models/modeling_deepseekv3.py

SimengLiu-nv approved these changes Apr 18, 2025

QiJune approved these changes Apr 18, 2025

Clean up modeling_deepseek.py

e4b933b

Signed-off-by: Hao Lu <[email protected]>

hlu1 force-pushed the deepseek_cleanup branch from 593cb9f to e4b933b Compare April 18, 2025 23:44

hlu1 merged commit c861b6c into NVIDIA:main Apr 19, 2025
2 checks passed
