feat: Add FP8 support for SM 120 #3248

pamelap-nvidia · 2025-04-02T21:05:34Z

Added FP8 support for SM120.
Avoided SM120 for FP4 in a few places.
Cubins were already updated in previous MRs.
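
As a rough illustration of the per-architecture gating this description implies, here is a minimal sketch. The function names and capability sets are hypothetical and are not taken from the TensorRT-LLM sources. The only facts drawn from this PR are that FP8 is now allowed on SM120 and that some FP4 paths skip SM120.

```python
# Hypothetical sketch of SM-version gating. All names and tables here are
# assumptions for illustration, not the actual TensorRT-LLM code.

# Assumed set of architectures with FP8 paths enabled (SM120 added by this PR).
FP8_ENABLED_SMS = frozenset({89, 90, 100, 120})

# Assumed set of architectures with the FP4 paths in question enabled.
# SM120 is left out, mirroring "avoided SM120 for FP4 in a few places".
FP4_ENABLED_SMS = frozenset({100})


def is_fp8_enabled(sm: int) -> bool:
    """Return True if FP8 paths are enabled for this SM version."""
    return sm in FP8_ENABLED_SMS


def is_fp4_enabled(sm: int) -> bool:
    """Return True if FP4 paths are enabled for this SM version."""
    return sm in FP4_ENABLED_SMS


def pick_quant_mode(sm: int, requested: str) -> str:
    """Return the requested mode if it is enabled on this SM, else 'fp16'."""
    checks = {"fp8": is_fp8_enabled, "fp4": is_fp4_enabled}
    check = checks.get(requested)
    if check is not None and check(sm):
        return requested
    return "fp16"
```

Under these assumptions, `pick_quant_mode(120, "fp8")` keeps FP8, while `pick_quant_mode(120, "fp4")` falls back to the unquantized mode.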

pamelap-nvidia · 2025-04-02T21:05:50Z

/bot run

pamelap-nvidia · 2025-04-03T14:35:34Z

/bot run

schetlur-nv · 2025-04-03T20:49:10Z

/bot run

tensorrt-cicd · 2025-04-03T20:54:35Z

PR_Github #1143 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-03T21:04:44Z

PR_Github #1143 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #860 completed with status: 'FAILURE'

yibinl-nvidia · 2025-04-03T22:07:21Z

/bot run

tensorrt-cicd · 2025-04-03T22:13:10Z

PR_Github #1144 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-04T00:26:31Z

PR_Github #1144 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #861 completed with status: 'FAILURE'

yibinl-nvidia · 2025-04-04T02:03:28Z

/bot run

tensorrt-cicd · 2025-04-04T02:08:53Z

PR_Github #1152 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-04T06:08:04Z

PR_Github #1152 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #867 completed with status: 'SUCCESS'

pamelap-nvidia · 2025-04-10T02:46:38Z

/bot run

tensorrt-cicd · 2025-04-10T02:51:47Z

PR_Github #1681 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-10T09:38:16Z

PR_Github #1681 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1257 completed with status: 'FAILURE'

pamelap-nvidia · 2025-04-10T15:27:10Z

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-1, DGX_H100-4_GPUs-PyTorch-2, DGX_H100-4_GPUs-TensorRT-2"

tensorrt-cicd · 2025-04-10T15:32:38Z

PR_Github #1803 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-11T01:59:50Z

PR_Github #1803 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #1336 (Partly Tested) completed with status: 'FAILURE'

chzblych · 2025-04-11T09:53:12Z

@pamelap-nvidia FYI: @EmmaQiaoCh is also adding automated testing for GB20x.

pamelap-nvidia · 2025-04-11T15:10:42Z

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-1, DGX_H100-4_GPUs-PyTorch-2, DGX_H100-4_GPUs-TensorRT-2"

tensorrt-cicd · 2025-04-11T15:16:19Z

PR_Github #1951 [ run ] triggered by Bot

tensorrt-cicd · 2025-04-12T02:56:27Z

PR_Github #1951 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #1436 (Partly Tested) completed with status: 'SUCCESS'

EmmaQiaoCh · 2025-04-13T13:39:32Z

Hi Pamela, will you also add the test list YAML file for L0 under 'tests/integration/test_lists/test-db'? Or you can tell me which tests you want to run on SM120; I also need to change some CI scripts :)
Thanks~

pamelap-nvidia · 2025-04-14T17:03:54Z

> Hi Pamela, will you also add the test list yml file for L0 under 'tests/integration/test_lists/test-db'? Or you can tell me the tests that you want to run on SM120, I also need to change some CI scripts:) Thanks~

Resolved offline.

brb-nv · 2025-04-14T18:05:31Z

Minor comments. Changes look good to me.

schetlur-nv · 2025-04-14T23:06:25Z

Bypassing some checks to merge, since the comments addressed after PR_Github #1951 were very minor.

* Allow FP8 on SM120
  Signed-off-by: Pamela Peng <[email protected]>
* fix sm121
  Signed-off-by: Pamela Peng <[email protected]>
* fix
  Signed-off-by: Pamela Peng <[email protected]>
* fix pre-commit
  Signed-off-by: Pamela Peng <[email protected]>
* review update
  Signed-off-by: Pamela Peng <[email protected]>

---------

Signed-off-by: Pamela Peng <[email protected]>
Co-authored-by: Sharan Chetlur <[email protected]>
Signed-off-by: Luis Vega <[email protected]>

juney-nvidia changed the title from "Add FP8 support for SM 120" to "feat: Add FP8 support for SM 120" on Apr 3, 2025

pamelap-nvidia force-pushed the gb20x_fp8 branch 2 times, most recently from edb80f7 to 97e6fb3, on April 3, 2025 14:35

pamelap-nvidia force-pushed the gb20x_fp8 branch from 2bd610c to 4006bfc on April 3, 2025 21:39

pamelap-nvidia self-assigned this Apr 9, 2025

pamelap-nvidia requested a review from schetlur-nv April 9, 2025 14:58

pamelap-nvidia force-pushed the gb20x_fp8 branch from 4006bfc to 7676af8 on April 10, 2025 02:46

pamelap-nvidia requested a review from chzblych April 10, 2025 02:52

chzblych requested a review from EmmaQiaoCh April 11, 2025 09:53

pamelap-nvidia added 3 commits April 11, 2025 11:10

Allow FP8 on SM120

6cad7a4

Signed-off-by: Pamela Peng <[email protected]>

fix sm121

fdacb9a

Signed-off-by: Pamela Peng <[email protected]>

fix

20784c6

Signed-off-by: Pamela Peng <[email protected]>

fix pre-commit

452d770

Signed-off-by: Pamela Peng <[email protected]>

pamelap-nvidia force-pushed the gb20x_fp8 branch from 7676af8 to 452d770 on April 11, 2025 15:10

EmmaQiaoCh mentioned this pull request on Apr 14, 2025, in "infra: Add test stages for sm120" #3533 (merged)

schetlur-nv requested a review from brb-nv April 14, 2025 17:36

brb-nv reviewed Apr 14, 2025

cpp/micro_benchmarks/mixtureOfExpertsBackendBenchmarkFixture.h (resolved)

brb-nv reviewed Apr 14, 2025

cpp/tensorrt_llm/common/attentionOp.cpp (outdated, resolved)

brb-nv approved these changes Apr 14, 2025

pamelap-nvidia and others added 2 commits April 14, 2025 18:09

review update

2d868db

Signed-off-by: Pamela Peng <[email protected]>

Merge branch 'main' into gb20x_fp8

c374e9d

schetlur-nv approved these changes Apr 14, 2025

schetlur-nv merged commit 6cdfc54 into NVIDIA:main Apr 14, 2025
2 checks passed