feat: Support gemma-3-1b-it #3247
Conversation
/bot run
PR_Github #1179 [ run ] triggered by Bot
PR_Github #1179 [ run ] completed with state
Force-pushed from 1e31450 to 6c4f0df
/bot run
PR_Github #1546 [ run ] triggered by Bot
PR_Github #1546 [ run ] completed with state
Does it support Gemma-3-27B?
@brb-nv can you update? Ideally, the attention window values should be part of the model definition so that users do not have to provide these values. We can add that in a follow-up PR.
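Concretely, that suggestion could look something like the sketch below (the class, field, and function names are hypothetical and only illustrate the idea; they are not the actual TensorRT-LLM model definition):

```python
# Illustrative only: if the model definition carried the window values, a
# runtime entry point could fall back to them when the user passes nothing.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Gemma3WindowConfig:  # hypothetical config class
    sliding_window: int = 512  # window for local (sliding-window) layers
    # 5 local layers for every global layer, 26 layers total (as in the
    # per-layer list quoted in this PR's description).
    layer_types: Sequence[str] = (("local",) * 5 + ("global",)) * 4 + ("local", "local")


def resolve_attention_windows(config: Gemma3WindowConfig,
                              user_value: Optional[Sequence[int]] = None,
                              global_window: int = 2048) -> list[int]:
    # Explicit user input wins; otherwise derive defaults from the model definition.
    if user_value is not None:
        return list(user_value)
    return [config.sliding_window if t == "local" else global_window
            for t in config.layer_types]
```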
Hi, the goal is to add support for the text-generation model first. We'll get to the multimodal models in a follow-up MR.
Force-pushed from ad0e30e to 25e1494
Force-pushed from 2e9df5a to 1ad80ea
Updated here, Anurag. 1ad80ea
Force-pushed from 0be1885 to f34b971
/bot run
PR_Github #1646 [ run ] triggered by Bot
PR_Github #1646 [ run ] completed with state
Force-pushed from 1d1fadd to 7ebad03
/bot run
PR_Github #1670 [ run ] triggered by Bot
PR_Github #1670 [ run ] completed with state
Signed-off-by: Balaram Buddharaju <[email protected]>
Force-pushed from 7ebad03 to decb665
/bot reuse-pipeline
Approving this MR for merging.
PR_Github #1699 [ reuse-pipeline ] triggered by Bot
PR_Github #1699 [ reuse-pipeline ] completed with state
Signed-off-by: Balaram Buddharaju <[email protected]>
This MR adds model support for `gemma-3-1b-it`.

Details:
- Gemma3 interleaves local (sliding-window) attention layers with global attention layers, so the model has an additional attention layer type.
- The new layer type is handled through the `AttentionParams` that is passed to the model's forward pass: fields with a `_local` suffix are added to `AttentionParams` to cover the additional layer type.
- An alternative was to modify `modeling_utils.py` (`DecoderLayerList` and `DecoderModelForCausalLM`) and carefully orchestrate the forward pass by passing a different set of `AttentionParams` to each layer type. I felt that's quite some code duplication and maintenance overhead.
- Users need to provide `max_attention_window_size` with `run.py` or whatever API is being used. For `gemma-3-1b-it` that is `[512, 512, 512, 512, 512, 2048, 512, 512, 512, 512, 512, 2048, 512, 512, 512, 512, 512, 2048, 512, 512, 512, 512, 512, 2048, 512, 512]` (a derivation sketch follows below).
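A minimal sketch of how that per-layer list can be derived from the layer pattern (Python; `build_window_sizes` is a hypothetical helper for illustration, not part of this MR or of TensorRT-LLM):

```python
# Hypothetical helper: build the per-layer max_attention_window_size list for
# gemma-3-1b-it. Assumption (matching the list above): 26 decoder layers,
# every 6th layer uses global attention (2048 here), the rest use local
# sliding-window attention with a 512-token window.
def build_window_sizes(num_layers: int = 26,
                       local_window: int = 512,
                       global_window: int = 2048,
                       global_every: int = 6) -> list[int]:
    return [
        global_window if (layer_idx + 1) % global_every == 0 else local_window
        for layer_idx in range(num_layers)
    ]


if __name__ == "__main__":
    windows = build_window_sizes()
    # Matches the list in the description:
    # [512, 512, 512, 512, 512, 2048, ..., 2048, 512, 512]
    print(windows)
```

The resulting list is what gets passed as `max_attention_window_size` to `run.py` (or the equivalent argument of whatever API is being used).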