feat: Support gemma-3-1b-it #3247
Conversation
/bot run
PR_Github #1179 [ run ] triggered by Bot
PR_Github #1179 [ run ] completed with state
Force-pushed from 1e31450 to 6c4f0df
/bot run
PR_Github #1546 [ run ] triggered by Bot
PR_Github #1546 [ run ] completed with state
Does it support Gemma-3-27B?
@brb-nv can you update? Ideally, the attention window values should be part of the model definition so that users do not have to provide these values. We can add that in a follow-up PR.
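Concretely, that suggestion could look something like the sketch below (the class, field, and function names are hypothetical and only illustrate the idea; they are not the actual TensorRT-LLM model definition):

```python
# Illustrative only: if the model definition carried the window values, a
# runtime entry point could fall back to them when the user passes nothing.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Gemma3WindowConfig:  # hypothetical config class
    sliding_window: int = 512  # window for local (sliding-window) layers
    # 5 local layers for every global layer, 26 layers total (as in the
    # per-layer list quoted in this PR's description).
    layer_types: Sequence[str] = (("local",) * 5 + ("global",)) * 4 + ("local", "local")


def resolve_attention_windows(config: Gemma3WindowConfig,
                              user_value: Optional[Sequence[int]] = None,
                              global_window: int = 2048) -> list[int]:
    # Explicit user input wins; otherwise derive defaults from the model definition.
    if user_value is not None:
        return list(user_value)
    return [config.sliding_window if t == "local" else global_window
            for t in config.layer_types]
```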
Hi, the goal is to add support for the text-generation model first. We'll get to the multimodal models in a follow-up MR.
Force-pushed from ad0e30e to 25e1494
Force-pushed from 2e9df5a to 1ad80ea
Updated here, Anurag. 1ad80ea
Force-pushed from 0be1885 to f34b971
/bot run
PR_Github #1646 [ run ] triggered by Bot
PR_Github #1646 [ run ] completed with state
Force-pushed from 1d1fadd to 7ebad03
/bot run
PR_Github #1670 [ run ] triggered by Bot
PR_Github #1670 [ run ] completed with state
Signed-off-by: Balaram Buddharaju <[email protected]>
Force-pushed from 7ebad03 to decb665
/bot reuse-pipeline
Approving this MR for merging.
PR_Github #1699 [ reuse-pipeline ] triggered by Bot
PR_Github #1699 [ reuse-pipeline ] completed with state
Signed-off-by: Balaram Buddharaju <[email protected]>
This MR adds model support for `gemma-3-1b-it`.

Details:
- Gemma3 interleaves local (sliding-window) attention layers with global attention layers, so the model has an additional attention layer type.
- The new layer type is handled through the `AttentionParams` that is passed to the model's forward pass: fields with a `_local` suffix are added to `AttentionParams` to cover the additional layer type.
- An alternative was to modify `modeling_utils.py` (`DecoderLayerList` and `DecoderModelForCausalLM`) and carefully orchestrate the forward pass by passing a different set of `AttentionParams` to each layer type. I felt that's quite some code duplication and maintenance overhead.
- Users need to provide `max_attention_window_size` with `run.py` or whatever API is being used. For `gemma-3-1b-it` that is `[512, 512, 512, 512, 512, 2048, 512, 512, 512, 512, 512, 2048, 512, 512, 512, 512, 512, 2048, 512, 512, 512, 512, 512, 2048, 512, 512]` (a derivation sketch follows below).
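A minimal sketch of how that per-layer list can be derived from the layer pattern (Python; `build_window_sizes` is a hypothetical helper for illustration, not part of this MR or of TensorRT-LLM):

```python
# Hypothetical helper: build the per-layer max_attention_window_size list for
# gemma-3-1b-it. Assumption (matching the list above): 26 decoder layers,
# every 6th layer uses global attention (2048 here), the rest use local
# sliding-window attention with a 512-token window.
def build_window_sizes(num_layers: int = 26,
                       local_window: int = 512,
                       global_window: int = 2048,
                       global_every: int = 6) -> list[int]:
    return [
        global_window if (layer_idx + 1) % global_every == 0 else local_window
        for layer_idx in range(num_layers)
    ]


if __name__ == "__main__":
    windows = build_window_sizes()
    # Matches the list in the description:
    # [512, 512, 512, 512, 512, 2048, ..., 2048, 512, 512]
    print(windows)
```

The resulting list is what gets passed as `max_attention_window_size` to `run.py` (or the equivalent argument of whatever API is being used).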