Add NeoBERT #14164
base: master
if self.model_arch == gguf.MODEL_ARCH.NEO_BERT:
    n_ff = int(2 * n_ff / 3)  # NeoBERT uses 2/3 of the intermediate size as feed forward length
I don't think this is the right place to override this. Move it to the NeoBert class.
I think that's problematic, as you then can't use the base method without getting duplicate keys. Alternatively, just deactivate setting this value here, similar to the head_dim workaround for DeepSeekV3.
I see. In any case, we should try to avoid stacking up special cases here. Maybe add a function that allows overriding existing keys?
We could add an override option to the GGUFWriter methods, I suppose.
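For illustration, the override idea could look roughly like the sketch below. It does not use the real gguf-py API: `ToyKVWriter`, the `override` parameter, the `"feed_forward_length"` key string, and the two model classes are all hypothetical stand-ins, and the value 3072 is just an example intermediate size. The sketch only shows how a subclass could call the base method and then replace a key without tripping a duplicate-key check.

```python
class ToyKVWriter:
    """Hypothetical stand-in for a key-value metadata writer (not the gguf-py API)."""

    def __init__(self) -> None:
        self.kv: dict[str, object] = {}

    def add_key_value(self, key: str, value: object, override: bool = False) -> None:
        # Reject duplicates unless the caller explicitly opts in to replacing the key.
        if key in self.kv and not override:
            raise ValueError(f"Duplicated key name {key!r}")
        self.kv[key] = value


class BaseModel:
    """Hypothetical base converter that writes the generic parameters."""

    def __init__(self, n_ff: int) -> None:
        self.n_ff = n_ff
        self.writer = ToyKVWriter()

    def set_gguf_parameters(self) -> None:
        # Generic path: write the intermediate size unchanged.
        self.writer.add_key_value("feed_forward_length", self.n_ff)


class NeoBertModel(BaseModel):
    """Hypothetical subclass that keeps the NeoBERT special case out of the base class."""

    def set_gguf_parameters(self) -> None:
        super().set_gguf_parameters()
        # Same 2/3 scaling as in the diff above, applied by replacing the key
        # that the base method already wrote.
        self.writer.add_key_value(
            "feed_forward_length", int(2 * self.n_ff / 3), override=True
        )


model = NeoBertModel(n_ff=3072)  # 3072 is an assumed example value
model.set_gguf_parameters()
print(model.writer.kv["feed_forward_length"])
```

With `override` left at its default, a second write to the same key still raises, so existing callers would keep the duplicate-key protection.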
Co-authored-by: Georgi Gerganov <[email protected]>
Support NeoBERT