Skip to content

[generate] model defaults being inherited only happens for newer models #36881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 21, 2025

Conversation

gante
Copy link
Member

@gante gante commented Mar 21, 2025

What does this PR do?

#36684 activated the use of model-defaults (as opposed to global defaults) for unfilled parameters in the generation_config argument.

It may lead to unexpected behavior on models that saw no changes, especially if the model has defined flags like max_length or max_new_tokens. This PR adds a flag to control this behavior, and makes it the default only for models saved from v4.50 (next release) onwards.

@github-actions github-actions bot marked this pull request as draft March 21, 2025 09:56
Copy link
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@gante gante requested a review from ArthurZucker March 21, 2025 09:57
@gante gante marked this pull request as ready for review March 21, 2025 09:57
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that sounds good!

@gante gante force-pushed the gen_param_inheriance_opt_in branch from a909bfe to 423ec7e Compare March 21, 2025 10:26
@gante gante merged commit 94f4876 into huggingface:main Mar 21, 2025
23 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante gante deleted the gen_param_inheriance_opt_in branch March 21, 2025 11:20
@MTDickens
Copy link

I personally think the processing logic can be further improved.

For example, consider a user wanting to ensure greedy decoding by explicitly setting my_config = GenerationConfig(do_sample=False). Since do_sample=False is also the default in the base GenerationConfig, the merging logic might interpret this as "not specified by the user." Consequently, if a model-specific configuration (like gemma3_config) defaults to do_sample=True, it could unintentionally override the user's explicit do_sample=False setting.

The core issue is that the current approach seems to primarily track deviations from the default, rather than explicit user intent.

Perhaps the GenerationConfig object or the merging process could internally track which parameters were explicitly provided by the user during instantiation, even if the provided value matches the default. This would allow the merging logic to prioritize any explicitly user-set value over model-specific defaults.

I'm not familiar enough with the codebase to propose a specific implementation, but hopefully, this description of the problem and the suggested conceptual approach is helpful.

@gante
Copy link
Member Author

gante commented Apr 18, 2025

@MTDickens yes, this will be improved :) Our path forward will consist of having per-model GenerationConfig classes, like we have per-model config classes. That way, when instantiating a new generation config, the starting point are the model-specific defaults, as opposed to generalist defaults + convoluted logic

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants