Provide a minimal reproducible experiment using GRPO for mathematical reasoning on base model, referencing the approach from SimpleRL-Reason #197

Merged 3 commits on Feb 6, 2025

Conversation

Some-random
Contributor

This also addresses the issue referenced here: we should release a stable recipe that can run end-to-end, even with a small model and dataset. This would help users better understand the workflow and facilitate learning.

We provide a minimal reproducible experiment using GRPO for mathematical reasoning, referencing the approach from [SimpleRL-Reason](https://hkust-nlp.notion.site/simplerl-reason), which uses a 7B model trained on 8K examples. Running this on 8 H100 80GB GPUs takes about 3 hours:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes=7 src/open_r1/grpo.py --config recipes/deepseek/DeepSeek-R1-Distill-Qwen-7B/grpo/config_base_math_smalllr.yaml
```
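Note that `--num_processes=7` is one fewer than the 8 GPUs mentioned above. One plausible reading, which is an assumption and not stated in this PR, is that one GPU is held back from training (for example, for generation). A minimal sketch of deriving the flag under that assumption, using hypothetical variable names:

```shell
#!/usr/bin/env bash
# Assumption (not stated in the PR): one GPU is reserved and not used as a
# training process. All variable names below are hypothetical.
TOTAL_GPUS=8        # GPUs on the node, per the PR description
RESERVED_GPUS=1     # GPUs held back from training (assumed)

# Training processes = GPUs that remain after the reservation.
NUM_PROCESSES=$((TOTAL_GPUS - RESERVED_GPUS))
LAUNCH_FLAG="--num_processes=${NUM_PROCESSES}"

echo "${LAUNCH_FLAG}"
```

On a node with a different GPU count, only `TOTAL_GPUS` would need to change under this assumption.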
Collaborator

I am curious why you went with zero2 and not zero3?

@edbeeching
Collaborator

Merging as I would like to run some benchmarks with the recipe.

@edbeeching edbeeching merged commit 571661a into huggingface:main Feb 6, 2025
1 check passed
@Some-random
Contributor Author

Some-random commented Feb 6, 2025

Because people have run into errors while using zero3.

@ctjlewis
Contributor

ctjlewis commented Feb 6, 2025

@Some-random @edbeeching, I have a reproducible setup with zero3 tested on clean Debian with CUDA 12.1, vllm, and 8x H100 in #199. It needed specific package versions pinned.

I have to update the branch, though. I could test this one with zero3 after the branch update, if we want.

@Some-random Some-random deleted the add_recipes_for_math branch February 6, 2025 19:33
GitMonkey0 pushed a commit to GitMonkey0/open-r1 that referenced this pull request Feb 24, 2025
… reasoning on base model, referencing the approach from SimpleRL-Reason (huggingface#197)

* Create config_base_math_smalllr.yaml

* Update README.md

* Update README.md