Provide a minimal reproducible experiment using GRPO for mathematical reasoning on base model, referencing the approach from SimpleRL-Reason #197

Some-random · 2025-02-05T19:52:36Z

Also solves the issue referenced here. Should we release a stable recipe that runs end-to-end, even with a small model and dataset? This would help users better understand the workflow and make it easier to learn.

edbeeching · 2025-02-06T07:49:03Z

README.md

+We provide a minimal reproducible experiment using GRPO for mathematical reasoning, referencing the approach from [SimpleRL-Reason](https://hkust-nlp.notion.site/simplerl-reason), which uses a 7B model trained on 8K examples. Running this on 8 H100 80GB GPUs takes about 3 hours:
+
+```shell
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes=7 src/open_r1/grpo.py --config recipes/deepseek/DeepSeek-R1-Distill-Qwen-7B/grpo/config_base_math_smalllr.yaml
+```

I am curious why you went with zero2 and not zero3?

edbeeching · 2025-02-06T10:43:39Z

Merging as I would like to run some benchmarks with the recipe.

Some-random · 2025-02-06T16:18:42Z

Because people have run into errors when using zero3.
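
For readers weighing the zero2 vs. zero3 choice discussed above, here is a rough sketch of the per-GPU memory taken by model states under each ZeRO stage. It uses the standard mixed-precision Adam accounting (2 bytes per parameter for weights, 2 for gradients, 12 for optimizer states). The function name, the round 7e9 parameter count, and the 7 training processes are assumptions made for illustration; the estimate ignores activations, buffers, and anything vLLM allocates, so treat it as a lower bound rather than a measurement.

```python
def zero_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU model-state memory in GB for mixed-precision Adam.

    Hypothetical helper for illustration; not part of open-r1 or DeepSpeed.
    """
    weights = 2 * num_params   # half-precision parameters
    grads = 2 * num_params     # half-precision gradients
    optim = 12 * num_params    # fp32 master weights + Adam momentum + variance

    if stage == 0:    # nothing sharded
        total = weights + grads + optim
    elif stage == 1:  # optimizer states sharded
        total = weights + grads + optim / num_gpus
    elif stage == 2:  # optimizer states and gradients sharded
        total = weights + (grads + optim) / num_gpus
    elif stage == 3:  # parameters sharded as well
        total = (weights + grads + optim) / num_gpus
    else:
        raise ValueError(f"unknown ZeRO stage: {stage}")
    return total / 1e9


if __name__ == "__main__":
    # Assumed values: ~7e9 parameters, 7 training processes as in the command above.
    for stage in (2, 3):
        gb = zero_model_state_gb(7e9, num_gpus=7, stage=stage)
        print(f"ZeRO-{stage}: ~{gb:.0f} GB of model states per GPU")
```

Under these assumptions zero2 keeps a full copy of the weights on every GPU and zero3 does not, so zero3 needs less memory per GPU. The price is extra communication to gather parameters, and it changes how the model is materialized, which is one plausible place for the kind of errors mentioned above to come from.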

ctjlewis · 2025-02-06T16:25:19Z

@Some-random @edbeeching, I have a reproducible setup with zero3, tested on clean Debian with CUDA 12.1, vllm, and 8x H100, in #199. It needed specific package versions pinned.

I have to update the branch, though. Once that is done, I could test this recipe with zero3 if we want.
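
Since the zero3 setup reportedly depends on pinned package versions, a small pre-flight check can catch a mismatched environment before a multi-hour run. The sketch below is only an illustration: the helper name is made up, and the version strings in the usage example are placeholders, not the pins from #199.

```python
from importlib import metadata


def find_pin_mismatches(pins, get_version=metadata.version):
    """Return {package: (wanted, installed)} for packages that miss their pin.

    `installed` is None when the package is not installed at all.
    Hypothetical helper for illustration; not part of open-r1.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Placeholder versions only; substitute the pins your setup actually requires.
    placeholder_pins = {"trl": "0.0.0", "transformers": "0.0.0"}
    for name, (wanted, installed) in find_pin_mismatches(placeholder_pins).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

The check uses exact string comparison, which matches how `==` pins behave in a requirements file; range specifiers would need a proper version parser.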

… reasoning on base model, referencing the approach from SimpleRL-Reason (huggingface#197) * Create config_base_math_smalllr.yaml * Update README.md * Update README.md

Some-random added 3 commits February 5, 2025 11:33

Create config_base_math_smalllr.yaml

4780406

Update README.md

619212a

Update README.md

369aa6b

edbeeching reviewed Feb 6, 2025

edbeeching approved these changes Feb 6, 2025

edbeeching merged commit 571661a into huggingface:main Feb 6, 2025
1 check passed

ctjlewis mentioned this pull request Feb 6, 2025

fix: easier environment setup; pin trl, transformers #199

Open

Some-random deleted the add_recipes_for_math branch February 6, 2025 19:33
