🐌 Clean two-sided clipping #3499
Conversation
tests/test_grpo_trainer.py
Outdated
It might seem risky to remove these tests, but having tests that are so tightly coupled to internal implementation details (like calls to `_get_log_prob`) makes future development extremely difficult. Even if it's less safe, I recommend manually inspecting intermediate results instead, and keeping unit tests focused on what they're meant for: validating external behavior.
Ah, sorry, this was my fault for asking to test the new functionality. I agree with you that we can remove these tests, with the caveat that the current ones only test for a change in the weights, not whether the clipping is actually behaving as expected. I suppose those things either need to be exposed as public methods, or be part of integration tests where we can access the metrics to validate that the values are clipped accordingly.
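As a sketch of that kind of behavioral check: if an integration test can see the raw importance ratios and advantages, it can compute how often clipping is active without touching any private trainer methods. The helper below is hypothetical (not part of the TRL API), assuming the usual PPO/GRPO convention that the clipped branch is the active one when the ratio leaves the `[1 - eps_low, 1 + eps_high]` band in the direction the advantage pushes it.

```python
import torch

def clip_fraction(coef_1, advantages, epsilon_low=0.2, epsilon_high=0.2):
    """Hypothetical helper: fraction of tokens where clipping is active.

    coef_1: importance ratios pi_theta / pi_old, shape (batch, seq_len)
    advantages: per-sequence advantages, shape (batch,)
    """
    adv = advantages.unsqueeze(1)
    # Clipping binds on the low side for negative advantages,
    # and on the high side for positive advantages.
    low_clipped = (coef_1 < 1 - epsilon_low) & (adv < 0)
    high_clipped = (coef_1 > 1 + epsilon_high) & (adv > 0)
    return (low_clipped | high_clipped).float().mean().item()
```

A test could then assert that this fraction matches the trainer's logged clip-ratio metric on the same batch.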
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for the clean up and apologies for being the one who asked for the original tests!
else:
    # Original GRPO clipping (only lower bound implicitly applied by the final min)
    per_token_loss1 = coef_1 * advantages.unsqueeze(1)
coef_1 = torch.clamp(coef_1, max=self.args.delta)
Nice refactor!
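For readers following along, the two branches discussed above can be sketched as a standalone function. This is an illustrative reconstruction, not the trainer's actual code: the names `epsilon_low`, `epsilon_high`, and `delta` follow the diff's conventions, with `delta` optionally capping the raw ratio before the PPO-style pessimistic minimum.

```python
import torch

def grpo_clipped_loss(coef_1, advantages, epsilon_low=0.2, epsilon_high=0.2, delta=None):
    """Sketch of the GRPO per-token loss with an optional upper clamp.

    coef_1: importance ratios pi_theta / pi_old, shape (batch, seq_len)
    advantages: per-sequence advantages, shape (batch,)
    delta: if set, caps the raw ratio from above (two-sided clipping)
    """
    if delta is not None:
        # Cap the raw ratio so a single token cannot dominate the update
        coef_1 = torch.clamp(coef_1, max=delta)
    per_token_loss1 = coef_1 * advantages.unsqueeze(1)
    coef_2 = torch.clamp(coef_1, 1 - epsilon_low, 1 + epsilon_high)
    per_token_loss2 = coef_2 * advantages.unsqueeze(1)
    # Pessimistic objective: elementwise minimum, negated for gradient descent
    return -torch.min(per_token_loss1, per_token_loss2)
```

Without `delta`, only the final `min` implicitly bounds the objective from below, which matches the comment in the diff.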
I can try to integrate into liger too.
Yes, I completely understand the motivation behind these tests. Ideally, we should have this kind of test that checks whether the loss is computed correctly against a few reference values, but with the current implementation it requires patching and so on, so it's not ideal. If in the future we split the
Some late review on #3434