Error: "Can only compare inequalities with Expr" when using zero2.yaml to grpo.py #186

Open
@pyh314

I am training Qwen2.5-1.5B-Open-R1-GRPO on 4×A100 GPUs using grpo.py, with the zero2.yaml config to accelerate training. About 30% into the run (step 455 of 1509), training crashed with the error below:

{'loss': 0.0082, 'grad_norm': 0.11666934192180634, 'learning_rate': 1.770159216547532e-05, 'completion_length': 604.1252765655518, 'rewards/accuracy_reward': 0.5635416825767606, 'rewards/format_reward': 0.0, 'reward': 0.5635416825767606, 'reward_std': 0.22564153927378355, 'kl': 0.20462188720703126, 'epoch': 0.3}

 30%|██▉       | 450/1509 [13:53:23<31:16:45, 106.33s/it]
 30%|██▉       | 451/1509 [13:55:11<31:22:45, 106.77s/it]
 30%|██▉       | 452/1509 [13:57:00<31:35:13, 107.58s/it]
 30%|███       | 453/1509 [13:58:50<31:46:41, 108.34s/it]
 30%|███       | 454/1509 [14:00:40<31:54:42, 108.89s/it]
 30%|███       | 455/1509 [14:02:26<31:35:35, 107.91s/it][rank1]: Traceback (most recent call last):
[rank1]:   File "/home/yhpeng/open-r1/src/open_r1/grpo.py", line 237, in <module>
[rank1]:     main(script_args, training_args, model_args)
[rank1]:   File "/home/yhpeng/open-r1/src/open_r1/grpo.py", line 189, in main
[rank1]:     train_result = trainer.train(resume_from_checkpoint=checkpoint)
[rank1]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/transformers/trainer.py", line 2175, in train
[rank1]:     return inner_training_loop(
[rank1]:            ^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/transformers/trainer.py", line 2490, in _inner_training_loop
[rank1]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank1]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/transformers/trainer.py", line 3598, in training_step
[rank1]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/trl/trainer/grpo_trainer.py", line 494, in compute_loss
[rank1]:     output_reward_func = reward_func(prompts=prompts, completions=completions, **reward_kwargs)
[rank1]:                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/open-r1/src/open_r1/grpo.py", line 81, in accuracy_reward
[rank1]:     reward = float(verify(answer_parsed, gold_parsed))
[rank1]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/math_verify/grader.py", line 447, in verify
[rank1]:     return any(compare_single_extraction_wrapper(g, t) for g, t in product(gold, target))
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/math_verify/grader.py", line 447, in <genexpr>
[rank1]:     return any(compare_single_extraction_wrapper(g, t) for g, t in product(gold, target))
[rank1]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/math_verify/grader.py", line 438, in compare_single_extraction_wrapper
[rank1]:     return compare_single_extraction(g, t)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/math_verify/utils.py", line 50, in wrapper
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/math_verify/grader.py", line 420, in compare_single_extraction
[rank1]:     return sympy_expr_eq(gold, target, precision)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/math_verify/grader.py", line 365, in sympy_expr_eq
[rank1]:     return sympy_compare_sets(gold, pred, precision)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/math_verify/grader.py", line 316, in sympy_compare_sets
[rank1]:     if a_set.symmetric_difference(b_set).is_empty:
[rank1]:        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/sets/sets.py", line 259, in symmetric_difference
[rank1]:     return SymmetricDifference(self, other)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/sets/sets.py", line 2183, in __new__
[rank1]:     return SymmetricDifference.reduce(a, b)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/sets/sets.py", line 2189, in reduce
[rank1]:     result = B._symmetric_difference(A)
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/sets/sets.py", line 262, in _symmetric_difference
[rank1]:     return Union(Complement(self, other), Complement(other, self))
[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/sets/sets.py", line 1721, in __new__
[rank1]:     return Complement.reduce(a, b)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/sets/sets.py", line 1731, in reduce
[rank1]:     if B == S.UniversalSet or A.is_subset(B):
[rank1]:                               ^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/sets/sets.py", line 413, in is_subset
[rank1]:     ret = self._eval_is_subset(other)
[rank1]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/sets/sets.py", line 2056, in _eval_is_subset
[rank1]:     return fuzzy_and(other._contains(e) for e in self.args)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/logic.py", line 142, in fuzzy_and
[rank1]:     for ai in args:
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/sets/sets.py", line 2056, in <genexpr>
[rank1]:     return fuzzy_and(other._contains(e) for e in self.args)
[rank1]:                      ^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/sets/sets.py", line 2053, in _contains
[rank1]:     return Or(*[Eq(e, other, evaluate=True) for e in self.args])
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/operations.py", line 513, in __new__
[rank1]:     _args = frozenset(cls._new_args_filter(args))
[rank1]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/logic/boolalg.py", line 741, in _new_args_filter
[rank1]:     c = x.canonical
[rank1]:         ^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/relational.py", line 333, in canonical
[rank1]:     args = tuple([i.canonical if isinstance(i, Relational) else i for i in self.args])
[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/relational.py", line 333, in <listcomp>
[rank1]:     args = tuple([i.canonical if isinstance(i, Relational) else i for i in self.args])
[rank1]:                   ^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/relational.py", line 335, in canonical
[rank1]:     r = self.func(*args)
[rank1]:         ^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/relational.py", line 852, in __new__
[rank1]:     return cls._eval_relation(lhs, rhs, **options)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/relational.py", line 859, in _eval_relation
[rank1]:     val = cls._eval_fuzzy_relation(lhs, rhs)
[rank1]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/relational.py", line 1168, in _eval_fuzzy_relation
[rank1]:     return is_gt(lhs, rhs)
[rank1]:            ^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/relational.py", line 1273, in is_gt
[rank1]:     return fuzzy_not(is_le(lhs, rhs, assumptions))
[rank1]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/relational.py", line 1281, in is_le
[rank1]:     return is_ge(rhs, lhs, assumptions)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/yhpeng/anaconda3/envs/openr1/lib/python3.11/site-packages/sympy/core/relational.py", line 1380, in is_ge
[rank1]:     raise TypeError("Can only compare inequalities with Expr")
[rank1]: TypeError: Can only compare inequalities with Expr

Can anyone help? Thanks!
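
A possible stopgap, offered only as a minimal sketch and not as a confirmed fix: the traceback shows the TypeError propagating out of `verify(answer_parsed, gold_parsed)` inside `accuracy_reward`, so a single completion that cannot be compared stops the whole run. Wrapping that call so a failed comparison scores 0.0 would let training continue. The helper name `safe_reward` and the choice to treat a failure as an incorrect answer are assumptions for illustration; the comparison function is passed in as `verify_fn` so the sketch does not depend on any particular `math_verify` version.

```python
from typing import Any, Callable


def safe_reward(
    verify_fn: Callable[[Any, Any], bool],
    answer_parsed: Any,
    gold_parsed: Any,
) -> float:
    """Score one completion, treating a crashed comparison as incorrect.

    Hypothetical helper: `verify_fn` stands in for the `verify` call seen
    in the traceback, with the same argument order as in grpo.py.
    """
    try:
        return float(verify_fn(answer_parsed, gold_parsed))
    except Exception as exc:
        # Assumption: an uncomparable answer is scored as wrong (0.0)
        # rather than aborting the training step on this rank.
        print(f"verify raised {exc!r}; scoring this completion as 0.0")
        return 0.0
```

In `accuracy_reward`, the line `reward = float(verify(answer_parsed, gold_parsed))` would then become `reward = safe_reward(verify, answer_parsed, gold_parsed)`. This only hides the symptom: the underlying comparison inside SymPy still fails for that input, so the printed message is worth keeping to see which answers trigger it.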
