The code repository only contains test code that shows the reasoning process. Could you share the evaluation code for computing accuracy on a whole dataset?
Thanks for your interest. The current version of ReasonFlux is a demo: it includes the SFT-stage training script, and the test code is meant to demonstrate the inference logic. We plan to release the full evaluation code (for whole-dataset accuracy) together with the RL-stage training code later.
For now, you can evaluate the model with existing open-source evaluation libraries such as lm-evaluation-harness or Evalchemy. Please stay tuned for our future updates.
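Until the official evaluation code lands, a whole-dataset accuracy loop is straightforward to sketch yourself. The snippet below is a minimal illustration, not the project's evaluation code: `model_answer` and `extract_final_answer` are hypothetical stand-ins that you would replace with the actual ReasonFlux inference call and its answer-extraction convention.

```python
def extract_final_answer(text: str) -> str:
    """Take the last line of the model output as the final answer.
    This is an assumed convention; adapt it to the model's real output format."""
    return text.strip().splitlines()[-1].strip()

def evaluate(dataset, model_answer):
    """Return exact-match accuracy over (question, gold_answer) pairs.
    `model_answer` is a hypothetical callable: question -> model output string."""
    correct = 0
    for question, gold in dataset:
        pred = extract_final_answer(model_answer(question))
        if pred == gold.strip():
            correct += 1
    return correct / len(dataset)

# Demo with a dummy model that appends the gold answer after some "reasoning":
dummy_dataset = [("1+1?", "2"), ("2+2?", "4")]
accuracy = evaluate(dummy_dataset, lambda q: "reasoning...\n" + dict(dummy_dataset)[q])
print(accuracy)  # → 1.0
```

For real benchmarks you would also want a more robust answer matcher (e.g. numeric normalization), which is exactly what libraries like lm-evaluation-harness provide out of the box.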
We have released a new model along with its evaluation code; please check the repository.