The code repository only contains test code that shows the reasoning process. Could you share the evaluation code for computing accuracy on a whole dataset?
Thanks for your interest. The current version of ReasonFlux is a demo: it includes the SFT-stage training script, and the test code is meant to demonstrate the inference logic. We plan to release the full evaluation code (for whole-dataset accuracy) together with the RL-stage training code later.
For now, you can evaluate the model with existing open-source evaluation libraries such as lm-evaluation-harness or Evalchemy. Please stay tuned for our future updates.
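Until the official evaluation code lands, a whole-dataset accuracy loop is straightforward to sketch yourself. The snippet below is a minimal illustration, not the project's evaluation code: `model_answer` and `extract_final_answer` are hypothetical stand-ins that you would replace with the actual ReasonFlux inference call and its answer-extraction convention.

```python
def extract_final_answer(text: str) -> str:
    """Take the last line of the model output as the final answer.
    This is an assumed convention; adapt it to the model's real output format."""
    return text.strip().splitlines()[-1].strip()

def evaluate(dataset, model_answer):
    """Return exact-match accuracy over (question, gold_answer) pairs.
    `model_answer` is a hypothetical callable: question -> model output string."""
    correct = 0
    for question, gold in dataset:
        pred = extract_final_answer(model_answer(question))
        if pred == gold.strip():
            correct += 1
    return correct / len(dataset)

# Demo with a dummy model that appends the gold answer after some "reasoning":
dummy_dataset = [("1+1?", "2"), ("2+2?", "4")]
accuracy = evaluate(dummy_dataset, lambda q: "reasoning...\n" + dict(dummy_dataset)[q])
print(accuracy)  # → 1.0
```

For real benchmarks you would also want a more robust answer matcher (e.g. numeric normalization), which is exactly what libraries like lm-evaluation-harness provide out of the box.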
We have released a new model along with its evaluation code; please check the repository.