Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark
Bingchen Miao, Yang Wu, Minghe Gao, Qifan Yu, Wendong Bu, Wenqiao Zhang, Yunfei Li2, Siliang Tang1, Tat-Seng Chua3, Juncheng Li1
1Zhejiang University, 2Ant Group, 3National University of Singapore
*Equal Contribution.
Welcome to the official repository for the paper "Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark". In this work, we propose Similar, a Step-wise Multi-dimensional Generalist Reward Model designed to enhance the training and reasoning capabilities of Generalist Virtual Agents (GVAs) powered by Multimodal Large Language Models (MLLMs). This project addresses critical limitations in current GVA training paradigms, such as the reliance on outcome supervision and labor-intensive human annotations, by introducing a novel approach to fine-grained, step-wise supervision. We are excited to open-source our Similar reward model and the SRMEval test dataset, providing the research community with valuable resources to advance the development of virtual agents.
- We will release the benchmark, checkpoints, and code soon.
- [2025.03.24] We released our paper on arXiv.
- Release Paper.
- Release SRMEval Benchmark.
- Release data collection pipelines of SRM.
- Release checkpoints of Similar.
- Release inference code of Similar.
- Release training scripts of Similar.
GVAs are designed to process multimodal inputs (e.g., UI elements, text, visuals) and navigate digital environments to perform tasks. However, traditional training methods rely heavily on outcome-based rewards, which lack the granularity needed to guide agents effectively through complex tasks. To overcome these challenges, we propose Similar, a reward model that provides multi-dimensional, step-wise supervision signals to guide agent learning and reasoning.
Similar enhances GVA performance in two key phases:
- Training Phase: As a reward model in reinforcement learning frameworks, Similar provides fine-grained feedback to optimize agent behavior, improving task performance.
- Inference Phase: Integrated with search algorithms such as MCTS, Similar enhances reasoning efficiency by guiding action selection toward successful task completion (a minimal usage sketch is given below).
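The following is a minimal sketch (not the official Similar API) of how a step-wise, multi-dimensional reward model can be used to re-rank candidate actions at inference time. The function `score_step`, the dimension names, and the weights are all hypothetical placeholders for illustration; the real model, dimensions, and aggregation are described in the paper.

```python
from typing import Dict, List

# Placeholder dimension names; the actual five dimensions are defined in the paper.
DIMENSIONS = ["dim_1", "dim_2", "dim_3", "dim_4", "dim_5"]


def score_step(instruction: str, history: List[str], candidate_action: str) -> Dict[str, float]:
    """Hypothetical stand-in for a call to a step-wise reward model.

    A real implementation would feed the task instruction, the interaction
    history (screenshots / UI text / prior actions), and the candidate action
    to the reward model and return one scalar score per dimension.
    """
    return {dim: 0.5 for dim in DIMENSIONS}  # dummy scores for illustration only


def aggregate(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-dimension scores into a single step-level reward."""
    return sum(weights[d] * scores[d] for d in DIMENSIONS)


def pick_best_action(instruction: str, history: List[str], candidates: List[str]) -> str:
    """Greedy re-ranking: keep the candidate with the highest aggregated reward."""
    weights = {d: 1.0 / len(DIMENSIONS) for d in DIMENSIONS}  # uniform weights by default
    return max(candidates, key=lambda a: aggregate(score_step(instruction, history, a), weights))


if __name__ == "__main__":
    best = pick_best_action(
        instruction="Open the settings page and enable dark mode",
        history=["click(home_button)"],
        candidates=["click(settings_icon)", "scroll(down)"],
    )
    print(best)
```

In a search-based setting such as MCTS, the same per-step scores can serve as node values when expanding and selecting branches, rather than only re-ranking a flat list of candidates.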
- Five Key Dimensions: We define five dimensions for step-wise GVA assessment and introduce an automated framework using MCTS-P to collect fine-grained, cross-platform reward model data annotations.
- Triple-M Strategy: We propose a Triple-M strategy to train Similar, integrating multiple dimensions and generating synergistic gains for robust, fine-grained feedback.
- SRM Benchmark: We introduce SRMEval, a multi-step, multi-dimensional benchmark for evaluating reward models, advancing research in reward model performance assessment.
- Superior Performance: Our approach achieves superior GVA performance across diverse tasks and environments, demonstrating the effectiveness of step-wise multi-dimensional assessment and synergistic expert integration.
We are pleased to open-source our Similar reward model and the SRMEval test dataset, providing the community with tools to further advance the field of Generalist Virtual Agents. For more details, please refer to our paper and the accompanying documentation.
Since reward models are crucial for enhancing GVAs' performance, and prior research has not focused on evaluating specific reward models, we propose SRM, the first benchmark in the virtual agent domain for step-wise, multi-dimensional reward model training and evaluation.
We propose a new task for reward models in the virtual agent domain: selecting the better action (i.e., the chosen action) from two candidate actions at step i of a given instruction task. The evaluation metric is accuracy, which measures the reward model's ability to identify the better action; accuracy is computed separately for each evaluation type.
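Below is a minimal sketch of this pairwise evaluation, assuming each example provides a chosen and a rejected action at the same step; the field names and the `reward` scoring function are hypothetical stand-ins, not the released SRMEval schema or the Similar API.

```python
from typing import Callable, Dict, List


def pairwise_accuracy(examples: List[Dict], reward: Callable[[str, int, str], float]) -> float:
    """Fraction of examples where the reward model scores the chosen action
    higher than the rejected action at the same step."""
    correct = 0
    for ex in examples:
        s_chosen = reward(ex["instruction"], ex["step"], ex["chosen_action"])
        s_rejected = reward(ex["instruction"], ex["step"], ex["rejected_action"])
        correct += int(s_chosen > s_rejected)
    return correct / len(examples)


if __name__ == "__main__":
    # Toy data and a toy scorer for illustration only.
    data = [
        {"instruction": "Enable dark mode", "step": 1,
         "chosen_action": "click(settings_icon)", "rejected_action": "scroll(down)"},
    ]
    toy_reward = lambda instr, step, action: float("settings" in action)
    print(f"Accuracy: {pairwise_accuracy(data, toy_reward):.2f}")
```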
Here are some example cases from SRMEval.
If you find this work useful for your research, please cite our paper and star our git repo:
@misc{miao2025boostingvirtualagentlearning,
title={Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark},
author={Bingchen Miao and Yang Wu and Minghe Gao and Qifan Yu and Wendong Bu and Wenqiao Zhang and Yunfei Li and Siliang Tang and Tat-Seng Chua and Juncheng Li},
year={2025},
eprint={2503.18665},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.18665},
}