
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Project Paper: https://arxiv.org/abs/2502.18906

We propose an environment-free RL framework that decouples value estimation from policy optimization by leveraging a pretrained Value Environment Model (VEM). VEM predicts state-action values directly from offline data, distilling human-like priors about GUI interaction outcomes without requiring next-state prediction or environmental feedback. The framework operates in two stages: (1) pretraining VEM to estimate long-term action utilities and (2) guiding policy exploration with frozen VEM signals, enabling layout-agnostic GUI automation.

Quick Start 🚀

Step 1: Build Environment

conda env create -f environment.yml
conda activate lam-rl

git clone https://github.com/hiyouga/LLaMA-Factory.git 
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
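
A quick sanity check of the install (assuming the conda environment is active and the LLaMA-Factory CLI landed on your PATH after the editable install) might look like:

# Hedged check only: confirms the CLI resolves and whether CUDA is visible to PyTorch.
llamafactory-cli version
python3 -c "import torch; print(torch.cuda.is_available())"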

Step 2: Prepare Images and Annotations

  • Download the raw images from the SeeClick website and place them under images/aitw_images.
  • To generate the labeled data for training the critic model, fill in api_key and model_name in configs/gpt_config.yaml (a hedged sketch of these two fields follows this list).
  • Then run python3 data_preprocess/aitw.py to generate the training data for the critic and policy models.
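
The exact layout of configs/gpt_config.yaml is defined by the repository; the sketch below is only a hedged illustration of the two fields named above, with placeholder values and an assumed flat YAML layout. Prefer editing the shipped file in place if it contains additional keys.

# Hedged sketch only: the field names come from this README, the values are placeholders,
# and the flat layout is an assumption. Editing the shipped config in place is safer than overwriting it.
cat > configs/gpt_config.yaml <<'EOF'
api_key: "<your-api-key>"   # key for the GPT endpoint used to label critic training data
model_name: "gpt-4o"        # placeholder model/deployment name; use the one you have access to
EOF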

Step 3: Prepare Checkpoints

Download the checkpoints listed below (Auto-UI-Base, blip2-opt-2.7b, roberta-base, and Qwen2-VL-7B-Instruct); a hedged download sketch follows the directory layout.

Organize the files as follows:

GUI-Agent-RL/
    data/
      aitw_anns/
      images/
    images/
      aitw_images/
    checkpoints/
      Auto-UI-Base/
      blip2-opt-2.7b/
      roberta-base/
      Qwen2-VL-7B-Instruct/
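
The original download links are not reproduced here. If you pull the checkpoints from the Hugging Face Hub, a download sketch could look like the following; the repository IDs are assumptions (the Auto-UI-Base ID in particular is hypothetical), so substitute the sources you actually use.

pip install -U "huggingface_hub[cli]"

# Assumed repo IDs, not confirmed sources -- adjust to wherever you obtain the weights.
huggingface-cli download Qwen/Qwen2-VL-7B-Instruct --local-dir checkpoints/Qwen2-VL-7B-Instruct
huggingface-cli download Salesforce/blip2-opt-2.7b --local-dir checkpoints/blip2-opt-2.7b
huggingface-cli download FacebookAI/roberta-base --local-dir checkpoints/roberta-base
huggingface-cli download cooelf/Auto-UI-Base --local-dir checkpoints/Auto-UI-Base   # hypothetical repo id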

Step 4: Train the Critic Model

We use LLaMA-Factory to train the critic model. With our settings, training uses LoRA and requires 8 A100 GPUs.

  • To obtain the critic model checkpoint for the AITW general task, run:
    sh scripts/train_critic_general.sh
    The critic checkpoints will be stored in checkpoints/critic_general.
  • To obtain the critic model checkpoint for the AITW webshopping task, run:
    sh scripts/train_critic_webshopping.sh
    The critic checkpoints will be stored in checkpoints/critic_webshopping.

You can change the output path by editing output_dir in the corresponding YAML file. When merging the LoRA adapter, remember to fill in adapter_name_or_path and export_dir in configs/critic_merge.yaml; a hedged sketch of that merge step follows.
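
As a rough guide, a LLaMA-Factory LoRA merge config pairs the base model with the trained adapter and an export directory. The sketch below assumes LLaMA-Factory's standard export keys, the Qwen2-VL-7B-Instruct base with its qwen2_vl template, and hypothetical paths; check it against the configs/critic_merge.yaml and merge script that ship with this repository.

# Hedged sketch of a LoRA merge config using LLaMA-Factory's usual export keys.
# Paths and the template are assumptions; prefer filling in the shipped configs/critic_merge.yaml.
cat > configs/critic_merge.yaml <<'EOF'
model_name_or_path: checkpoints/Qwen2-VL-7B-Instruct   # assumed critic base model
adapter_name_or_path: checkpoints/critic_general       # LoRA adapter from the step above
template: qwen2_vl
finetuning_type: lora
export_dir: checkpoints/critic_general_merged          # hypothetical merged-output path
EOF

# LLaMA-Factory's export entry point performs the merge:
llamafactory-cli export configs/critic_merge.yaml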

Step 5: Train the Policy Model

After obtaining the critic model, we use Auto-UI (the Auto-UI-Base checkpoint from Step 3) as the base policy model for training:

python3 train.py --task general
python3 train.py --task webshopping

Checkpoints are saved in checkpoints/policy_general and checkpoints/policy_webshopping by default.

Step 6: Evaluation

  • Offline Evaluation: modify save_path to point to the exact checkpoint you want to evaluate (obtained in Step 5), then run:
    python3 train.py --task general --eval
    python3 train.py --task webshopping --eval
  • Online Evaluation
    • Set up the Android environment according to this page, obtain the URL, and fill in appium_server_url in configs/online_eval.yaml (a hedged config sketch follows this list).
    • Run the agent demo using:
      python3 models/demo.py --model_path xxx
      Obtain the Gradio public URL and fill in the agent_url in configs/online_eval_general.yaml or configs/online_eval_webshopping.yaml.
    • Execute:
      python3 eval_online.py --task general
      python3 eval_online.py --task webshopping
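
The full schema of the online-evaluation configs is not reproduced here; the sketch below shows only the two fields the steps above ask you to fill, with placeholder URLs and an assumed flat YAML layout. Edit the shipped config files in place rather than running this verbatim, since they may already define these (and other) keys.

# Hedged sketch: the two fields named above, with placeholder values.
cat > configs/online_eval.yaml <<'EOF'
appium_server_url: "http://127.0.0.1:4723"   # Appium server URL from the Android setup step
EOF
cat > configs/online_eval_general.yaml <<'EOF'
agent_url: "https://xxxxxx.gradio.live"      # Gradio public URL printed by models/demo.py
EOF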

Citation

If you find this repository useful, please consider giving it a ⭐ or citing:

@misc{zheng2025vemenvironmentfreeexplorationtraining,
      title={VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model}, 
      author={Jiani Zheng and Lu Wang and Fangkai Yang and Chaoyun Zhang and Lingrui Mei and Wenjie Yin and Qingwei Lin and Dongmei Zhang and Saravan Rajmohan and Qi Zhang},
      year={2025},
      eprint={2502.18906},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.18906}, 
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
