[RSS 2025] Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation


Reactive Diffusion Policy:

Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation

RSS 2025

Han Xue1*, Jieji Ren1*, Wendi Chen1*,
Gu Zhang234†, Yuan Fang1†, Guoying Gu, Huazhe Xu234‡, Cewu Lu15‡
1Shanghai Jiao Tong University 2Tsinghua University, IIIS 3Shanghai Qi Zhi Institute
4Shanghai AI Lab 5Shanghai Innovation Institute
*Equal contribution †Equal contribution ‡Equal advising

arXiv | project website | data | checkpoints | powered by PyTorch


TODO

⚙️ Environment Setup

📝 Use Customized Tactile / Force Sensors, Robots and Customized Tasks

Please refer to docs/customized_deployment_guide.md.

Hardware

Device List

  • Meta Quest 3 VR headset.
  • Workstation with Ubuntu 22.04 for compatibility with ROS2 Humble.

    A workstation with a GPU (e.g., NVIDIA RTX 3090) is required. If GelSight Mini is used, a high-performance CPU (e.g., Core i9-13900K) is required to ensure 24 FPS tactile sensing.

  • 2 robot arms with (optional) joint torque sensors.

    We use the Flexiv Rizon 4 with the GRAV gripper by default and will support single-arm setups and other robots soon.

  • 1-3 RealSense cameras.

    We use a D435 for the wrist camera and D415s for the external cameras. Follow the official document to install librealsense2.

  • (Optional) 1-2 GelSight Mini tactile sensors with tracking marker gel.

    We use 1 sensor for each robot arm. Download the CAD model and 3D print the mount to attach the sensor to the GRAV gripper.

Software

Quest 3 Setup

Build and install the TactAR APP on the Quest 3 according to the README in our Unity Repo.

You can also adopt other teleoperation systems with force feedback.

Workstation Setup

  1. Follow the official document to install ROS2 Humble.
  2. Since ROS2 has some compatibility issues with Conda, we recommend using a virtual environment with venv.
    python3 -m venv rdp_venv
    source rdp_venv/bin/activate
    pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
    pip install -r requirements.txt
  3. (Optional) Follow third_party/mvsdk/README.md to install MindVision SDK. This package is needed only if you want to record experiment videos with MindVision cameras.
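Since the pip commands above must run inside the venv rather than Conda, a quick stdlib check can confirm the environment is active before installing. This helper is a hypothetical convenience, not part of the repo:

```python
import sys


def in_venv() -> bool:
    """Return True when running inside a venv/virtualenv.

    In a venv created with `python3 -m venv`, sys.prefix points at the
    venv directory while sys.base_prefix points at the base interpreter;
    they are equal only outside a virtual environment.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_venv())
```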

📦 Data Collection

TactAR Setup

  1. For the workstation, configure the environment and the task first, then start several services for teleoperating the robots, publishing sensor data, and recording data.
    1. Environment and Task Configuration.
    2. Start services. Run each command in a separate terminal. You can use tmux to split the terminal.
      # start teleoperation server
      python teleop.py task=[task_config_file_name]
      # start camera node launcher
      python camera_node_launcher.py task=[task_config_file_name]
      # start data recorder
      python record_data.py --save_to_disk --save_file_dir [task_data_dir] --save_file_name [record_seq_file_name]
  2. For Quest 3, follow the user guide in our Unity Repo to run the TactAR APP.
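The three workstation services above can also be launched from one script so every run uses consistent arguments. The sketch below is a hypothetical launcher: the script names and flags come from the steps above, while the helper names and example values are made up:

```python
import shlex
import subprocess


def build_service_commands(task: str, data_dir: str, seq_name: str) -> list:
    """Build the three data-collection commands from the steps above."""
    return [
        f"python teleop.py task={task}",
        f"python camera_node_launcher.py task={task}",
        f"python record_data.py --save_to_disk --save_file_dir {data_dir} "
        f"--save_file_name {seq_name}",
    ]


def launch(commands: list) -> list:
    """Start each service as a separate process (one per terminal in the docs)."""
    return [subprocess.Popen(shlex.split(cmd)) for cmd in commands]


if __name__ == "__main__":
    # Dry run: print the commands instead of launching them.
    for cmd in build_service_commands("real_peel", "data/peel", "seq_01"):
        print(cmd)
```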

(Important) Data Collection Tips

Please refer to docs/data_collection_tips.md.

Example Data

We provide the data we collected via the data link above.

Generate your own tactile embedding

Please refer to the tactile embedding guide for collecting your own tactile dataset and generating tactile embeddings based on it. You can also directly use our pre-calculated PCA transformation matrix for tactile embedding in data/PCA_Transform_GelSight.
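Applying a precomputed PCA transform reduces the flattened marker-offset field to a low-dimensional embedding (e.g., (63, 2) offsets become 15 dims in the dataset below). The sketch here is a pure-Python illustration with toy dimensions; the mean and component values are made up, not taken from data/PCA_Transform_GelSight:

```python
def apply_pca(offsets_flat, mean, components):
    """Project a flattened marker-offset vector onto PCA components.

    offsets_flat: length-D list (marker offsets flattened, e.g. 63*2 = 126)
    mean:         length-D list (per-dimension mean from the fitted PCA)
    components:   K lists of length D (principal axes), giving a K-dim embedding
    """
    centered = [x - m for x, m in zip(offsets_flat, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


# Toy example: 2 markers (4 dims) projected to a 2-dim embedding.
offsets = [0.5, -0.5, 1.0, 0.0]
mean = [0.0, 0.0, 0.0, 0.0]
components = [
    [1.0, 0.0, 0.0, 0.0],  # first principal axis (made up)
    [0.0, 0.0, 1.0, 0.0],  # second principal axis (made up)
]
print(apply_pca(offsets, mean, components))  # → [0.5, 1.0]
```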

Data Postprocessing

Change the config in post_process_data.py (including TAG, ACTION_DIM, TEMPORAL_DOWNSAMPLE_RATIO and SENSOR_MODE) and execute the command

python post_process_data.py

We use Zarr to store the data. After postprocessing, you may see the following structure:

 ├── action (25710, 4) float32
 ├── external_img (25710, 240, 320, 3) uint8
 ├── left_gripper1_img (25710, 240, 320, 3) uint8
 ├── left_gripper1_initial_marker (25710, 63, 2) float32
 ├── left_gripper1_marker_offset (25710, 63, 2) float32
 ├── left_gripper1_marker_offset_emb (25710, 15) float32
 ├── left_gripper2_img (25710, 240, 320, 3) uint8
 ├── left_gripper2_initial_marker (25710, 25, 2) float32
 ├── left_gripper2_marker_offset (25710, 25, 2) float32
 ├── left_gripper2_marker_offset_emb (25710, 15) float32
 ├── left_robot_gripper_force (25710, 1) float32
 ├── left_robot_gripper_width (25710, 1) float32
 ├── left_robot_tcp_pose (25710, 9) float32
 ├── left_robot_tcp_vel (25710, 6) float32
 ├── left_robot_tcp_wrench (25710, 6) float32
 ├── left_wrist_img (25710, 240, 320, 3) uint8
 ├── right_robot_gripper_force (25710, 1) float32
 ├── right_robot_gripper_width (25710, 1) float32
 ├── right_robot_tcp_pose (25710, 9) float32
 ├── right_robot_tcp_vel (25710, 6) float32
 ├── right_robot_tcp_wrench (25710, 6) float32
 ├── target (25710, 4) float32
 └── timestamp (25710,) float32
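Every array in the store shares the same leading dimension (the number of synchronized timesteps, 25710 above). A simple sanity check after postprocessing could look like the sketch below; it operates on shape tuples (e.g., as reported when opening the store with zarr), and the helper name and toy shapes are hypothetical:

```python
def check_aligned(shapes: dict) -> int:
    """Verify every array shares one leading length; return that length.

    shapes: mapping of array name -> shape tuple, as reported by the store.
    """
    lengths = {name: shape[0] for name, shape in shapes.items()}
    unique = set(lengths.values())
    if len(unique) != 1:
        raise ValueError(f"misaligned arrays: {lengths}")
    return unique.pop()


# Toy shapes mirroring the structure above (lengths shortened).
shapes = {
    "action": (100, 4),
    "external_img": (100, 240, 320, 3),
    "timestamp": (100,),
}
print(check_aligned(shapes))  # → 100
```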

📚 Training

  1. Task Configuration. In addition to the task config file used in data collection, another file is needed to configure dataset, runner, and model-related parameters such as obs and action. You can take reactive_diffusion_policy/config/task/real_peel_image_dp_absolute_12fps.yaml as an example. Refer to docs/customized_deployment_guide.md for more details.

    The dp, at, and ldp in the config file name indicate Diffusion Policy, Asymmetric Tokenizer, and Latent Diffusion Policy, respectively.

  2. Generate Dataset with Correct Frequency. The fps at the end of the config file name indicates the control frequency. Make sure the control_fps in the task config file is consistent with the dataset. For instance, if we record 24fps data and want to train a model with a 12fps control frequency, we have to set the TEMPORAL_DOWNSAMPLE_RATIO in post_process_data.py to 2 to generate the correct dataset.

    We record 24fps data for experiments. We train DP with 12fps control frequency and RDP (AT + LDP) with 24fps control frequency.

  3. Run the Training Script. We provide training scripts for Diffusion Policy and Reactive Diffusion Policy. You can modify the training script to train the desired task and model.
    # config multi-gpu training
    accelerate config
    # Diffusion Policy
    ./train_dp.sh
    # Reactive Diffusion Policy
    ./train_rdp.sh
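The relationship between recording and control frequency in step 2 can be sketched as a simple index downsampler. This is a pure-Python illustration with hypothetical helper names, mirroring what TEMPORAL_DOWNSAMPLE_RATIO does in post_process_data.py:

```python
def downsample_indices(n_frames: int, record_fps: int, control_fps: int) -> list:
    """Pick every k-th frame so the kept frames run at control_fps.

    The ratio record_fps / control_fps must be a whole number,
    e.g. 24 fps recording -> 12 fps control keeps every 2nd frame.
    """
    if record_fps % control_fps != 0:
        raise ValueError("record_fps must be an integer multiple of control_fps")
    ratio = record_fps // control_fps
    return list(range(0, n_frames, ratio))


# 24 fps recording downsampled to 12 fps control frequency.
print(downsample_indices(8, 24, 12))  # → [0, 2, 4, 6]
```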

🚀 Inference

  1. (Optional) Refer to vcamera_server_ip and vcamera_server_port in the task config file and start the corresponding vcamera server.
    # run vcamera server
    python vcamera_server.py --host_ip [host_ip] --port [port] --camera_id [camera_id]
  2. Modify eval.sh to set the task and model you want to evaluate, then run the following commands in separate terminals.
    # start teleoperation server
    python teleop.py task=[task_config_file_name]
    # start camera node launcher
    python camera_node_launcher.py task=[task_config_file_name]
    # start inference
    ./eval.sh

Checkpoints

We provide the checkpoints used in our experiments via the checkpoints link above.

❔ Q&A

Please refer to docs/Q&A.md.

🙏 Acknowledgement

Our work is built upon Diffusion Policy, VQ-BeT, Stable Diffusion, UMI and Data Scaling Laws. Thanks for their great work!

🔗 Citation

If you find our work useful, please consider citing:

@article{xue2025reactive,
  title     = {Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation},
  author    = {Xue, Han and Ren, Jieji and Chen, Wendi and Zhang, Gu and Fang, Yuan and Gu, Guoying and Xu, Huazhe and Lu, Cewu},
  journal   = {arXiv preprint arXiv:2503.02881},
  year      = {2025}
}
