Han Xue1*,
Jieji Ren1*,
Wendi Chen1*,
Gu Zhang234†,
Yuan Fang1†,
Guoying Gu,
Huazhe Xu234‡,
Cewu Lu15‡
1Shanghai Jiao Tong University
2Tsinghua University, IIIS
3Shanghai Qi Zhi Institute
4Shanghai AI Lab
5Shanghai Innovation Institute
*Equal contribution
†Equal contribution
‡Equal advising
- Release the code of TactAR and Quest3 APP.
- Release the code of RDP.
- Release the data.
- Release the checkpoints.
- Add guide for customized tasks, tactile / force sensors and robots.
- Add guide for creating the tactile dataset and the tactile embedding.
- Support more robots (e.g. Franka) (ETA: May 2025).
Please refer to docs/customized_deployment_guide.md.
- Meta Quest 3 VR headset.
- Workstation with Ubuntu 22.04 for compatibility with ROS2 Humble.
A workstation with a GPU (e.g., NVIDIA RTX 3090) is required. If GelSight Mini is used, a high-performance CPU (e.g., Core i9-13900K) is required to ensure 24 FPS tactile sensing.
- 2 robot arms with (optional) joint torque sensors.
We use the Flexiv Rizon 4 with the GRAV gripper by default; support for single-arm setups and other robots is coming soon.
- 1-3 RealSense cameras.
We use a D435 for the wrist camera and D415s for the external cameras. Follow the official document to install librealsense2.
- (Optional) 1-2 GelSight Mini tactile sensors with tracking marker gel.
We use 1 sensor for each robot arm. Download the CAD model and 3D print the mount to attach the sensor to the GRAV gripper.
Build and install the TactAR APP on the Quest 3 according to the README in our Unity Repo.
You can also adopt other teleoperation systems with force feedback.
- Follow the official document to install ROS2 Humble.
- Since ROS2 has some compatibility issues with Conda, we recommend using a virtual environment created with `venv` (a quick sanity check follows this list):
```bash
python3 -m venv rdp_venv
source rdp_venv/bin/activate
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
```
- (Optional) Follow third_party/mvsdk/README.md to install MindVision SDK. This package is needed only if you want to record experiment videos with MindVision cameras.
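After the pip install above finishes, a quick sanity check can confirm that the CUDA build of PyTorch is usable (versions below are the ones pinned above; adjust if you installed different wheels):
```python
# Quick sanity check for the PyTorch install above (not part of the original repo).
import torch

print(torch.__version__)           # expect 1.13.1+cu117
print(torch.cuda.is_available())   # should be True on a CUDA-capable workstation
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no GPU found")
```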
- For the workstation, first configure the environment and the task, then start several services for teleoperating the robots, publishing sensor data, and recording data.
- Environment and Task Configuration.
- Calibration.
Example calibration files are provided in data/calibration.
Each `A_to_B_transform.json` contains the transformation from coordinate system A to coordinate system B. We use calibration files only for bimanual manipulation. (A hypothetical usage sketch follows this configuration list.)
- Environment Configuration.
Edit reactive_diffusion_policy/config/task/real_robot_env.yaml to configure the environment settings, including `host_ip`, `robot_ip`, `vr_server_ip` and `calibration_path`.
- Task Configuration.
Create a task config file which assigns the cameras and sensors to use. You can take reactive_diffusion_policy/config/task/real_peel_two_realsense_one_gelsight_one_mctac_24fps.yaml as an example. Refer to docs/customized_deployment_guide.md for more details.
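As mentioned in the Calibration item above, the sketch below shows one hypothetical way such a transform could be loaded and applied. The actual JSON schema of the A_to_B_transform.json files is defined by this repo; the "transform" key and 4x4 matrix layout are assumptions for illustration only.
```python
# Hypothetical example only: assumes the calibration JSON stores a 4x4 homogeneous
# matrix under a "transform" key, which may differ from the repo's actual schema.
import json
import numpy as np

with open("data/calibration/A_to_B_transform.json") as f:  # placeholder file name
    T_A_to_B = np.asarray(json.load(f)["transform"])       # assumed key and layout

point_in_A = np.array([0.1, 0.0, 0.2, 1.0])  # a point in frame A (homogeneous coordinates)
point_in_B = T_A_to_B @ point_in_A           # the same point expressed in frame B
print(point_in_B[:3])
```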
- Start services. Run each command in a separate terminal.
You can use tmux to split the terminal.
```bash
# start teleoperation server
python teleop.py task=[task_config_file_name]
# start camera node launcher
python camera_node_launcher.py task=[task_config_file_name]
# start data recorder
python record_data.py --save_to_disk --save_file_dir [task_data_dir] --save_file_name [record_seq_file_name]
```
- For Quest 3, follow the user guide in our Unity Repo to run the TactAR APP.
Please refer to docs/data_collection_tips.md.
We provide the data we collected on .
Please refer to the tactile embedding guide for collecting your own tactile dataset and generating tactile embeddings from it. You can also directly use our pre-calculated PCA transformation matrix for tactile embedding in data/PCA_Transform_GelSight.
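As a rough illustration of how a pre-computed PCA transformation turns marker offsets into a compact tactile embedding, consider the sketch below. The file name and storage format under data/PCA_Transform_GelSight are assumptions; the shapes (63 markers, 15 components) follow the dataset structure shown later in this README.
```python
# Illustrative sketch (not the repo's actual code): project one GelSight Mini
# marker-offset frame onto a pre-computed PCA basis to get a 15-dim embedding.
import numpy as np

marker_offset = np.random.randn(63, 2)  # 63 tracked markers x (dx, dy) for one frame

# Assumed file name/format; a real PCA projection may also subtract a stored mean first.
pca_matrix = np.load("data/PCA_Transform_GelSight/pca_transform.npy")  # (126, 15), assumed

embedding = marker_offset.reshape(-1) @ pca_matrix  # (126,) -> (15,)
print(embedding.shape)                              # (15,), matches *_marker_offset_emb below
```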
Change the config in post_process_data.py (including `TAG`, `ACTION_DIM`, `TEMPORAL_DOWNSAMPLE_RATIO` and `SENSOR_MODE`) and execute the command:
```bash
python post_process_data.py
```
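For reference, the variables mentioned above might look roughly like the following at the top of post_process_data.py; the variable names come from this README, while the example values are assumptions that you should replace with your own.
```python
# Example values only; set these to match your task, sensor setup and target frequency.
TAG = "peel"                       # assumed task tag; use the tag of your recorded data
ACTION_DIM = 4                     # matches the (N, 4) action array in the structure below
TEMPORAL_DOWNSAMPLE_RATIO = 1      # 1 keeps the recorded 24 fps; 2 downsamples to 12 fps
SENSOR_MODE = "two_realsense_one_gelsight_one_mctac"  # assumed naming; see the task configs
```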
We use Zarr to store the data. After postprocessing, you may see the following structure:
```
├── action (25710, 4) float32
├── external_img (25710, 240, 320, 3) uint8
├── left_gripper1_img (25710, 240, 320, 3) uint8
├── left_gripper1_initial_marker (25710, 63, 2) float32
├── left_gripper1_marker_offset (25710, 63, 2) float32
├── left_gripper1_marker_offset_emb (25710, 15) float32
├── left_gripper2_img (25710, 240, 320, 3) uint8
├── left_gripper2_initial_marker (25710, 25, 2) float32
├── left_gripper2_marker_offset (25710, 25, 2) float32
├── left_gripper2_marker_offset_emb (25710, 15) float32
├── left_robot_gripper_force (25710, 1) float32
├── left_robot_gripper_width (25710, 1) float32
├── left_robot_tcp_pose (25710, 9) float32
├── left_robot_tcp_vel (25710, 6) float32
├── left_robot_tcp_wrench (25710, 6) float32
├── left_wrist_img (25710, 240, 320, 3) uint8
├── right_robot_gripper_force (25710, 1) float32
├── right_robot_gripper_width (25710, 1) float32
├── right_robot_tcp_pose (25710, 9) float32
├── right_robot_tcp_vel (25710, 6) float32
├── right_robot_tcp_wrench (25710, 6) float32
├── target (25710, 4) float32
└── timestamp (25710,) float32
```
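A quick way to inspect the post-processed file is to open it with the zarr Python package, as sketched below; the exact path, and whether the arrays live at the root or inside a group, may differ for your dataset.
```python
# Minimal inspection sketch; adjust the path and keys to your generated dataset.
import zarr

root = zarr.open("data/peel_dataset.zarr", mode="r")  # example path, not a repo default
print(root.tree())                                    # prints a hierarchy like the one above

action = root["action"][:]                            # (N, 4) float32
left_tcp_pose = root["left_robot_tcp_pose"][:]        # (N, 9) float32
print(action.shape, left_tcp_pose.shape)
```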
- Task Configuration.
In addition to the task config file used in data collection, another file is needed to configure dataset, runner, and model-related parameters such as `obs` and `action`. You can take reactive_diffusion_policy/config/task/real_peel_image_dp_absolute_12fps.yaml as an example. Refer to docs/customized_deployment_guide.md for more details. The `dp`, `at` and `ldp` in the config file name indicate Diffusion Policy, the Asymmetric Tokenizer and the Latent Diffusion Policy, respectively.
- Generate Dataset with the Correct Frequency.
The `fps` at the end of the config file name indicates the control frequency. Make sure the `control_fps` in the task config file is consistent with the dataset. For instance, if we record 24 fps data and want to train a model with a 12 fps control frequency, we have to set `TEMPORAL_DOWNSAMPLE_RATIO` in post_process_data.py to 2 to generate the correct dataset (see the sketch below). We record 24 fps data for our experiments and train DP with a 12 fps control frequency and RDP (AT + LDP) with a 24 fps control frequency.
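The sketch below is only an illustration, not the repo's implementation; it shows how the downsample ratio relates the recording frequency to the control frequency.
```python
# Illustration only: how TEMPORAL_DOWNSAMPLE_RATIO relates recording fps to control fps.
record_fps = 24
control_fps = 12
temporal_downsample_ratio = record_fps // control_fps  # -> 2

timestep_indices = list(range(12))                     # stand-in for frames recorded at 24 fps
kept = timestep_indices[::temporal_downsample_ratio]   # keep every 2nd frame -> 12 fps
print(temporal_downsample_ratio, kept)                 # 2 [0, 2, 4, 6, 8, 10]
```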
- Run the Training Script.
We provide training scripts for Diffusion Policy and Reactive Diffusion Policy.
You can modify the training script to train the desired task and model.
```bash
# config multi-gpu training
accelerate config
# Diffusion Policy
./train_dp.sh
# Reactive Diffusion Policy
./train_rdp.sh
```
- (Optional) Refer to `vcamera_server_ip` and `vcamera_server_port` in the task config file and start the corresponding vcamera server:
```bash
# run vcamera server
python vcamera_server.py --host_ip [host_ip] --port [port] --camera_id [camera_id]
```
- Modify eval.sh to set the task and model you want to evaluate, then run the commands below in separate terminals.
```bash
# start teleoperation server
python teleop.py task=[task_config_file_name]
# start camera node launcher
python camera_node_launcher.py task=[task_config_file_name]
# start inference
./eval.sh
```
We provide the checkpoints used in our experiments on .
Please refer to docs/Q&A.md.
Our work is built upon Diffusion Policy, VQ-BeT, Stable Diffusion, UMI and Data Scaling Laws. Thanks for their great work!
If you find our work useful, please consider citing:
```bibtex
@article{xue2025reactive,
  title   = {Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation},
  author  = {Xue, Han and Ren, Jieji and Chen, Wendi and Zhang, Gu and Fang, Yuan and Gu, Guoying and Xu, Huazhe and Lu, Cewu},
  journal = {arXiv preprint arXiv:2503.02881},
  year    = {2025}
}
```