dexmal/realtime-vla-v2


Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate

This repository contains the code for the paper Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate, and provides a deployment stack for real-world dual-arm manipulation with fast, smooth, and accurate execution.

When deploying VLA models on real-world robotic tasks, execution speed matters. Beyond fast GPU inference, this project targets the remaining bottlenecks in the full deployment stack, including calibration, action execution, control, and learning-based speed selection. The end-to-end result: on real-world tasks that require both dexterity and accuracy, the robot executes about 3x faster than a standard baseline, reaching casual human speed while staying close to the robot hardware limit.

The repository contains:

  • server/: remote inference service with Pi05 JAX and Pi05 Triton backends, together with time-axis action planning
  • client/: local runtime stack including robot and camera I/O, observer / actuator bindings, executor implementations, aligned logging, asynchronous video recording, and YAML-based task switching
  • modular builder entrypoints in server/builders.py and client/builders.py, which make it easy to extend the codebase with custom model backends, robots, observers, actuators, executors, and task configurations
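To make the builder idea concrete, here is a minimal sketch of a name-to-factory registry in Python. It is not the code in client/builders.py, whose actual layout is not shown here. The names ROBOT_BUILDERS, register_robot, build_robot, MyRobot, and the "type" config key are all hypothetical and only illustrate how a config entry could select an implementation.

```python
from typing import Callable, Dict

# Hypothetical registry. The real client/builders.py may be organized differently.
ROBOT_BUILDERS: Dict[str, Callable[..., object]] = {}


def register_robot(name: str):
    """Decorator that stores a robot factory under a config-facing name."""
    def decorator(factory):
        if name in ROBOT_BUILDERS:
            raise ValueError(f"robot '{name}' is already registered")
        ROBOT_BUILDERS[name] = factory
        return factory
    return decorator


def build_robot(config: dict):
    """Look up the factory named by config['type']; pass the other keys as kwargs."""
    kwargs = dict(config)          # copy so the caller's dict is not mutated
    kind = kwargs.pop("type")      # "type" is an assumed key name
    if kind not in ROBOT_BUILDERS:
        raise KeyError(f"unknown robot type '{kind}'; known: {sorted(ROBOT_BUILDERS)}")
    return ROBOT_BUILDERS[kind](**kwargs)


@register_robot("my_robot")
class MyRobot:
    """Placeholder robot used only for illustration."""

    def __init__(self, num_joints: int = 7):
        self.num_joints = num_joints

    def read_state(self):
        # A real implementation would query the hardware SDK here.
        return [0.0] * self.num_joints
```

With a registry of this shape, a config entry such as {"type": "my_robot", "num_joints": 6} would be enough to select the custom implementation through build_robot, without touching the calling code.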

The Triton backend is built on top of dexmal/realtime-vla and extends it with real-time chunking and action-prefill style usage, following Training-Time Action Conditioning for Efficient Real-Time Chunking.

Resources

The table below lists task demos and runtime logs.

Task            | Demo Video | RRD Log
Cloth Folding   | Demo       | RRD
Chip Placement  | Demo       | RRD
Box Placement   | Demo       | RRD

Installation

The commands below assume you are in the repository root.

conda create -n realtime-vla-v2 python=3.10 -y
conda activate realtime-vla-v2
python -m pip install --upgrade pip
pip install -r requirements.txt

Notes:

  • The server is intended to run on an NVIDIA GPU machine compatible with your torch and triton installation.
  • The repository provides a mock configuration for running through the end-to-end code path without real robot hardware.
  • For real robot deployment, airbot_real corresponds to the AIRBOT W1 SDK.
  • If you use a different robot stack, you can support it by adding new implementations and registering them in client/builders.py.

How to Use

All runtime parameters are configured in YAML.

Choose a server config and a client config that belong to the same task.

Cloth Folding

Server:

python server/infer_server.py --config server/config_cloth.yaml

Client:

python client/local_client.py --config client/config_cloth.yaml

Chip Placement

Server:

python server/infer_server.py --config server/config_chip.yaml

Client:

python client/local_client.py --config client/config_chip.yaml

Box Placement

Server:

python server/infer_server.py --config server/config_box.yaml

Client:

python client/local_client.py --config client/config_box.yaml
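The three task pairs above share one naming pattern: server/config_<task>.yaml and client/config_<task>.yaml. A small helper can build both commands from a task name so that the two sides cannot be mismatched. This is a convenience sketch, not part of the repository. The function name launch_commands and the TASKS tuple are made up for illustration.

```python
import shlex

# Task names taken from the config file names used in this README.
TASKS = ("cloth", "chip", "box")


def launch_commands(task: str):
    """Return (server_cmd, client_cmd) argument lists for one task.

    Paths are relative to the repository root, as in the commands above.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task '{task}'; expected one of {TASKS}")
    server_cmd = ["python", "server/infer_server.py",
                  "--config", f"server/config_{task}.yaml"]
    client_cmd = ["python", "client/local_client.py",
                  "--config", f"client/config_{task}.yaml"]
    return server_cmd, client_cmd


if __name__ == "__main__":
    server_cmd, client_cmd = launch_commands("cloth")
    # Print the commands instead of running them, so the sketch has no side effects.
    print(shlex.join(server_cmd))
    print(shlex.join(client_cmd))
```

The returned lists can be passed to subprocess.Popen with the working directory set to the repository root. The server and client usually run on different machines, so in practice each command is likely started separately.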

Mock Run

Client:

python client/local_client.py --config client/config_mock.yaml

Logging

The client saves runtime outputs to the directory specified by visualization.output_dir in the selected YAML.

Recorded outputs include:

  • aligned trajectory logs in jsonl
  • multi-camera video, written asynchronously

Logged signals are named as follows:

  • in rrd, actual_action denotes the delay-aligned measured robot state
  • for MPC tasks, raw_pre_mpc_action is the direct model output, pre_mpc_action is the time-parameterized trajectory before local MPC, and post_mpc_action is the locally optimized command that is actually sent to the robot
  • for smooth / raw-action tasks, raw_pre_smooth_action is the direct model output, pre_smooth_action is the time-parameterized trajectory before local smoothing, and post_smooth_action is the locally smoothed / tracked command that is actually sent to the robot
  • inference-complete markers are overlaid on pre_mpc_action or pre_smooth_action to show when each inference finished along the trajectory
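As an example of how the aligned jsonl logs might be inspected offline, the sketch below loads a log and reports the largest gap between the commanded and the measured action. The record layout is an assumption: each line is taken to be a JSON object holding lists of floats under keys such as post_mpc_action and actual_action. The actual jsonl schema is not documented here, so check a real log and adjust the key names before relying on this.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    records = []
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def max_tracking_error(records,
                       command_key="post_mpc_action",   # assumed key name
                       measured_key="actual_action"):   # assumed key name
    """Largest absolute per-dimension gap between command and measurement.

    Records that lack either key are skipped.
    """
    worst = 0.0
    for rec in records:
        if command_key not in rec or measured_key not in rec:
            continue
        cmd, meas = rec[command_key], rec[measured_key]
        if len(cmd) != len(meas):
            raise ValueError("command and measurement have different lengths")
        for c, m in zip(cmd, meas):
            worst = max(worst, abs(c - m))
    return worst
```

For smooth / raw-action tasks, the same function could be called with command_key="post_smooth_action", again assuming the jsonl keys mirror the signal names listed above.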

Acknowledgements

Citation

If you find this work useful, please cite:

@article{yang2026realtimevlav2,
  title={Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate},
  author={Yang, Chen and Hu, Yucheng and Ma, Yunchao and Yang, Yunhuan and Tan, Jing and Fan, Haoqiang},
  journal={arXiv preprint arXiv:2603.26360},
  year={2026}
}
