Official code release for
RetoVLA: Reusing Register Tokens for Spatial Reasoning in Vision-Language-Action Models
🎉 Accepted to ICRA 2026 🇦🇹
Jiyeon Koo†, Taewan Cho†, Hyunjoon Kang, Eunseom Pyo, Tae Gyun Oh, Taeryang Kim, Andrew Jaeyong Choi*
School of Computing, Gachon University
†Co-first authors, *corresponding author
RetoVLA improves lightweight Vision-Language-Action policies by reusing register tokens as a compact source of global spatial context. Instead of discarding those tokens after visual processing, RetoVLA routes them into the Action Expert through gated key/value injection, improving spatial reasoning while preserving a lightweight SmolVLA-style backbone.
This repository is the official RetoVLA code release. The implementation is provided as a minimal patch on top of LeRobot's SmolVLA codebase, together with training and evaluation scripts for LIBERO Spatial.
- Official RetoVLA implementation built on top of LeRobot SmolVLA.
- Register-token reuse with expert-side spatial context injection.
- Training and evaluation scripts for LIBERO Spatial.
- Minimal public release focused on code and reproducibility.
```
RetoVLA/
├── code/lerobot_patch/src/lerobot/policies/smolvla/
│   ├── configuration_smolvla.py
│   ├── modeling_smolvla.py
│   └── smolvlm_with_expert.py
├── scripts/
│   ├── train_libero_spatial_retovla.sh
│   └── eval_libero_spatial.sh
```
RetoVLA keeps the standard SmolVLA pipeline and augments the Action Expert with a second stream of spatial context derived from register tokens. The released patch files contain the configuration changes, policy logic, and expert-side injection path used in the paper.
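To illustrate the idea of expert-side gated key/value injection, here is a minimal PyTorch sketch. It is not the released implementation; all module and parameter names (`GatedKVInjection`, `gate`, tensor shapes) are illustrative assumptions, with register tokens as keys/values and Action Expert hidden states as queries:

```python
import torch
import torch.nn as nn

class GatedKVInjection(nn.Module):
    """Sketch: inject register-token context into an expert token stream.

    Queries come from the expert's hidden states; keys and values come from
    the reused register tokens. A learned scalar gate (initialized to zero)
    makes the injection a no-op at the start of training.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0 -> identity at init

    def forward(self, expert_hidden: torch.Tensor, registers: torch.Tensor) -> torch.Tensor:
        # Cross-attention: expert queries attend over register-token keys/values.
        ctx, _ = self.attn(expert_hidden, registers, registers)
        return expert_hidden + torch.tanh(self.gate) * ctx

# Dummy usage: batch of 2, 10 expert tokens, 4 register tokens, dim 64.
inj = GatedKVInjection(dim=64)
h = torch.randn(2, 10, 64)   # expert hidden states
r = torch.randn(2, 4, 64)    # reused register tokens
out = inj(h, r)              # same shape as h: (2, 10, 64)
```

The zero-initialized gate is a common trick for adding a new context stream without perturbing a pretrained backbone; the actual gating used in the paper may differ.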
Demo GIFs: Real-world montage | Clean marker on mirror | Dataset collection | Custom simulation montage
- RetoVLA patch files for LeRobot SmolVLA.
- Training and evaluation scripts for LIBERO Spatial.
- Documentation for setup and usage.
- Demo GIFs.
Not included:
- Raw robot datasets.
- Checkpoints and full training logs.
- Static images and other project media.
- Private hardware stack details beyond paper-level description.
- Python 3.10+ recommended.
- PyTorch with CUDA support.
- A local LeRobot environment with SmolVLA and LIBERO dependencies installed.
This is a patch-based release, not a standalone robotics framework. You need a local lerobot checkout before running the training or evaluation scripts below.
```sh
export RETOVLA_ROOT=/path/to/RetoVLA
export ROOT_DIR=/path/to/lerobot

cp "$RETOVLA_ROOT"/code/lerobot_patch/src/lerobot/policies/smolvla/configuration_smolvla.py \
   "$ROOT_DIR"/src/lerobot/policies/smolvla/configuration_smolvla.py
cp "$RETOVLA_ROOT"/code/lerobot_patch/src/lerobot/policies/smolvla/modeling_smolvla.py \
   "$ROOT_DIR"/src/lerobot/policies/smolvla/modeling_smolvla.py
cp "$RETOVLA_ROOT"/code/lerobot_patch/src/lerobot/policies/smolvla/smolvlm_with_expert.py \
   "$ROOT_DIR"/src/lerobot/policies/smolvla/smolvlm_with_expert.py
```

Run LIBERO Spatial training with:
```sh
bash "$RETOVLA_ROOT"/scripts/train_libero_spatial_retovla.sh
```

Common overrides:
```sh
CUDA_VISIBLE_DEVICES=0 \
STEPS=100000 \
OUTPUT_DIR="$ROOT_DIR"/outputs/train/retovla_libero_spatial \
bash "$RETOVLA_ROOT"/scripts/train_libero_spatial_retovla.sh
```

Evaluate a trained checkpoint with:
```sh
POLICY_PATH="$ROOT_DIR"/outputs/train/retovla_libero_spatial/checkpoints/100000/pretrained_model \
NUM_TRIALS=50 \
bash "$RETOVLA_ROOT"/scripts/eval_libero_spatial.sh
```

By default, the scripts read from and write to `$ROOT_DIR/outputs/...`.
This repository is released under Apache-2.0. Some patch files are modified derivatives of Apache-licensed LeRobot / Hugging Face SmolVLA components; see LICENSE and NOTICE.
If you find our work useful in your research, please consider citing our paper:
```bibtex
@misc{koo2025retovlareusingregistertokens,
  title={RetoVLA: Reusing Register Tokens for Spatial Reasoning in Vision-Language-Action Models},
  author={Jiyeon Koo and Taewan Cho and Hyunjoon Kang and Eunseom Pyo and Tae Gyun Oh and Taeryang Kim and Andrew Jaeyong Choi},
  year={2025},
  eprint={2509.21243},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2509.21243}
}
```


