
RetoVLA

Official code release for
RetoVLA: Reusing Register Tokens for Spatial Reasoning in Vision-Language-Action Models

🎉 Accepted to ICRA 2026 🇦🇹

Jiyeon Koo, Taewan Cho, Hyunjoon Kang, Eunseom Pyo, Tae Gyun Oh, Taeryang Kim, Andrew Jaeyong Choi*
School of Computing, Gachon University
Co-first authors, *corresponding author


Overview

RetoVLA improves lightweight Vision-Language-Action policies by reusing register tokens as a compact source of global spatial context. Instead of discarding those tokens after visual processing, RetoVLA routes them into the Action Expert through gated key/value injection, improving spatial reasoning while preserving a lightweight SmolVLA-style backbone.

This repository is the official RetoVLA code release. The implementation is provided as a minimal patch on top of LeRobot's SmolVLA codebase, together with training and evaluation scripts for LIBERO Spatial.

Highlights

  • Official RetoVLA implementation built on top of LeRobot SmolVLA.
  • Register-token reuse with expert-side spatial context injection.
  • Training and evaluation scripts for LIBERO Spatial.
  • Minimal public release focused on code and reproducibility.

Repository Layout

RetoVLA/
├── code/lerobot_patch/src/lerobot/policies/smolvla/
│   ├── configuration_smolvla.py
│   ├── modeling_smolvla.py
│   └── smolvlm_with_expert.py
├── scripts/
│   ├── train_libero_spatial_retovla.sh
│   └── eval_libero_spatial.sh

Method

RetoVLA keeps the standard SmolVLA pipeline and augments the Action Expert with a second stream of spatial context derived from register tokens. The released patch files contain the configuration changes, policy logic, and expert-side injection path used in the paper.
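The injection can be pictured as a cross-attention step in which register tokens contribute extra key/value pairs whose strength is controlled by a learned gate. The sketch below is illustrative only: NumPy, single-head, a scalar gate, and made-up shapes. It is not the released implementation; the actual logic lives in the patch files above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_register_attention(q, k_main, v_main, registers, w_k, w_v, gate):
    """Cross-attention where register tokens add gated key/value pairs.

    q:         (T, d) Action Expert queries
    k_main:    (S, d) keys from the expert's usual inputs
    v_main:    (S, d) values from the expert's usual inputs
    registers: (R, e) register tokens reused from the vision backbone
    w_k, w_v:  (e, d) projections into the expert's key/value space
    gate:      scalar controlling how much register context is admitted
    """
    d = q.shape[-1]
    # Project register tokens into key/value space and gate their contribution.
    k = np.concatenate([k_main, gate * (registers @ w_k)], axis=0)
    v = np.concatenate([v_main, gate * (registers @ w_v)], axis=0)
    attn = softmax(q @ k.T / np.sqrt(d))  # (T, S + R)
    return attn @ v                       # (T, d)
```

The gate lets training decide how strongly register-derived context influences the expert, without changing the backbone's own attention.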

Demos

  • Real-world montage
  • Mirror cleaning (cleaning a marker stain on a mirror)
  • Dataset collection
  • Custom simulation montage

Released in This Repository

  • RetoVLA patch files for LeRobot SmolVLA.
  • Training and evaluation scripts for LIBERO Spatial.
  • Documentation for setup and usage.
  • Demo GIFs.

Not included:

  • Raw robot datasets.
  • Checkpoints and full training logs.
  • Static images and other project media.
  • Private hardware stack details beyond paper-level description.

Setup

Prerequisites

  • Python 3.10+ recommended.
  • PyTorch with CUDA support.
  • A local LeRobot environment with SmolVLA and LIBERO dependencies installed.

This is a patch-based release, not a standalone robotics framework. You need a local lerobot checkout before running the training or evaluation scripts below.

Configure local paths

export RETOVLA_ROOT=/path/to/RetoVLA
export ROOT_DIR=/path/to/lerobot

Apply the RetoVLA patch

cp "$RETOVLA_ROOT"/code/lerobot_patch/src/lerobot/policies/smolvla/configuration_smolvla.py \
   "$ROOT_DIR"/src/lerobot/policies/smolvla/configuration_smolvla.py
cp "$RETOVLA_ROOT"/code/lerobot_patch/src/lerobot/policies/smolvla/modeling_smolvla.py \
   "$ROOT_DIR"/src/lerobot/policies/smolvla/modeling_smolvla.py
cp "$RETOVLA_ROOT"/code/lerobot_patch/src/lerobot/policies/smolvla/smolvlm_with_expert.py \
   "$ROOT_DIR"/src/lerobot/policies/smolvla/smolvlm_with_expert.py

Training

Run LIBERO Spatial training with:

bash "$RETOVLA_ROOT"/scripts/train_libero_spatial_retovla.sh

Common overrides:

CUDA_VISIBLE_DEVICES=0 \
STEPS=100000 \
OUTPUT_DIR="$ROOT_DIR"/outputs/train/retovla_libero_spatial \
bash "$RETOVLA_ROOT"/scripts/train_libero_spatial_retovla.sh

Evaluation

Evaluate a trained checkpoint with:

POLICY_PATH="$ROOT_DIR"/outputs/train/retovla_libero_spatial/checkpoints/100000/pretrained_model \
NUM_TRIALS=50 \
bash "$RETOVLA_ROOT"/scripts/eval_libero_spatial.sh

By default the scripts read and write under $ROOT_DIR/outputs/....

License

This repository is released under Apache-2.0. Some patch files are modified derivatives of Apache-licensed LeRobot / Hugging Face SmolVLA components; see LICENSE and NOTICE.

Citation

If you find our work useful in your research, please consider citing our paper:

@misc{koo2025retovlareusingregistertokens,
  title={RetoVLA: Reusing Register Tokens for Spatial Reasoning in Vision-Language-Action Models},
  author={Jiyeon Koo and Taewan Cho and Hyunjoon Kang and Eunseom Pyo and Tae Gyun Oh and Taeryang Kim and Andrew Jaeyong Choi},
  year={2025},
  eprint={2509.21243},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2509.21243}
}
