Real-Time Action Chunking (RTC) — Refactor Proposal #2

Basiljamal1 · 2025-10-21T16:20:52Z

Real-Time Action Chunking (RTC) — Refactor

This refactor makes Real-Time Action Chunking fully model-agnostic by replacing inheritance with composition + dependency injection.

It splits responsibilities cleanly:

RTCPolicy — policy-level wrapper (scheduling, action queue, swapping, delay buffer)
RTCPolicyFlowModel — flow-level wrapper (background thread, ΠGDM guided inpainting, queues)
Base models — unchanged (e.g., SmolVLAPolicy as the inference model, VLAFlowMatching as the flow model)

Queues & threading now live only in RTCPolicyFlowModel; neither the base policy nor the base flow model knows anything about RTC.
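
As a rough illustration of that split, the sketch below keeps the queues and the worker thread entirely inside a wrapper, while the wrapped model stays a plain object. `ToyFlowModel` and `FlowWorker` are hypothetical names used only for this example; the actual RTCPolicyFlowModel may be structured differently.

```python
import queue
import threading


class ToyFlowModel:
    """Hypothetical stand-in for a base flow model; it knows nothing about queues."""

    def sample_actions(self, observation):
        # Pretend a "chunk" is just the observation repeated three times.
        return [observation] * 3


class FlowWorker:
    """Illustrative wrapper that owns the queues and the background thread."""

    _STOP = object()  # sentinel that tells the worker loop to exit

    def __init__(self, flow_model):
        self.flow_model = flow_model  # injected and never modified
        self.requests = queue.Queue()
        self.results = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Serve requests one at a time until the stop sentinel arrives.
        while True:
            item = self.requests.get()
            if item is self._STOP:
                break
            self.results.put(self.flow_model.sample_actions(item))

    def submit(self, observation):
        self.requests.put(observation)

    def poll(self, timeout=0.0):
        """Return a finished chunk, or None if nothing is ready within timeout."""
        try:
            return self.results.get(timeout=timeout)
        except queue.Empty:
            return None

    def close(self):
        self.requests.put(self._STOP)
        self._thread.join()
```

The caller submits an observation, keeps acting on the old chunk, and polls for the new one; the base model never sees a queue.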

Why this refactor?

Model-agnostic: works with SmolVLA today, and with π0/π0.5 (or others) by swapping the flow model instance.
Cleaner layering: policy concerns vs. flow-generation concerns are separated.
No monkey-patching: base models remain untouched; RTC composes around them.
Backwards-friendly surface: policy.model still exposes the tokenizer and friends via attribute proxying, so code like
policy.model.vlm_with_expert.processor.tokenizer continues to work.
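
One common way to get this kind of attribute proxying in Python is `__getattr__`, which is consulted only when normal lookup fails. The sketch below is a minimal illustration under that assumption; `FlowModelProxy` and the toy model are hypothetical and are not the PR's actual classes.

```python
from types import SimpleNamespace


def make_toy_flow_model():
    """Hypothetical base flow model exposing a nested tokenizer attribute."""
    processor = SimpleNamespace(tokenizer="toy-tokenizer")
    return SimpleNamespace(vlm_with_expert=SimpleNamespace(processor=processor))


class FlowModelProxy:
    """Illustrative wrapper that forwards unknown attributes to the wrapped model."""

    def __init__(self, flow_model):
        self.flow_model = flow_model

    def __getattr__(self, name):
        # Only reached when the attribute is not found on the wrapper itself.
        if name == "flow_model":
            # Avoid infinite recursion if flow_model has not been set yet.
            raise AttributeError(name)
        return getattr(self.flow_model, name)
```

If the wrapper subclasses `torch.nn.Module`, note that `nn.Module` already defines its own `__getattr__` for parameters, buffers, and submodules, so a real implementation would have to defer to that lookup first and fall back to the wrapped model afterwards.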

TL;DR of the architecture

Before (inheritance, cross-coupled)

RTCSmolVLAPolicy (inherits) → SmolVLAPolicy (inference model)
    └── has .model = RTCVLAFlowMatching (inherits) → VLAFlowMatching (flow model)
         └── background thread & queues lived here

After (composition, DI)

RTCSmolVLAPolicy                 # thin shim the user instantiates
    └── owns RTCPolicy                         # policy wrapper (scheduling, queues for actions)
           ├── inference_model: SmolVLAPolicy  # preprocessing / postprocessing. Stays unchanged
           └── model: RTCPolicyFlowModel       # flow wrapper (ΠGDM + queues + thread)
                    └── flow_model: VLAFlowMatching  # UNCHANGED base flow model (no queues)

Legend:

owns = composition (has-a)
inherits = subclassing (is-a)
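
The "After" layering can be sketched with plain constructor injection. Everything below is a toy: the `FlowModel` protocol, the `preprocess`/`postprocess` method names, and the wrapper classes are assumptions made for illustration, not the interfaces defined in this PR.

```python
from typing import Protocol


class FlowModel(Protocol):
    """Hypothetical minimal interface the flow wrapper depends on."""

    def sample_actions(self, observation): ...


class FlowWrapper:
    """Stands in for the flow-level wrapper; it has-a flow model."""

    def __init__(self, flow_model: FlowModel):
        self.flow_model = flow_model

    def sample_actions(self, observation):
        # A real wrapper would run guided inpainting here; this one just delegates.
        return self.flow_model.sample_actions(observation)


class PolicyWrapper:
    """Stands in for the policy-level wrapper; it has-a inference model and a flow wrapper."""

    def __init__(self, inference_model, model: FlowWrapper):
        self.inference_model = inference_model
        self.model = model

    def act(self, raw_observation):
        prepared = self.inference_model.preprocess(raw_observation)
        chunk = self.model.sample_actions(prepared)
        return self.inference_model.postprocess(chunk)


class ToyInference:
    def preprocess(self, raw):
        return raw.strip()

    def postprocess(self, chunk):
        return [action.upper() for action in chunk]


class ToyFlowA:
    def sample_actions(self, observation):
        return [observation + "-a"] * 2


class ToyFlowB:
    def sample_actions(self, observation):
        return [observation + "-b"]
```

Swapping `ToyFlowA` for `ToyFlowB` changes the generated chunk without touching either wrapper, which is the property the composition design is after.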

What each piece does

RTCPolicy (policy wrapper)

Owns inference model (e.g., SmolVLAPolicy) and RTCPolicyFlowModel.
Implements Algorithm 1 policy duties:
- delay buffer $Q$, conservative $d = \max(Q)$
- compute $s = \max(d, s_{\min})$, start next inference exactly when $t = s$
- swap-as-soon-ready; re-index by $\delta = t - s$
- owns the action queue exposed to the environment
Builds the prepared inputs (images/state/lang) using the inference model’s preprocessors.
Post-processes chunks using the inference model (unnormalize, optional π-ALOHA).
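
The scheduling arithmetic in the list above can be sketched in a few lines. The names `DelayBuffer`, `next_start_step`, and `swap_chunk` are hypothetical, the buffer size and initial delay are arbitrary, and the threading that surrounds this logic in the real policy is left out.

```python
from collections import deque


class DelayBuffer:
    """Bounded history of observed inference delays (the buffer Q)."""

    def __init__(self, maxlen, initial_delay):
        self._delays = deque([initial_delay], maxlen=maxlen)

    def record(self, delay):
        self._delays.append(delay)

    def conservative_delay(self):
        # d = max(Q): plan for the worst delay still in the buffer.
        return max(self._delays)


def next_start_step(delay_buffer, s_min):
    """s = max(d, s_min): the controller step at which the next inference starts."""
    return max(delay_buffer.conservative_delay(), s_min)


def swap_chunk(new_chunk, t, s):
    """Re-index a freshly generated chunk by delta = t - s.

    The new chunk was planned from step s; by the time it is ready the
    controller is at step t, so the first delta actions are already stale.
    """
    delta = t - s
    return new_chunk[delta:], delta
```

For example, with observed delays of 2, 1, and 4 steps and `s_min = 3`, the next inference would start at step 4; if the result arrives at step 7, the first 3 actions of the new chunk are skipped.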

RTCPolicyFlowModel (flow wrapper)

Owns input/output queues and a background thread.
Runs ΠGDM guided inpainting (Eqs. 1–5) against the flow model (e.g., VLAFlowMatching):
- prefix embedding + KV cache
- velocity(A, τ) calls into the flow model
- exact soft mask $W$ (Eq. 5)
- guidance via VJP of $f(A) = A + (1-\tau)\,v_\pi$
Proxies attributes to the underlying flow model so user code still sees the same public surface (tokenizer, etc.).
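
For reference, here is a small sketch of the soft mask weights. It assumes Eq. 5 has the piecewise exponential-decay form recalled from the RTC paper: weight 1 for the first d (frozen) actions, 0 for the last s actions, and c_i (e^{c_i} - 1)/(e - 1) in between, with c_i = (H - s - i)/(H - s - d + 1). Check this against the paper before relying on it; the function name and signature are hypothetical.

```python
import math


def soft_mask(horizon, d, s):
    """Per-step guidance weights W_0 .. W_{H-1} under the assumed form of Eq. 5.

    horizon: chunk length H
    d: number of frozen leading actions (the inference delay)
    s: number of trailing actions with no overlap with the previous chunk
    """
    end = horizon - s  # first index that gets zero weight
    if not 0 <= d <= end:
        raise ValueError("expected 0 <= d <= horizon - s")
    weights = []
    for i in range(horizon):
        if i < d:
            weights.append(1.0)  # actions that will execute before the new chunk is ready
        elif i < end:
            c = (end - i) / (end - d + 1)
            weights.append(c * (math.exp(c) - 1.0) / (math.e - 1.0))
        else:
            weights.append(0.0)  # freshly generated tail, no guidance
    return weights
```

The weights fall monotonically from 1 to 0 across the overlap region, so guidance toward the previous chunk fades out gradually.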

Base models (unchanged)

Inference model (e.g., SmolVLAPolicy): provides preprocessing, postprocessing, and holds the flow model in .model.
Flow model (e.g., VLAFlowMatching): provides embed_prefix, embed_suffix, vlm_with_expert.forward, action_out_proj, sample_actions, sample_noise.

Important: No queues live on the base models anymore.

Directory layout

lerobot/src/lerobot/policies/
├── rtc/
│   ├── __init__.py                   # exports RTCPolicy, RTCPolicyFlowModel
│   ├── model_wrapper.py              # RTCPolicyFlowModel (flow wrapper + thread + queues)
│   ├── policy.py                     # RTCPolicy (policy wrapper + scheduling + action queue)
│   ├── configuration_rtc_smolvla.py  # adds RTCSmolVLAConfig (config)
│   ├── rtc_smolvla.py                # RTCSmolVLAPolicy (thin shim; relays to RTCPolicy)
│   └── configuration_pi0.py          # adds PI0ConfigRTC (future)
├── smolvla/
│   ├── modeling_smolvla.py           # UNCHANGED (base flow + smol policy)
│   └── configuration_smolvla.py      # UNCHANGED
├── pi0/
│   ├── modeling_pi0.py               # UNCHANGED (future)
│   └── configuration_pi0.py          # UNCHANGED (future)
└── factory.py                        # registers "rtc_smolvla"; will register pi0 in the future

feat: Added rtc to smolvla via composition

b7acf02

This was referenced Oct 21, 2025

feat: Added realtime chunking to smolvla huggingface/lerobot#2281

Open

Add Real-Time Action Chunking (RTC) for SmolVLA Policies #1

Open
