Real-Time Action Chunking (RTC) — Refactor Proposal #2
Real-Time Action Chunking (RTC) — Refactor
This refactor makes Real-Time Action Chunking fully model-agnostic by replacing inheritance with composition + dependency injection.
It splits responsibilities cleanly:
RTCPolicy wraps the inference model and RTCPolicyFlowModel wraps the flow model (e.g., SmolVLAPolicy as the inference model, VLAFlowMatching as the flow model).
Queues & threading now live only in RTCPolicyFlowModel; neither the base policy nor the base flow model knows anything about RTC.
Why this refactor?
policy.model still exposes the tokenizer and friends via attribute proxying, so code like policy.model.vlm_with_expert.processor.tokenizer continues to work.
TL;DR of the architecture
Figure: Before (inheritance, cross-coupled) vs. After (composition, DI).
Legend: owns = composition (has-a); inherits = subclassing (is-a).
What each piece does
RTCPolicy (policy wrapper)
Owns the inference model (e.g., SmolVLAPolicy) and the RTCPolicyFlowModel.
Implements the policy duties of Algorithm 1:
Builds the prepared inputs (images/state/lang) using the inference model’s preprocessors.
Post-processes chunks using the inference model (unnormalize, optional π-ALOHA).
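The policy-side duties above can be sketched as a thin wrapper that receives its collaborators through the constructor instead of inheriting from them. This is a minimal sketch under stated assumptions, not the PR's code: the protocol method names prepare_inputs and postprocess, the sample_actions(prepared) signature, the predict_chunk method, and the Toy* classes are all hypothetical stand-ins chosen for illustration.

```python
from typing import Protocol


class InferenceModel(Protocol):
    # Hypothetical interface; the real SmolVLAPolicy method names may differ.
    def prepare_inputs(self, batch: dict) -> dict: ...
    def postprocess(self, chunk: list) -> list: ...


class FlowModel(Protocol):
    # Hypothetical interface for whatever produces a raw action chunk
    # (in the PR this slot would hold an RTCPolicyFlowModel).
    def sample_actions(self, prepared: dict) -> list: ...


class RTCPolicy:
    """Policy wrapper sketch: owns (has-a) its collaborators rather than subclassing them."""

    def __init__(self, inference_model: InferenceModel, flow_wrapper: FlowModel) -> None:
        # Both collaborators are injected, so any object satisfying the
        # protocols above can be plugged in without writing a subclass.
        self.inference_model = inference_model
        self.model = flow_wrapper  # mirrors the policy.model attribute

    def predict_chunk(self, batch: dict) -> list:
        prepared = self.inference_model.prepare_inputs(batch)  # images/state/lang
        raw_chunk = self.model.sample_actions(prepared)        # flow sampling
        return self.inference_model.postprocess(raw_chunk)     # e.g., unnormalize


# Toy stand-ins so the sketch runs without any ML dependencies.
class ToyInference:
    def prepare_inputs(self, batch: dict) -> dict:
        return {"state": batch["state"]}

    def postprocess(self, chunk: list) -> list:
        # Stand-in for unnormalization: scale every action by 10.
        return [a * 10 for a in chunk]


class ToyFlow:
    def sample_actions(self, prepared: dict) -> list:
        # Pretend the chunk is 3 steps that repeat the current state.
        return [prepared["state"]] * 3


policy = RTCPolicy(ToyInference(), ToyFlow())
print(policy.predict_chunk({"state": 0.5}))  # [5.0, 5.0, 5.0]
```

Because RTCPolicy depends only on the two protocols, swapping SmolVLAPolicy for another policy means passing a different object to the constructor rather than writing a new subclass.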
RTCPolicyFlowModel (flow wrapper)
Owns input/output queues and a background thread.
Runs ΠGDM guided inpainting (Eqs. 1–5) against the flow model (e.g., VLAFlowMatching): velocity(A, τ) calls into the flow model.
Proxies attributes to the underlying flow model so user code still sees the same public surface (tokenizer, etc.).
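The flow-wrapper duties can be sketched with the standard library's queue and threading modules. This is a minimal sketch assuming the wrapped model exposes a sample_actions(prepared) method (signature assumed); the ΠGDM guidance step is deliberately left out, and the close() helper, the None shutdown sentinel, and the ToyFlow class are hypothetical. The proxying relies on __getattr__, which Python only calls when normal attribute lookup fails: the wrapper's own attributes are found first, and anything else (such as a tokenizer) is forwarded to the flow model.

```python
import queue
import threading
from typing import Any


class RTCPolicyFlowModel:
    """Flow wrapper sketch: owns the queues and worker thread, so the flow model stays RTC-agnostic."""

    def __init__(self, flow_model: Any) -> None:
        self._flow_model = flow_model
        self.input_queue: queue.Queue = queue.Queue()
        self.output_queue: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        # Background loop: take prepared inputs, emit one action chunk per input.
        while True:
            prepared = self.input_queue.get()
            if prepared is None:  # hypothetical shutdown sentinel
                break
            # The real wrapper would run ΠGDM-guided inpainting here; this sketch
            # simply delegates to the wrapped model's sampler.
            self.output_queue.put(self._flow_model.sample_actions(prepared))

    def close(self) -> None:
        # Hypothetical helper: stop the worker and wait for it to exit.
        self.input_queue.put(None)
        self._worker.join()

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails: wrapper attributes win, and
        # everything else (e.g., a tokenizer) is forwarded to the flow model.
        # Reading __dict__ directly avoids infinite recursion before __init__ runs.
        flow_model = self.__dict__.get("_flow_model")
        if flow_model is None:
            raise AttributeError(name)
        return getattr(flow_model, name)


# Toy flow model standing in for VLAFlowMatching.
class ToyFlow:
    tokenizer = "toy-tokenizer"

    def sample_actions(self, prepared: Any) -> list:
        return [prepared] * 2


wrapper = RTCPolicyFlowModel(ToyFlow())
wrapper.input_queue.put(1.0)
print(wrapper.output_queue.get(timeout=1))  # [1.0, 1.0]
print(wrapper.tokenizer)                    # toy-tokenizer
wrapper.close()
```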
Base models (unchanged)
Inference model (SmolVLAPolicy): provides preprocessing and postprocessing, and holds the flow model in .model.
Flow model (VLAFlowMatching): provides embed_prefix, embed_suffix, vlm_with_expert.forward, action_out_proj, sample_actions, and sample_noise.
Directory layout