Vision-triggered robot runtime for LeRobot SO-ARM101 / SO101 follower arm with YOLO segmentation trigger + GRU readiness trigger + LeRobot pretrained policies (ACT).
Core idea
- One shared vision trigger camera (YOLO + GRU) decides when to switch stages
- Separate policy observation cameras (LeRobot `robot.cameras`) feed multi-view observations to policies (global/left/right/wrist, etc.)
- A single runtime orchestrator (`Panbot/control/main_runtime.py`) drives the entire pipeline end-to-end
Watch the full runtime demo on YouTube:
👉 Panbot Demo – Vision-Triggered Runtime (YOLO + GRU + ACT)
- What this project does
- System architecture
- Vision modules (external repo)
- Datasets & pretrained ACT models
- Folder structure
- Main entry
- Configuration
- Quick start
- Runtime stages
- Logs
- Troubleshooting
- Safety notes
- Collaborators
`Panbot/control/main_runtime.py` runs the following sequence:
- Connect robot (SO101 follower arm)
- Open vision camera (single shared camera used by YOLO + GRU triggers)
- Run Task 1 motion, continuously checking the YOLO trigger
- When the YOLO trigger fires, the runtime switches Task 1 into a return sequence
- After Task 1 return completes, wait for GRU trigger
- When the GRU trigger fires, run Policy 1 (Task 2)
- Wait for a configured duration, then run Policy 2 (Task 3)
- Finally, return to base pose + clean/safe shutdown
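The sequence above amounts to a small state machine. The sketch below illustrates only the control flow; the trigger callables and stage labels are stubs standing in for the real YOLO/GRU/task/policy objects, not the repo's actual code:

```python
# Minimal sketch of the stage sequencing in main_runtime.py.
# All trigger/task objects are stubs; only the control flow is illustrated.

def run_pipeline(yolo_trigger, gru_trigger, log):
    """Drive the stages: Task1 -> return -> wait GRU -> Policy1 -> wait -> Policy2."""
    log.append("task1_initial")
    while not yolo_trigger():      # Task 1 motion until the YOLO trigger fires
        pass
    log.append("task1_return")     # switch Task 1 into its return sequence

    log.append("base_pose_hold")
    while not gru_trigger():       # hold base pose until GRU readiness fires
        pass

    log.append("policy1_task2")    # run Policy 1 (Task 2)
    log.append("wait")             # configured wait (task.wait_task2_to_task3_s)
    log.append("policy2_task3")    # run Policy 2 (Task 3)
    log.append("finalize")         # return to base pose + safe shutdown
    return log

def make_trigger(fire_after):
    """Stub trigger that fires after a fixed number of 'frames'."""
    calls = {"n": 0}
    def trig():
        calls["n"] += 1
        return calls["n"] >= fire_after
    return trig

stages = run_pipeline(make_trigger(3), make_trigger(2), [])
print(stages)
```

The real runtime interleaves robot stepping and camera reads inside those loops; the stubs here keep the sketch self-contained.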
- `vision.cam_index` — ✅ one shared camera for the YOLO/GRU triggers (stage switching)
- `robot.cameras` — ✅ separate set of policy observation cameras for LeRobot policies (multi-view)
(A) Trigger pipeline (YOLO + GRU)
- Input: `vision.cam_index` camera stream
- Optional: perspective warp using `corners.json`
- Output: trigger booleans + visualization frames + debug info

(B) Policy pipeline (LeRobot)
- Input: `robot.cameras.*` (right/left/global/wrist, etc.)
- Used only during policy execution stages
- Output: actions sent to the robot at `policy_fps`
- YOLO trigger → signals “Task 1 is done / batter coverage reached” → switch Task 1 into return mode
- GRU trigger → signals “cooking readiness reached” → start Task 2 policy (and later Task 3)
This runtime repository focuses on orchestrating robot control + triggers + policies.
For the vision pipeline implementation details—dataset generation utilities, training code, model/export specifics, and trigger inference modules—please refer to:
- Panbot_vision: https://github.com/ispaik06/Panbot_vision
If you modify vision inference behavior (warp, preprocessing, label mapping, etc.), keep the runtime config (`runtime.yaml`) aligned with the vision repo's expected preprocessing and checkpoint formats.
The runtime expects policy repo IDs in `runtime.yaml` (e.g., `policies.policy1.repo_id`, `policies.policy2.repo_id`).
Public datasets and pretrained ACT models used for Panbot are available here:
- Task 2 dataset: https://huggingface.co/datasets/ispaik06/Panbot_task2_dataset_3
- Task 3 dataset: https://huggingface.co/datasets/ispaik06/Panbot_task3_dataset_3
- ACT Task 2: https://huggingface.co/ispaik06/act_panbot_task2_3
- ACT Task 3: https://huggingface.co/ispaik06/act_panbot_task3_3
```
Panbot/
├─ config/
│  └─ runtime.yaml                 # Runtime config (camera/trigger/tasks/policies/poses/logging)
│
├─ control/
│  └─ main_runtime.py              # ✅ Main runtime orchestrator (end-to-end pipeline)
│
├─ vision/
│  ├─ calibration/
│  │  └─ corners.json              # 4-point corners for perspective warp (optional)
│  │
│  ├─ models/
│  │  └─ runs/
│  │     ├─ batter_seg_local_v1/weights/best.pt   # YOLO segmentation model
│  │     └─ resnet18_gru16_cls/best.pt            # GRU checkpoint
│  │
│  └─ modules/
│     ├─ camera.py                 # open_camera(), resize_for_preview()
│     ├─ yoloseg_infer.py          # YOLOSegConfig, YOLOSegInfer (trigger + visualization)
│     └─ gru_infer.py              # GRUInferConfig, GRUInfer (trigger + visualization)
│
├─ tasks/
│  ├─ base_pose.py                 # BasePoseController, HoldConfig
│  └─ task1_motion.py              # Task1MotionConfig/Stepper, DEFAULT_REST_ACTION
│
├─ policies/
│  └─ common_policy_runner.py      # run_pretrained_policy_shared_robot()
│
└─ logs/
   └─ (runtime logs output here)   # Controlled by runtime.yaml -> log.dir
```

Paths may vary slightly depending on your local project state, but this README reflects the import paths used by `main_runtime.py`.
Below is what the runtime imports and how each component is used.
- Config file: `Panbot/config/runtime.yaml`
- Logging:
  - Reads `log.dir`, `log.level`
  - Logs to both stdout and file
- Related helpers: `_load_yaml()`, `_normalize_runtime_config()`, `_setup_logging(log_dir, level)`
Robot creation:
- `SO101FollowerConfig` from LeRobot
- `make_robot_from_config`

Runtime config keys:
- `robot.port`, `robot.id`, `robot.calibration_dir`
- `robot.cameras.*` (policy observation cameras)

Robot config builder: `_build_so101_config(cfg["robot"])`
Vision camera open:
`from Panbot.vision.modules.camera import open_camera, resize_for_preview`

Runtime config keys:
- `vision.cam_index`, `vision.backend`, `vision.mjpg`
- `vision.width`, `vision.height`, `vision.fps`

Example:

```python
cap = open_camera(
    cam_index=cam_index,
    backend=backend,
    mjpg=mjpg,
    width=width,
    height=height,
    fps=fps,
)
```

File: `Panbot/vision/modules/yoloseg_infer.py`
Classes: `YOLOSegConfig`, `YOLOSegInfer`

Runtime config keys:
- `yolo_trigger.conf`, `yolo_trigger.imgsz`
- `yolo_trigger.area_thr_ratio`, `yolo_trigger.hold_frames`
- `yolo_trigger.use_warp`, `yolo_trigger.warp_w`, `yolo_trigger.warp_h`
- Warp corners file: `paths.corners`
Runtime behavior:
- Call `yolo.step(frame)` → returns `(triggered, vis_frame, info)`
- On first `triggered=True`, Task 1 switches into return mode
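The `area_thr_ratio` + `hold_frames` pair behaves like a debounce: the segmented mask-area ratio must stay above the threshold for N consecutive frames before the trigger fires. A minimal sketch of that logic (the class name and frame values are illustrative, not the repo's actual code):

```python
class HoldFramesTrigger:
    """Fire only after the condition holds for `hold_frames` consecutive frames."""

    def __init__(self, area_thr_ratio: float, hold_frames: int):
        self.area_thr_ratio = area_thr_ratio
        self.hold_frames = hold_frames
        self._count = 0

    def step(self, area_ratio: float) -> bool:
        # Reset the streak as soon as the ratio dips below the threshold.
        self._count = self._count + 1 if area_ratio >= self.area_thr_ratio else 0
        return self._count >= self.hold_frames

trig = HoldFramesTrigger(area_thr_ratio=0.17, hold_frames=3)
ratios = [0.10, 0.20, 0.20, 0.05, 0.20, 0.20, 0.20]
fired = [trig.step(r) for r in ratios]
print(fired)  # fires only on the third consecutive frame above threshold
```

This is why raising `hold_frames` makes the trigger more robust to flickering masks at the cost of a slower response.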
File: `Panbot/vision/modules/gru_infer.py`

Classes: `GRUInferConfig`, `GRUInfer`

Runtime config keys:
- `gru_trigger.image_size`, `gru_trigger.seq_len`, `gru_trigger.stride`
- `gru_trigger.ema`, `gru_trigger.ready_hold`, `gru_trigger.amp`
- `gru_trigger.use_warp`, `gru_trigger.warp_w`, `gru_trigger.warp_h`
Runtime behavior:
- After Task 1 return completes, call `gru.reset()`
- Call `gru.step(frame)` → returns `(triggered, vis_frame, info)`
- When `triggered=True`, start policy stage(s)
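The `ema` and `ready_hold` keys suggest the per-step readiness score is smoothed with an exponential moving average and must stay above a threshold for several consecutive steps before firing. A sketch under those assumptions (the EMA convention, class name, and threshold are guesses, not the repo's actual implementation):

```python
class ReadyHoldTrigger:
    """Smooth per-step readiness scores with an EMA, then require
    `ready_hold` consecutive smoothed scores above `threshold`.
    The exact EMA convention in the real GRUInfer may differ."""

    def __init__(self, ema: float = 0.7, ready_hold: int = 3, threshold: float = 0.5):
        self.ema = ema
        self.ready_hold = ready_hold
        self.threshold = threshold
        self._smooth = None
        self._streak = 0

    def reset(self):
        # Mirrors gru.reset(): clear history before a new readiness phase.
        self._smooth, self._streak = None, 0

    def step(self, prob: float) -> bool:
        # Keep `ema` weight on history, (1 - ema) on the new score.
        self._smooth = prob if self._smooth is None else \
            self.ema * self._smooth + (1 - self.ema) * prob
        self._streak = self._streak + 1 if self._smooth >= self.threshold else 0
        return self._streak >= self.ready_hold

trig = ReadyHoldTrigger(ema=0.7, ready_hold=3)
fired = [trig.step(p) for p in [0.9, 0.9, 0.9, 0.9]]
print(fired)
```

A higher `ema` slows reaction but suppresses noisy single-frame predictions; `ready_hold` adds a second layer of debouncing on top.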
File: `Panbot/tasks/task1_motion.py`

Classes: `Task1MotionConfig`, `Task1MotionStepper`

Related: `DEFAULT_REST_ACTION`

Runtime config keys:
- `task.task1_ramp_time_s`
- `task.task1_pose_hold_s`
- `poses.task1_initial_sequence`
- `poses.task1_return_sequence`
Key calls:
- `task1.start_initial()`
- loop: `task1.step(time.perf_counter())`
- on YOLO trigger: `task1.interrupt_to_return()`
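The `task1_ramp_time_s` key implies each pose in the sequence is reached by interpolating toward the target over a ramp window, then held for `task1_pose_hold_s`. A hypothetical linear-ramp sketch (the function name and joint values are illustrative; the real `Task1MotionStepper` may use a different motion profile):

```python
def ramp_pose(start: dict, target: dict, t: float, ramp_time_s: float) -> dict:
    """Linearly interpolate joint targets from `start` to `target` over
    `ramp_time_s` seconds, clamping once the ramp completes."""
    alpha = min(max(t / ramp_time_s, 0.0), 1.0)  # progress clamped to [0, 1]
    return {j: (1 - alpha) * start[j] + alpha * target[j] for j in target}

start = {"shoulder": 0.0, "elbow": 10.0}
target = {"shoulder": 30.0, "elbow": -20.0}
mid = ramp_pose(start, target, t=1.5, ramp_time_s=3.0)
print(mid)  # halfway between start and target at the ramp midpoint
```

Ramping rather than jumping to each pose keeps joint velocities bounded, which matters for the safety notes at the end of this README.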
File: `Panbot/tasks/base_pose.py`

Classes: `BasePoseController`, `HoldConfig`

Runtime config keys:
- `poses.base_pose`
- `task.base_pose_hold_interval_s`
Purpose:
- Keeps the robot stable in base pose between stages
- Uses periodic ticks (`base_ctrl.tick()`) to hold position safely
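The `base_pose_hold_interval_s` key suggests the hold command is re-sent on a fixed interval rather than on every loop pass. A sketch of that interval gating (class name and timing values are illustrative, not the repo's actual `BasePoseController`):

```python
class IntervalTicker:
    """Return True at most once per `interval_s`, so a hold command is
    re-sent periodically instead of flooding the bus every loop pass."""

    def __init__(self, interval_s: float):
        self.interval_s = interval_s
        self._last = None

    def should_tick(self, now: float) -> bool:
        if self._last is None or now - self._last >= self.interval_s:
            self._last = now
            return True
        return False

tick = IntervalTicker(interval_s=0.25)
times = [0.0, 0.1, 0.2, 0.3, 0.49, 0.56]
ticks = [tick.should_tick(t) for t in times]
print(ticks)
```

In the real loop, `now` would come from `time.perf_counter()` and a `True` result would translate into sending the base pose action again.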
File: `Panbot/policies/common_policy_runner.py`

Function: `run_pretrained_policy_shared_robot(...)`

Runtime config keys:
- `task.policy_fps`
- `task.task2_duration_s` (Policy 1 duration)
- `task.task3_duration_s` (Policy 2 duration)
- `task.wait_task2_to_task3_s`
- `policies.policy1.repo_id`
- `policies.policy2.repo_id`
- `policies.*.use_amp`, `print_joints`, `print_joints_every`, etc.

Robot actuation happens inside `robot.send_action(...)`.
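Conceptually, the runner steps a policy at `task.policy_fps` until the configured duration elapses. The sketch below shows only that timing shape with an injected fake clock so it is self-contained; the real `run_pretrained_policy_shared_robot` presumably uses wall time, sleeps between steps, and builds observations from `robot.cameras`:

```python
def run_policy_for(duration_s: float, policy_fps: int, step_fn, clock) -> int:
    """Step a policy until `duration_s` elapses, returning the step count.
    `clock` is injected here to keep the sketch deterministic."""
    period = 1.0 / policy_fps  # the real runner would sleep to pace at this rate
    start = clock()
    steps = 0
    while clock() - start < duration_s:
        step_fn()  # build observation, run policy, robot.send_action(...)
        steps += 1
    return steps

# Fake clock: each policy step "takes" exactly one period (policy_fps = 4).
state = {"t": 0.0}
actions = []

def step():
    actions.append("act")
    state["t"] += 0.25

n = run_policy_for(duration_s=1.0, policy_fps=4, step_fn=step, clock=lambda: state["t"])
print(n)  # 4 steps in 1 simulated second at 4 fps
```

The duration/fps split explains why `task2_duration_s: 10.0` with `policy_fps: 30` yields roughly 300 policy steps.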
```yaml
paths:
  corners: "Panbot/vision/calibration/corners.json"
  yolo_model: "Panbot/vision/models/runs/.../best.pt"
  gru_ckpt: "Panbot/vision/models/runs/.../best.pt"

vision:
  cam_index: 0
  backend: "v4l2"
  mjpg: true
  width: 3840
  height: 2160
  fps: 30
  show: true
  yolo_preview_scale: 0.55
  gru_preview_scale: 0.30
  watchdog_s: 2.0
```

Notes:
- `show: true` typically enables local preview windows for debugging.
- `watchdog_s` can be used to detect stalled camera capture (implementation-dependent).
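One plausible reading of `watchdog_s` is a staleness check: if no new frame has arrived within the window, the runtime can log a warning or attempt recovery. A sketch of that idea (class name and recovery behavior are assumptions; the repo's actual handling is implementation-dependent):

```python
class CaptureWatchdog:
    """Flag a stalled camera if no frame has arrived within `watchdog_s`."""

    def __init__(self, watchdog_s: float):
        self.watchdog_s = watchdog_s
        self._last_frame_t = None

    def frame_arrived(self, now: float):
        # Call whenever a frame is successfully read from the capture.
        self._last_frame_t = now

    def stalled(self, now: float) -> bool:
        # Never stalled before the first frame; afterwards, stale if the
        # gap since the last frame exceeds the watchdog window.
        return self._last_frame_t is not None and \
            (now - self._last_frame_t) > self.watchdog_s

wd = CaptureWatchdog(watchdog_s=2.0)
wd.frame_arrived(0.0)
flags = (wd.stalled(1.0), wd.stalled(2.5))
print(flags)
```

On a stall, reasonable reactions are logging, reopening the capture, or aborting to a safe pose, depending on the stage.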
```yaml
robot:
  port: "/dev/ttyACM0"
  id: "my_awesome_follower_arm"
  cameras:
    right: { type: "opencv", index_or_path: 2, width: 640, height: 480, fps: 30, fourcc: "MJPG" }
    # left/global/wrist/... as needed

yolo_trigger:
  conf: 0.25
  imgsz: 640
  area_thr_ratio: 0.17
  hold_frames: 30
  use_warp: true
  warp_w: 0
  warp_h: 0

gru_trigger:
  image_size: 224
  seq_len: 16
  stride: 6
  ema: 0.7
  ready_hold: 3
  amp: true
  use_warp: true
  warp_w: 0
  warp_h: 0

task:
  hz: 30
  task1_ramp_time_s: 3.0
  task1_pose_hold_s: 1.0
  base_pose_hold_interval_s: 0.25
  policy_fps: 30
  task2_duration_s: 10.0
  task3_duration_s: 10.0
  wait_task2_to_task3_s: 30.0
```

You need:
- Python 3.x
- LeRobot
- OpenCV
- PyTorch (CUDA recommended)
- YOLO inference dependency (e.g., Ultralytics) if your YOLO module expects it
Example (conceptual):

```bash
conda create -n panbot python=3.10 -y
conda activate panbot
pip install opencv-python torch torchvision
pip install lerobot
pip install ultralytics
```

CUDA installation depends on your GPU/driver setup. Make sure `torch.cuda.is_available()` is `True` if you want GPU acceleration.
- `vision.cam_index` must point to the trigger camera
- `robot.cameras.*.index_or_path` must point to the policy observation cameras
```bash
cd ~/Panbot
chmod +x scripts/start_all.sh
./scripts/start_all.sh
```

If you don't use `start_all.sh`, you can run the runtime module directly (depends on your package layout):

```bash
python -m Panbot.control.main_runtime
```
- Stage 1: Task 1 initial + YOLO trigger
  - `task1.step()` moves the robot
  - `yolo.step(frame)` checks the trigger condition
  - On trigger → switch into Task 1 return sequence
- Stage 2: Base pose hold + GRU trigger
  - `base_ctrl.tick()` holds stable base pose
  - `gru.step(frame)` checks the readiness trigger
- Stage 3: Policy 1 (Task 2)
  - policy runner drives `robot.send_action(...)`
- Wait
- Stage 4: Policy 2 (Task 3)
- Finalize
  - Return to base pose
  - Release camera resources
  - Safe shutdown
- Location: `log.dir` (default: `Panbot/logs`)
- Filename example: `main_runtime_YYYYMMDD_HHMMSS.log`
- Verify `vision.cam_index`
- Try different `vision.backend` values (`v4l2`, `opencv`, etc.)
- Toggle `vision.mjpg: true/false` depending on your camera
Tune:
- `yolo_trigger.conf`
- `yolo_trigger.area_thr_ratio`
- `yolo_trigger.hold_frames`
Tune:
- `gru_trigger.seq_len`, `gru_trigger.stride`
- `gru_trigger.ema`
- `gru_trigger.ready_hold`
Also confirm:
- The GRU checkpoint path is correct (`paths.gru_ckpt`)
- Your preprocessing (warp/resize/normalization) matches training
Check:
- `policies.policy1.repo_id` / `policies.policy2.repo_id`
- Policy runner logs (does it generate actions?)
- Whether `robot.send_action(...)` is reached
- Whether `robot.cameras` are streaming correctly (policy observation inputs)
- Always test with low speed / safe workspace first.
- Keep an emergency stop method available (power cutoff or software kill).
- Make sure your base pose is mechanically safe and collision-free.
- Do not run unattended until the full pipeline is validated.