🏆 3rd Place Winner at 2025 Seeed × NVIDIA × LeRobot Hackathon
Click the GIF to watch the full demonstration on Bilibili
This repository is forked from huggingface/lerobot and based on commit f55c6e8 (Dataset v3).
Added support for bimanual SO-101 robot data collection, including:
-
BiSO101Follower: Dual-arm follower robot implementation
- Manages left and right SO-101 follower arms independently
- Unified interface with automatic prefix handling (
left_*,right_*) - Synchronized observation and action control
-
BiSO101Leader: Dual-arm leader teleoperator
- Teleoperation control for bimanual manipulation
- Feedback support for both arms
-
Seamless integration with existing LeRobot recording and replay pipeline
Hardware Configuration:
- Cameras (3 total):
front_cam: Front-facing camera for global scene viewhand_cam: Wrist-mounted camera for close-up manipulation viewside_cam: Side-view camera for scene observation
Implementation Details:
- Reuses existing
SO101FollowerandSO101Leaderimplementations through composition - Added factory methods in
robots/utils.pyandteleoperators/utils.py - Configuration support for independent arm settings (ports, torque, calibration)
Action Dimension Handling (SmolVLA Built-in Feature):
Note: The following is a built-in capability of SmolVLA, not a modification made in this fork.
The bimanual SO-101 has 12 action dimensions (6 joints × 2 arms), which SmolVLA automatically handles without manual configuration:
# SmolVLA automatically detects action dimensions from dataset
# Training: 12-dim → pad to 32-dim (max_action_dim)
actions = pad_vector(batch[ACTION], self.config.max_action_dim)
# Inference: 32-dim → trim back to 12-dim
original_action_dim = self.config.action_feature.shape[0] # auto-detected: 12
actions = actions[:, :, :original_action_dim]Unlike other VLA models (e.g., xVLA) that require manual action_mode configuration, SmolVLA's dynamic padding system supports any action dimension ≤ 32 without code changes.
This fork uses the LeRobot Dataset v3 format.
- Platform: x86/x64 (Intel/AMD)
- OS: Ubuntu 20.04 or later
- Python: 3.10
- Hardware: SO-101 bimanual robot arms
# Download Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Run the installer
bash Miniconda3-latest-Linux-x86_64.sh
# Restart your terminal after installation, then verify
conda --version# Create a new conda environment named 'lerobot' with Python 3.10
conda create -n lerobot python=3.10
# Activate the environment
conda activate lerobotNote: You'll need to activate this environment every time you work with LeRobot:
conda activate lerobotInstall ffmpeg for video encoding/decoding:
# Using conda (recommended)
conda install -c conda-forge ffmpeg
# Or using apt (alternative)
# sudo apt-get update && sudo apt-get install ffmpeggit clone https://github.com/kahowang/lerobot.git
cd lerobotInstall LeRobot with Feetech motor support:
# Install with feetech motor support (required for SO-101 robots)
pip install -e ".[feetech]"Install SmolVLA for training and inference:
# Install SmolVLA dependencies
pip install -e ".[smolvla]"This will install:
transformersfor the vision-language-action modelnum2wordsfor natural language processingacceleratefor distributed trainingsafetensorsfor model serialization
Reference Documentation: For detailed setup instructions and troubleshooting, please refer to the Seeed Studio LeRobot Wiki.
Record demonstrations using the bimanual SO-101 robot with three camera views (front_cam, hand_cam, side_cam):
lerobot-record \
--robot.type=bi_so101_follower \
--robot.left_arm_port=/dev/ttyACM0 \
--robot.right_arm_port=/dev/ttyACM1 \
--robot.id=bimanual_follower \
--robot.cameras='{
"front_cam": {"type": "opencv", "index_or_path": 0, "width": 640, "height": 480, "fps": 30},
"hand_cam": {"type": "opencv", "index_or_path": 1, "width": 640, "height": 480, "fps": 30},
"side_cam": {"type": "opencv", "index_or_path": 2, "width": 640, "height": 480, "fps": 30}
}' \
--teleop.type=bi_so101_leader \
--teleop.left_arm_port=/dev/ttyACM2 \
--teleop.right_arm_port=/dev/ttyACM3 \
--teleop.id=bimanual_leader \
--dataset.repo_id=${HF_USER}/your_dataset_name \
--dataset.single_task="Your task description here" \
--dataset.num_episodes=50Notes:
- Replace port values (
/dev/ttyACM0,/dev/ttyACM1, etc.) with your actual device ports - Replace
index_or_pathvalues (0, 1, 2) with your actual camera indices or paths - Use
lerobot-find-portto discover connected device ports - Adjust camera parameters (width, height, fps) based on your hardware
- We recommend recording at least 50 episodes for optimal SmolVLA performance
- Use
--dataset.single_taskto describe your task in natural language
Fine-tune the SmolVLA model on your collected dataset:
lerobot-train \
--policy.path=lerobot/smolvla_base \
--dataset.repo_id=${HF_USER}/your_dataset_name \
--batch_size=64 \
--steps=20000 \
--output_dir=outputs/train/smolvla_three_cameras \
--job_name=smolvla_training_three_cameras \
--policy.device=cuda \
--wandb.enable=trueTraining Notes:
- Training for 20k steps takes ~4 hours on a single A100 GPU
- Adjust
--batch_sizebased on your GPU memory - Use
--wandb.enable=trueto track training progress with Weights & Biases - The model will automatically use all three camera views from your dataset
- Fine-tune
--stepsbased on validation performance
Run inference using your trained SmolVLA model with three cameras:
lerobot-record \
--robot.type=bi_so101_follower \
--robot.left_arm_port=/dev/ttyACM0 \
--robot.right_arm_port=/dev/ttyACM1 \
--robot.id=bimanual_follower \
--robot.cameras='{
"front_cam": {"type": "opencv", "index_or_path": 0, "width": 640, "height": 480, "fps": 30},
"hand_cam": {"type": "opencv", "index_or_path": 1, "width": 640, "height": 480, "fps": 30},
"side_cam": {"type": "opencv", "index_or_path": 2, "width": 640, "height": 480, "fps": 30}
}' \
--dataset.single_task="Your task description here" \
--dataset.repo_id=${HF_USER}/eval_your_dataset_name \
--dataset.episode_time_s=50 \
--dataset.num_episodes=10 \
--policy.path=${HF_USER}/smolvla_three_camerasInference Notes:
- Replace port values (
/dev/ttyACM0,/dev/ttyACM1) with your actual follower arm ports - Use the same camera configuration as during data collection
- Use the same task description as in your training dataset
- The policy will control the robot autonomously based on camera observations
- The evaluation results will be saved to
${HF_USER}/eval_your_dataset_name - Adjust
--dataset.num_episodesfor your evaluation needs
To replay and visualize collected episodes:
lerobot-replay \
--robot.type=bi_so101_follower \
--robot.left_arm_port=/dev/ttyACM0 \
--robot.right_arm_port=/dev/ttyACM1 \
--robot.id=bimanual_follower \
--dataset.repo_id=${HF_USER}/your_dataset_name \
--dataset.episode=0Replay Notes:
- Replace port values with your actual follower arm ports
- The robot will replay the recorded actions from the specified episode
- Use
--dataset.episodeto select which episode to replay (0-indexed)
For VR-based robot control, we also developed a ROS2 package that enables intuitive teleoperation of the SO-ARM101 robotic arms through VR controllers.
Features:
- Real-time VR controller to robot end-effector mapping using inverse kinematics
- Support for single arm (left/right) or simultaneous dual arm operation
- Integrated mobile base control through VR joystick inputs
- VR trigger-based gripper control with dynamic calibration
For detailed setup and usage instructions, visit the VR controller repository:
VR Controller Repository: lerobot_vr_controller
For full documentation, tutorials, and more information, please visit:
- Original Repository: huggingface/lerobot
- Documentation: https://huggingface.co/docs/lerobot
- Community: https://huggingface.co/lerobot
If you use this work, please cite:
Contributors: kahowang (王家浩) • bubblepan (潘春波) • Makermods
@misc{wang2025smolvla_xlerobot,
author = {Wang, Jiahao and Pan, Chunbo and Makermods},
title = {SmolVLA for XLeRobot: Bimanual SO-101 Robot Control with Vision-Language-Action Model},
howpublished = "\url{https://github.com/kahowang/lerobot}",
year = {2025},
note = {3rd Place Winner at 2025 Seeed × NVIDIA × LeRobot Hackathon}
}@misc{cadene2024lerobot,
author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascal, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas},
title = {LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
howpublished = "\url{https://github.com/huggingface/lerobot}",
year = {2024}
}








