This tutorial guides you from scratch through building a robot control system based on our Dexbotic toolbox and the LeRobot framework, covering the full pipeline from data processing to real-world deployment.
Before starting, please make sure the hardware is connected correctly and basic calibration is complete.
Refer to the SO-101 Official Tutorial to complete the following in order:
- Servo Calibration
- Robot Arm Assembly
- Robot Arm Calibration
- Teleoperation Test
- Camera Installation
- Dataset Collection (Recording using LeRobot scripts)
To prepare the data for policy training, we convert the raw data collected by LeRobot into our generic DexData format. The conversion performs two steps (a sketch of the resulting record follows this list):
- Camera Mapping: Map physical cameras (e.g., Front/Side) to logical views (Head/Wrist).
- Time Alignment: Align video frames, robot states (State), actions (Action), and language instructions (Prompt) on a unified timeline.
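For intuition, here is a minimal sketch of what one aligned record in an episode_*.jsonl file might look like. The field names and the camera mapping below are illustrative assumptions; the authoritative schema is whatever hardware/so101/convert_so101_to_dexdata.py emits.

```python
# Illustrative only: field names and the camera mapping are assumptions,
# not the authoritative DexData schema.
import json

CAMERA_MAP = {"front": "head", "side": "wrist"}  # physical -> logical view

record = {
    "prompt": "Press the button",   # language instruction for the episode
    "frame_index": 42,              # position on the unified timeline
    "state": [0.0] * 6,             # robot joint state at this frame
    "action": [0.0] * 6,            # action aligned to the same frame
    "videos": {
        CAMERA_MAP["front"]: "episode_00000_front.mp4",
        CAMERA_MAP["side"]: "episode_00000_side.mp4",
    },
}
print(json.dumps(record))  # one JSON object per line in the .jsonl file
```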
Please ensure the input data structure conforms to the LeRobot standard:
my_input_dataset/                      # Input root directory
├── insert_ring/                       # Task name
│   └── train/                         # Split (train/test/val)
│       ├── meta/tasks.parquet         # Task metadata
│       ├── data/chunk-000/            # State/action data (.parquet)
│       └── videos/                    # Video data

Run the command:
python hardware/so101/convert_so101_to_dexdata.py \
    --dataset_path /path/to/lerobot_dataset/press_blue_then_green \
    --output_dir /path/to/so101_dexdata \
    --task_prompt "Press the button" \
    --task_name push_button

The output structure should look like this:
so101_dexdata/
├── dexdata_jsonl/
│   └── push_button/episode_00000.jsonl
└── videos/
    └── push_button/episode_00000_front.mp4
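As a quick sanity check, you can count the converted files and inspect the first record. This is a minimal sketch assuming the layout above; adjust the root path to your --output_dir.

```python
# Sanity-check the converted dataset (paths assume the layout shown above).
import json
import pathlib

root = pathlib.Path("so101_dexdata")
annotations = sorted((root / "dexdata_jsonl" / "push_button").glob("episode_*.jsonl"))
videos = sorted((root / "videos" / "push_button").glob("episode_*.mp4"))
print(f"{len(annotations)} annotation files, {len(videos)} video files")

with annotations[0].open() as f:
    first_record = json.loads(f.readline())
print(sorted(first_record.keys()))  # see which fields each record carries
```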
This section follows the official Dexbotic training paradigm.
Create a dataset registration file under dexbotic/data/data_source:
from dexbotic.data.data_source.register import register_dataset

SO101_DATASET = {
    "push_button": {
        "data_path_prefix": "./dexbotic/so101_dexdata/videos/push_button",
        "annotations": "./so101_dexdata/dexdata_jsonl/push_button",
        "frequency": 1,
    },
}

meta_data = {
    "non_delta_mask": [-1],
    "periodic_mask": None,
    "periodic_range": None,
}

register_dataset(SO101_DATASET, meta_data=meta_data, prefix="so101")

This makes the dataset name so101_push_button.
Modify playground/example_exp.py and update:
- CogActDataConfig.dataset_name to "so101_push_button"
- CogActTrainerConfig.output_dir to your checkpoint directory
- CogActModelConfig.model_name_or_path to the pretrained model path
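Concretely, the edits look roughly like this. This is an illustrative sketch: the exact structure of playground/example_exp.py in your checkout may differ, and both paths are placeholders.

```python
# Illustrative sketch of the config edits; both paths are placeholders.
CogActDataConfig.dataset_name = "so101_push_button"            # registered dataset name
CogActTrainerConfig.output_dir = "/path/to/checkpoints"        # where checkpoints are written
CogActModelConfig.model_name_or_path = "/path/to/pretrained"   # pretrained model weights
```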
cd /path/to/dexbotic
deepspeed playground/example_exp.py --task train

This section starts three processes:
- VLA policy server: Runs the trained model and exposes an inference endpoint.
- Bridge Server: Receives robot images, forwards requests to the VLA policy, and returns actions.
- Robot Client: Runs on the robot side, streams observations to the Bridge, and executes actions.
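To make the data flow concrete, here is a minimal sketch of the relay step the Bridge performs, assuming the VLA policy is reachable over HTTP and answers with a JSON action chunk. The payload and response fields ("observation", "actions") are assumptions, not the actual bridge_server.py protocol.

```python
# Hypothetical sketch of the Bridge's relay step; the request/response
# schema below is assumed, not the real bridge_server.py protocol.
import requests

VLA_URL = "http://your_ip:7899"  # the --vla_url endpoint configured later

def relay(observation: dict, prompt: str) -> list:
    """Forward one observation to the VLA policy and return its action chunk."""
    payload = {"prompt": prompt, "observation": observation}
    resp = requests.post(VLA_URL, json=payload, timeout=10.0)
    resp.raise_for_status()
    return resp.json()["actions"]
```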
Before you start, copy these scripts into your SO101 working directory (replace paths as needed):
cp hardware/so101/bridge_server.py ~/path/to/SO101/
cp hardware/so101/client.py ~/path/to/SO101/

Use the same training script to launch your policy server:
cd /path/to/dexbotic
python playground/example_exp.py --task inference

Once started, this service waits for the robot connection and is responsible for displaying the video feed.
Note: the text prompt used at inference (the --prompt passed to the Bridge and the --task passed to the robot client below) must exactly match the prompt used during training.
# Enter working directory
cd ~/path/to/SO101
conda activate lerobot
# Adjust --vla_url (the VLA policy URL) to match your setup
python bridge_server.py \
--vla_url http://your_ip:7899 \
--prompt "Press the button"Parameter Explanation:
--vla_url: Specifies the API endpoint of the VLA policy; please replace the example IP (your_ip) with your actual backend server address.--prompt: The text prompt sent to the model, which must strictly match the instruction used during training (including punctuation) to ensure correct inference behavior.
Success Indicators:
If your configuration is correct, the following logs appear in Terminal 2:
Bridge Server started on [::]:8080
Waiting for Dual-Camera Robot Client...
This service drives the hardware, sends images to the Bridge, and executes received actions.
Permission Hint: It is recommended to run sudo chmod 666 /dev/ttyACM0 each time the USB cable is plugged in.
# Start robot client
python -m lerobot.async_inference.robot_client \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--robot.cameras="{ front: {type: opencv, index_or_path: 6, width: 640, height: 480, fps: 30, fourcc: 'MJPG'}, side: {type: opencv, index_or_path: 12, width: 640, height: 480, fps: 30, fourcc: 'MJPG'}}" \
--server_address=127.0.0.1:8080 \
--actions_per_chunk=32 \
--chunk_size_threshold=0.5 \
--aggregate_fn_name=weighted_average \
--task="Press the button" \
--policy_type=act \
--policy_device=mps

Key Parameter Explanation:
- --actions_per_chunk=32: The total number of actions the client receives from a single inference call.
- --server_address: If the Bridge Server is on the local machine, use 127.0.0.1:8080; if it is on another computer, use that computer's IP.
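Because a fresh chunk is requested once the remaining queue drops below chunk_size_threshold, consecutive chunks overlap and the client blends them with the configured aggregate function. The sketch below illustrates the weighted-average idea; the weighting scheme is a made-up example, not LeRobot's actual aggregate_fn implementation.

```python
# Illustrative weighted-average blending of overlapping action chunks;
# the fixed weight here is a made-up example, not LeRobot's implementation.
import numpy as np

def weighted_average(old_chunk: np.ndarray, new_chunk: np.ndarray,
                     new_weight: float = 0.7) -> np.ndarray:
    """Blend the overlap of two action chunks, favoring the newer actions."""
    n = min(len(old_chunk), len(new_chunk))
    blended = (1.0 - new_weight) * old_chunk[:n] + new_weight * new_chunk[:n]
    return np.concatenate([blended, new_chunk[n:]])

prev_chunk = np.zeros((16, 6))  # actions left over from the previous inference
new_chunk = np.ones((32, 6))    # fresh 32-action chunk from the new inference
merged = weighted_average(prev_chunk, new_chunk)
print(merged.shape, merged[0])  # (32, 6) [0.7 0.7 0.7 0.7 0.7 0.7]
```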
Success Indicators:
- Terminal 2: Displays Robot Client Connected and continuously logs Sending ... images to VLA.
- Terminal 3: Displays Robot connected and ready.
- Robot Arm: Begins to move smoothly following the instructions.
Following the steps above, we ran the Push Button task on our SO-101 robot; a successful demo video is provided below: