
SO-101 VLA Development Tutorial

This tutorial guides users from scratch through building a robot control system based on our Dexbotic toolbox and the LeRobot framework, covering the full pipeline from data processing to real-world deployment.

Preparation

Before starting, please make sure the hardware is connected properly and basic calibration is complete.

Hardware & Basic Environment

Refer to the SO-101 Official Tutorial to complete the following in order:

  1. Servo Calibration
  2. Robot Arm Assembly
  3. Robot Arm Calibration
  4. Teleoperation Test
  5. Camera Installation
  6. Dataset Collection (Recording using LeRobot scripts)

Data Conversion

To prepare the data for policy training, we need to convert the raw recordings collected by LeRobot into our generic DexData format.

Conversion Logic

  • Camera Mapping: Map physical cameras (e.g., Front/Side) to logical views (Head/Wrist).
  • Time Alignment: Align video frames, robot states (State), actions (Action), and language instructions (Prompt) on a unified timeline; one possible serialized step is sketched below.
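
For intuition, here is a minimal sketch of one aligned step as it could be serialized, one JSON object per line of an episode_*.jsonl file. The field names (prompt, videos, frame_index, state, action) are illustrative placeholders, not the exact DexData schema:

import json

# Hypothetical aligned step; field names and values are placeholders.
step = {
    "prompt": "Press the button",                               # language instruction
    "videos": {
        "head": "videos/push_button/episode_00000_front.mp4",   # Front camera -> Head view
        "wrist": "videos/push_button/episode_00000_side.mp4",   # Side camera -> Wrist view
    },
    "frame_index": 0,        # position on the unified timeline
    "state": [0.0] * 6,      # joint readings at this timestep
    "action": [0.0] * 6,     # commanded joint targets
}
print(json.dumps(step))      # the converter writes one such line per step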

Data Directory Standard

Please ensure the input data structure conforms to the LeRobot standard:

my_input_dataset/                  # Input root directory
├── insert_ring/                   # Task name
│   └── train/                     # Split (train/test/val)
│       ├── meta/tasks.parquet     # Task metadata
│       ├── data/chunk-000/        # State/action data (.parquet)
│       └── videos/                # Video data

Execution

Run the command:

python hardware/so101/convert_so101_to_dexdata.py \
  --dataset_path /path/to/lerobot_dataset/press_blue_then_green \
  --output_dir /path/to/so101_dexdata \
  --task_prompt "Press the button" \
  --task_name push_button

The output structure should look like this:

so101_dexdata/
├── dexdata_jsonl/
│   └── push_button/episode_00000.jsonl
└── videos/
    └── push_button/episode_00000_front.mp4
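
As a quick sanity check after conversion, you can count the records per episode. This is a minimal sketch that assumes only the output layout above (one JSON object per line), not any particular field schema:

import pathlib

# Count steps in each converted episode under the output directory.
root = pathlib.Path("/path/to/so101_dexdata")
for ep in sorted((root / "dexdata_jsonl" / "push_button").glob("episode_*.jsonl")):
    n_steps = sum(1 for _ in ep.open())  # one JSON object per line
    print(f"{ep.name}: {n_steps} steps")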

Policy Training

This section follows the official Dexbotic training paradigm.

Register the dataset

Create a dataset registration file under dexbotic/data/data_source:

from dexbotic.data.data_source.register import register_dataset

# Adjust both paths to where your converted data lives.
SO101_DATASET = {
    "push_button": {
        "data_path_prefix": "./dexbotic/so101_dexdata/videos/push_button",  # converted videos
        "annotations": "./so101_dexdata/dexdata_jsonl/push_button",         # DexData JSONL annotations
        "frequency": 1,
    },
}

meta_data = {
    "non_delta_mask": [-1],
    "periodic_mask": None,
    "periodic_range": None,
}

register_dataset(SO101_DATASET, meta_data=meta_data, prefix="so101")

This registers the dataset under the name so101_push_button.

Create an experiment config

Modify playground/example_exp.py and update the following (a sketch of the edits follows the list):

  • CogActDataConfig.dataset_name = "so101_push_button"
  • CogActTrainerConfig.output_dir to your checkpoint directory
  • CogActModelConfig.model_name_or_path to the pretrained model path
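
A sketch of those three edits, assuming the config classes are already imported in example_exp.py as shipped; the exact structure of the file may differ, and all paths are placeholders:

# Inside playground/example_exp.py -- only these three fields change:
data_config = CogActDataConfig(dataset_name="so101_push_button")
trainer_config = CogActTrainerConfig(output_dir="/path/to/checkpoints/so101_push_button")
model_config = CogActModelConfig(model_name_or_path="/path/to/pretrained_model")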

Start training

cd /path/to/dexbotic
deepspeed playground/example_exp.py --task train

System Launch

This section starts three processes:

  • VLA policy server: Runs the trained model and exposes an inference endpoint.
  • Bridge Server: Receives robot images, forwards requests to the VLA policy, and returns actions (its core request path is sketched after this list).
  • Robot Client: Runs on the robot side, streams observations to the Bridge, and executes actions.
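
Conceptually, the Bridge's request path looks something like the sketch below. This is an illustration, not the contents of bridge_server.py; the /act route and the images/actions field names are hypothetical:

import requests

VLA_URL = "http://your_ip:7899"  # the same endpoint passed as --vla_url below

def forward_to_policy(images: dict, prompt: str) -> list:
    # Bundle the latest camera frames with the language prompt, ask the VLA
    # policy server for the next action chunk, and return it to the client.
    resp = requests.post(f"{VLA_URL}/act", json={"images": images, "prompt": prompt})
    resp.raise_for_status()
    return resp.json()["actions"]  # hypothetical response field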

Before you start, copy these scripts into your SO-101 working directory (replace the paths as needed):

cp hardware/so101/bridge_server.py ~/path/to/SO101/
cp hardware/so101/client.py ~/path/to/SO101/

Start VLA policy server (Terminal 1)

Use the same training script to launch your policy server:

cd /path/to/dexbotic
python playground/example_exp.py --task inference

Start Bridge Server (Terminal 2)

Once started, this service waits for the robot to connect and displays the live video feed.
Note: The --prompt parameter must match the Prompt used during training.

# Enter working directory
cd ~/path/to/SO101
conda activate lerobot 

# Please modify --vla_url (VLA policy URL) according to your actual situation
python bridge_server.py \
  --vla_url http://your_ip:7899 \
  --prompt "Press the button"

Parameter Explanation:

  • --vla_url: Specifies the API endpoint of the VLA policy; please replace the example IP (your_ip) with your actual backend server address.
  • --prompt: The text prompt sent to the model; it must strictly match the instruction used during training (including punctuation) to ensure correct inference behavior. A quick consistency check is sketched below.
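
To verify the match programmatically, here is a minimal sketch (it reuses the illustrative prompt field from the conversion sketch earlier; adjust the key and path to your data):

import json
import pathlib

# Compare the serving prompt with the instruction stored in the training data.
ep = pathlib.Path("/path/to/so101_dexdata/dexdata_jsonl/push_button/episode_00000.jsonl")
first_step = json.loads(ep.open().readline())
assert first_step["prompt"] == "Press the button", first_step["prompt"]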

Success Indicators:

If your configuration is correct, logs like the following appear in Terminal 2:

Bridge Server started on [::]:8080
Waiting for Dual-Camera Robot Client...

Start Robot Client (Terminal 3)

This service drives the hardware, sends images to the Bridge, and executes received actions.
Permission Hint: It is recommended to run sudo chmod 666 /dev/ttyACM0 each time the USB cable is plugged in.

# Start robot client
python -m lerobot.async_inference.robot_client \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM0 \
  --robot.cameras="{ front: {type: opencv, index_or_path: 6, width: 640, height: 480, fps: 30, fourcc: 'MJPG'}, side: {type: opencv, index_or_path: 12, width: 640, height: 480, fps: 30, fourcc: 'MJPG'}}" \
  --server_address=127.0.0.1:8080 \
  --actions_per_chunk=32 \
  --chunk_size_threshold=0.5 \
  --aggregate_fn_name=weighted_average \
  --task="Press the button" \
  --policy_type=act \
  --policy_device=mps

Key Parameter Explanation:

  • --actions_per_chunk=32: The total number of actions the client receives from a single inference call (see the back-of-the-envelope sketch after this list for how it interacts with --chunk_size_threshold).
  • --server_address: If the Bridge Server runs on the local machine, use 127.0.0.1:8080; if it runs on another computer, use that computer's IP.
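
A back-of-the-envelope reading of the chunking flags (our interpretation; consult the LeRobot async-inference documentation for the authoritative behavior): with --chunk_size_threshold=0.5, the client requests a fresh chunk once roughly half of the current one remains.

# Rough arithmetic for the flags above (interpretation, not LeRobot source):
actions_per_chunk = 32
chunk_size_threshold = 0.5
refill_point = int(actions_per_chunk * chunk_size_threshold)
print(f"request a new chunk when about {refill_point} actions remain")  # 16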

Successful Run Indicators

  1. Terminal 2: Displays Robot Client Connected, and continuously logs Sending ... images to VLA.
  2. Terminal 3: Displays Robot connected and ready.
  3. Robot Arm: Begins to move smoothly following the instructions.

Demos

Following the instructions above, we ran a Push Button task on our SO-101 robot; a demo video of a successful run is provided below:

📹 View Demo Video: Push Button Task