Q1 nano follows a Sim-to-Real pipeline: design the robot in CAD, simulate in MuJoCo, train walking policies with reinforcement learning, then deploy to real hardware.
┌─────────────┐    ┌──────────┐    ┌───────────┐    ┌──────────┐    ┌──────────┐
│ SolidWorks  │───▶│   URDF   │───▶│  MuJoCo   │───▶│ RL Train │───▶│  Deploy  │
│   (CAD)     │    │  Model   │    │ Simulation│    │ (Policy) │    │ (Real HW)│
└─────────────┘    └──────────┘    └───────────┘    └──────────┘    └──────────┘
- Input: Mechanical design in SolidWorks
- Output: URDF model file for simulation
- Location: hardware/solidworks/, hardware/step/, hardware/urdf/
The physical robot is designed in SolidWorks, exported as STEP for universal access, and converted to URDF for simulation compatibility.
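Before handing an exported URDF to the simulator, it can help to check that it declares the joints you expect. The snippet below is a minimal sketch using only Python's standard library. The inline URDF is a hypothetical, heavily trimmed stand-in (two joints instead of twelve, no meshes or inertials), and the joint names and limits are illustrative assumptions rather than values from the project files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed URDF used only for illustration.
# A real export would carry all 12 actuated joints plus meshes and inertials.
URDF_TEXT = """
<robot name="example_biped">
  <link name="base"/>
  <link name="hip_l"/>
  <link name="hip_r"/>
  <joint name="hip_yaw_l" type="revolute">
    <parent link="base"/>
    <child link="hip_l"/>
    <limit lower="-0.5" upper="0.5" effort="1.0" velocity="5.0"/>
  </joint>
  <joint name="hip_yaw_r" type="revolute">
    <parent link="base"/>
    <child link="hip_r"/>
    <limit lower="-0.5" upper="0.5" effort="1.0" velocity="5.0"/>
  </joint>
</robot>
"""


def actuated_joints(urdf_text):
    """Return {joint_name: (lower, upper)} for every revolute joint."""
    root = ET.fromstring(urdf_text.strip())
    joints = {}
    for joint in root.findall("joint"):
        if joint.get("type") != "revolute":
            continue  # skip fixed or other non-actuated joints
        limit = joint.find("limit")
        joints[joint.get("name")] = (
            float(limit.get("lower")),
            float(limit.get("upper")),
        )
    return joints


joints = actuated_joints(URDF_TEXT)
print(len(joints), sorted(joints))
```

Run against the real file (read from disk instead of `URDF_TEXT`), the same check should report 12 revolute joints if the export matches the 6-per-leg layout.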
- Input: URDF model
- Output: Simulated environment for training
- Location: simulation/mujoco/, simulation/configs/
MuJoCo provides a fast, accurate physics simulation. The URDF model is loaded into MuJoCo, where we can run thousands of episodes in parallel for RL training.
- Input: MuJoCo simulation environment
- Output: Trained walking policy (neural network weights)
- Location: training/rl/, training/checkpoints/, training/scripts/
We use reinforcement learning (e.g., PPO) to train a walking policy. The agent learns to command target joint positions that produce a stable, natural-looking gait.
Key Training Details:
- Observation space: joint positions, velocities, body orientation (IMU)
- Action space: target joint positions for all servos
- Reward: forward velocity + stability + energy efficiency, minus style penalties
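The reward terms listed above could be combined as a weighted sum. The following is a minimal sketch, not the project's actual reward: the weights, the exponential uprightness term, the torque-times-velocity energy proxy, and the `reference_pose` used for the style penalty are all assumptions chosen for illustration.

```python
import math

# Illustrative weights (assumptions, not values from the project).
W_VELOCITY = 1.0
W_UPRIGHT = 0.5
W_ENERGY = 0.01
W_STYLE = 0.1


def walking_reward(forward_velocity, roll, pitch, torques,
                   joint_velocities, joint_positions, reference_pose):
    """Weighted sum of velocity, stability, energy, and style terms."""
    # Reward forward progress (m/s).
    velocity_term = W_VELOCITY * forward_velocity
    # Stability: 1.0 when the body is upright, decaying as it tilts (radians).
    upright_term = W_UPRIGHT * math.exp(-(roll ** 2 + pitch ** 2))
    # Energy: penalize mechanical power, approximated as |torque * velocity|.
    energy_penalty = W_ENERGY * sum(
        abs(t * v) for t, v in zip(torques, joint_velocities)
    )
    # Style: penalize squared deviation from a hypothetical reference pose.
    style_penalty = W_STYLE * sum(
        (q - r) ** 2 for q, r in zip(joint_positions, reference_pose)
    )
    return velocity_term + upright_term - energy_penalty - style_penalty


zeros = [0.0] * 12
upright = walking_reward(0.3, 0.0, 0.0, zeros, zeros, zeros, zeros)
tilted = walking_reward(0.3, 0.5, 0.0, zeros, zeros, zeros, zeros)
print(upright > tilted)
```

With equal forward velocity, the tilted body scores lower than the upright one, which is the behavior the stability term is meant to encourage.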
- Input: Trained policy checkpoint
- Output: Walking robot 🤖
- Location: firmware/servo_control/, firmware/main_controller/
The trained policy is deployed to the real robot's microcontroller. The controller reads sensor data and outputs servo commands at ~50 Hz.
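The read-infer-command cycle can be sketched as a fixed-rate loop. The example below is written in Python purely to illustrate the timing logic; firmware on an ESP32 or STM32 would typically use a hardware timer or RTOS tick instead. `read_sensors`, `policy`, and the observation layout are hypothetical stubs, not the project's interfaces.

```python
import time

NUM_JOINTS = 12
RATE_HZ = 50.0  # approximate control rate stated above


def read_sensors():
    """Hypothetical stub: joint positions, joint velocities, IMU roll/pitch/yaw."""
    return [0.0] * (NUM_JOINTS * 2 + 3)


def policy(observation):
    """Hypothetical stub for the trained network: holds every joint at zero."""
    return [0.0] * NUM_JOINTS


def run_control_loop(steps, write_servos, rate_hz=RATE_HZ):
    """Run `steps` control cycles at roughly `rate_hz`."""
    period = 1.0 / rate_hz
    next_tick = time.monotonic()
    for _ in range(steps):
        targets = policy(read_sensors())
        write_servos(targets)
        # Schedule against absolute deadlines so timing errors do not accumulate.
        next_tick += period
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        else:
            # The cycle overran: resynchronize instead of bursting to catch up.
            next_tick = time.monotonic()


sent = []
run_control_loop(steps=5, write_servos=sent.append)
print(len(sent), "commands sent")
```

Scheduling against `next_tick` rather than sleeping a fixed 20 ms each cycle keeps the average rate near 50 Hz even when inference time varies.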
Sim-to-Real Challenges:
- Simulation-to-hardware mismatch: mitigated with domain randomization during training for robustness
- Servo response delay: requires compensation
- Real-world surface friction variation
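Domain randomization and servo delay can both be modeled per training episode. The sketch below samples a friction coefficient and an actuation delay, then applies the delay with a small FIFO buffer. The ranges, the `EpisodeParams` fields, and the `DelayedActions` helper are illustrative assumptions, not the project's training configuration.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    friction: float          # ground friction coefficient
    servo_delay_steps: int   # actuation delay, in control steps


def sample_episode_params(rng):
    """Draw one randomized set of physics parameters (ranges are assumptions)."""
    return EpisodeParams(
        friction=rng.uniform(0.4, 1.2),
        servo_delay_steps=rng.randint(0, 2),
    )


class DelayedActions:
    """FIFO that returns the action issued `delay_steps` control steps ago."""

    def __init__(self, delay_steps, initial_action):
        # Pre-fill so the first `delay_steps` outputs are the initial action.
        self.buffer = deque([initial_action] * delay_steps)

    def push(self, action):
        self.buffer.append(action)
        return self.buffer.popleft()


rng = random.Random(0)
params = sample_episode_params(rng)

delayed = DelayedActions(delay_steps=2, initial_action="init")
applied = [delayed.push(a) for a in ["a", "b", "c"]]
print(params, applied)
```

Resampling `EpisodeParams` at every reset exposes the policy to a spread of conditions, so it is less likely to overfit to one friction value or one servo latency.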
              ┌─────────────────┐
              │ Main Controller │
              │  (ESP32/STM32)  │
              └────────┬────────┘
                       │ Serial Bus
        ┌──────────────┼──────────────┐
        │              │              │
   ┌────┴────┐    ┌────┴────┐    ┌────┴────┐
   │ Servo 1 │    │ Servo 2 │    │ Servo N │
   │ (Hip L) │    │ (Hip R) │    │  (...)  │
   └─────────┘    └─────────┘    └─────────┘
- 12 DOF (Degrees of Freedom): 6 per leg (hip yaw/roll/pitch, knee pitch, ankle pitch/roll)
- Serial bus servos: All servos on a single bus, addressed by ID
- Control loop: ~50 Hz policy inference → servo command
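Because every servo shares one bus and is addressed by ID, the controller needs a fixed mapping from joint names to IDs. The sketch below builds one for the 12 joints listed above. The ID scheme (odd IDs on the left leg, even IDs on the right, loosely following the diagram's "Servo 1 (Hip L), Servo 2 (Hip R)") and the `to_bus_commands` helper are assumptions for illustration. The actual ID assignment and wire protocol depend on the servos used.

```python
SIDES = ("left", "right")
LEG_JOINTS = (
    "hip_yaw", "hip_roll", "hip_pitch",
    "knee_pitch", "ankle_pitch", "ankle_roll",
)

# Hypothetical ID scheme: left/right joints interleaved, IDs 1..12.
SERVO_IDS = {
    f"{side}_{joint}": 1 + 2 * j + i
    for j, joint in enumerate(LEG_JOINTS)
    for i, side in enumerate(SIDES)
}


def to_bus_commands(targets):
    """Convert {joint_name: angle_rad} into (servo_id, angle_rad) pairs sorted by ID."""
    if set(targets) != set(SERVO_IDS):
        raise ValueError("targets must contain exactly the 12 known joints")
    return sorted((SERVO_IDS[name], angle) for name, angle in targets.items())


commands = to_bus_commands({name: 0.0 for name in SERVO_IDS})
print(commands[:2])
```

Keeping the mapping in one table means the policy's action vector, the URDF joint order, and the bus IDs can be checked against each other in a single place.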