A unified robot learning framework powered by LuckyEngine
LuckyLab is a modular, config-driven framework that brings reinforcement learning, imitation learning, and real-time visualization together in one place. It communicates with LuckyEngine through luckyrobots and runs on both CPU and GPU.
The framework ships with locomotion and manipulation tasks but is easily extensible to any robot or task. It supports all imitation learning algorithms in LeRobot and multiple RL algorithms via skrl and Stable Baselines3. Live inspection is available through Rerun and Viser.
| Robot | Task | Learning |
|---|---|---|
| Unitree Go2 | Velocity tracking | RL (PPO, SAC, TD3, DDPG) |
| Piper | Pick-and-place | IL (via LeRobot) |
- Python 3.10+
- LuckyEngine executable
- luckyrobots >= 0.1.81
- PyTorch >= 2.0
git clone https://github.com/luckyrobots/luckylab.git
cd luckylab
# Core + RL
uv sync --group rl
# Core + IL (LeRobot)
uv sync --group il
# Everything (RL + IL + Rerun + dev tools)
uv sync --all-groups# RL — train SAC on the Go2
python -m luckylab.scripts.train go2_velocity_flat \
--agent.algorithm sac --agent.backend skrl --device cuda
# IL — train ACT on a local dataset
python -m luckylab.scripts.train piper_pickandplace \
--il.policy act --il.dataset-repo-id piper/pickandplace --device cuda# RL — with keyboard control
python -m luckylab.scripts.play go2_velocity_flat \
--algorithm sac --checkpoint runs/go2_velocity_sac/checkpoints/best_agent.pt \
--keyboard
# IL
python -m luckylab.scripts.play piper_pickandplace \
--policy act --checkpoint runs/luckylab_il/finalKeyboard controls: W/S forward/back, A/D strafe, Q/E turn, Space zero, Esc quit.
# Browse a dataset in Rerun (opens in browser)
python -m luckylab.scripts.visualize_dataset \
--repo-id piper/pickandplace --episode-index 0 --web
# List all registered tasks
python -m luckylab.scripts.list_envsFour algorithms across two backends, all configurable via CLI or Python:
| Algorithm | Type | Backends |
|---|---|---|
| PPO | On-policy | skrl, sb3 |
| SAC | Off-policy | skrl, sb3 |
| TD3 | Off-policy | skrl, sb3 |
| DDPG | Off-policy | skrl, sb3 |
python -m luckylab.scripts.train go2_velocity_flat \
--agent.algorithm sac --agent.backend skrl \
--agent.max-iterations 5000 \
--env.num-envs 4096 \
--device cudafrom luckylab.rl import train, RlRunnerCfg
from luckylab.tasks import load_env_cfg
env_cfg = load_env_cfg("go2_velocity_flat")
rl_cfg = RlRunnerCfg(algorithm="sac", backend="skrl", max_iterations=5000)
train(env_cfg=env_cfg, rl_cfg=rl_cfg, device="cuda")Note: LuckyEngine does not currently support environment parallelization, so on-policy algorithms like PPO that depend on large batch collection are not recommended. Off-policy algorithms like SAC are the best fit for now. Parallelization support is actively being worked on.
Backend recommendation: Stable Baselines3 is not designed for GPU training. If you want to train on GPU, use the skrl backend (
--agent.backend skrl).
LuckyLab integrates with LeRobot for imitation learning. ACT and Diffusion Policy are ready to use out of the box. Other LeRobot policies (Pi0, SmolVLA, etc.) are supported but require registering a task config for them first, similar to how the ACT and Diffusion configs are set up.
python -m luckylab.scripts.train piper_pickandplace \
--il.policy act \
--il.dataset-repo-id piper/pickandplace \
--il.batch-size 8 \
--il.num-train-steps 100000 \
--device cudaDatasets are loaded from the HuggingFace Hub or from a local directory at ~/.luckyrobots/data/ (configurable via LUCKYROBOTS_DATA_HOME).
Tasks bundle an environment config with RL and/or IL configs. The registry makes it easy to add new ones:
from luckylab.tasks import register_task
from luckylab.envs import ManagerBasedRlEnvCfg
from luckylab.rl import RlRunnerCfg
env_cfg = ManagerBasedRlEnvCfg(
decimation=4,
robot="unitreego2",
scene="velocity",
observations={...},
actions={...},
rewards={...},
terminations={...},
)
register_task(
"my_task",
env_cfg,
rl_cfgs={"ppo": RlRunnerCfg(algorithm="ppo", max_iterations=3000)},
)LuckyLab uses a manager-based environment where each MDP component is handled by a dedicated manager, configured with direct function references:
ManagerBasedRlEnv
├── ObservationManager Observation groups with noise, delay, and history
├── ActionManager Action scaling, offset, and joint commands
├── RewardManager Weighted sum of reward terms
├── TerminationManager Episode termination conditions
└── CurriculumManager Progressive difficulty adjustment
from luckylab.managers import RewardTermCfg, TerminationTermCfg
from luckylab.tasks.velocity import mdp
rewards = {
"track_velocity": RewardTermCfg(func=mdp.track_linear_velocity, weight=2.0, params={"std": 0.5}),
"action_rate": RewardTermCfg(func=mdp.action_rate_l2, weight=-0.1),
}
terminations = {
"time_out": TerminationTermCfg(func=mdp.time_out, time_out=True),
"fell_over": TerminationTermCfg(func=mdp.bad_orientation, params={"limit_angle": 1.2}),
}Policy Viewer — a web-based MuJoCo viewer powered by Viser for inspecting trained RL policies. Renders the robot in a browser with velocity command sliders, pause/play, speed control, and keyboard input — no LuckyEngine connection required.
# Open http://localhost:8080 after starting
python -m luckylab.viewer.run_policy runs/go2_velocity_sac/checkpoints/best_agent.ptRerun — live step-by-step inspection of observations, actions, rewards, and camera feeds. No LuckyEngine connection required.
# Dataset viewer
python -m luckylab.scripts.visualize_dataset --repo-id piper/pickandplace --web
# Attach to evaluation
python -m luckylab.scripts.play go2_velocity_flat --algorithm sac --checkpoint best_agent.pt --rerunWeights & Biases — enabled by default for both RL and IL. Disable with --agent.wandb false or --il.wandb false.
src/luckylab/
├── configs/ Simulation contract and shared configs
├── entity/ Robot entity and observation data
├── envs/ ManagerBasedRlEnv and MDP functions
│ └── mdp/ Observations, actions, rewards, terminations, curriculum
├── il/ Imitation learning
│ └── lerobot/ LeRobot integration (trainer, wrapper)
├── managers/ Observation, action, reward, termination, curriculum managers
├── rl/ Reinforcement learning
│ ├── skrl/ skrl backend
│ ├── sb3/ Stable Baselines3 backend
│ ├── config.py RlRunnerCfg and algorithm configs
│ └── common.py Shared utilities
├── scene/ Scene management
├── scripts/ CLI entry points (train, play, list_envs, visualize_dataset)
├── tasks/ Task definitions and registry
│ ├── velocity/ Locomotion velocity tracking
│ └── pickandplace/ Manipulation (IL)
├── utils/ NaN guard, noise models, rerun logger, keyboard, buffers
└── viewer/ Debug visualization with Viser
uv sync --all-groups
uv run pre-commit install
# Tests
uv run pytest tests -v
# Lint
uv run ruff check src tests
uv run ruff format src testsSee CONTRIBUTING.md for details.
LuckyLab is inspired by:
- MJLab — manager-based, config-driven environment architecture
- LeRobot — imitation learning policies and dataset format
Built on top of skrl and Stable Baselines3 for RL training.
MIT License — see LICENSE for details.