<p align="center">
<h1 align="center">LuckyLab</h1>
<p align="center">
<strong>RL and IL training framework for <a href="https://github.com/luckyrobots/LuckyEngine">LuckyEngine</a></strong>
</p>
<p align="center">
<a href="https://luckyrobots.com"><img src="https://img.shields.io/badge/Lucky_Robots-ff6600?style=flat&logoColor=white" alt="Lucky Robots"></a>
<a href="https://github.com/luckyrobots/LuckyEngine"><img src="https://img.shields.io/badge/LuckyEngine-0984e3?style=flat&logoColor=white" alt="LuckyEngine"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License: MIT"></a>
<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-3776AB?logo=python&logoColor=white" alt="Python 3.10+"></a>
<a href="https://pytorch.org/"><img src="https://img.shields.io/badge/PyTorch-%3E%3D2.0-EE4C2C?logo=pytorch&logoColor=white" alt="PyTorch >= 2.0"></a>
<a href="https://github.com/luckyrobots/luckyrobots"><img src="https://img.shields.io/badge/luckyrobots-%3E%3D0.1.84-00b894" alt="luckyrobots >= 0.1.84"></a>
<a href="https://docs.astral.sh/ruff/"><img src="https://img.shields.io/badge/code%20style-ruff-000000.svg" alt="Ruff"></a>
</p>
</p>

LuckyLab is the training and inference layer for robots simulated in [LuckyEngine](https://github.com/luckyrobots/LuckyEngine). It connects to LuckyEngine over gRPC (via the [luckyrobots](https://github.com/luckyrobots/luckyrobots) client), sends joint-level actions, and receives observations each step; all physics and rendering run in LuckyEngine.
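
The exchange can be pictured with a small self-contained sketch. The stub client below only mimics the pattern (send joint actions, receive observations back); the real connection goes through the luckyrobots gRPC client, and none of these names are its actual API:

```python
from dataclasses import dataclass, field


@dataclass
class StubEngineClient:
    """Stand-in for the engine side: accepts joint actions, returns observations."""
    joint_positions: list[float] = field(default_factory=lambda: [0.0] * 6)

    def step(self, action: list[float]) -> dict:
        # Pretend the engine integrated physics for one control step.
        self.joint_positions = [q + a for q, a in zip(self.joint_positions, action)]
        return {"joint_positions": list(self.joint_positions)}


def rollout(client: StubEngineClient, num_steps: int) -> dict:
    obs = client.step([0.0] * 6)  # initial observation
    for _ in range(num_steps):
        # A trained policy would map obs -> action here; a constant stands in.
        action = [0.1] * len(obs["joint_positions"])
        obs = client.step(action)  # send actions, receive the next observation
    return obs


final_obs = rollout(StubEngineClient(), num_steps=5)
```

In the real setup the policy runs in LuckyLab and the `step` call crosses the gRPC boundary into LuckyEngine.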

---

## Quick Start
### 1. Installation

```bash
git clone -b mick/release-2026-1 --single-branch https://github.com/luckyrobots/luckylab.git
cd luckylab

# Run the setup script for your OS
.\setup.bat   # Windows
./setup.sh    # Linux
```
### 2. Prepare LuckyEngine

1. Launch LuckyEngine
2. Download the Piper Block Stacking project
3. Open the Piper Block Stacking scene
4. Open the gRPC Panel
<table><tr><td>

5. Follow the prompts to ensure:
- Action Gate is **Enabled**
- Server is **Running**
- Scene is **Playing**

</td><td>

<img width="300" alt="gRPC Panel" src="https://github.com/user-attachments/assets/352bd83e-29d7-4c6f-af79-b27ba412c4e4" />

</td></tr></table>

### 3. Run Debug Viewer

```bash
# Run the gRPC viewer script for your OS
.\run_debug_viewer.bat  # Windows
./run_debug_viewer.sh # Linux
```

If everything is configured correctly, this script logs the inputs and outputs exchanged between LuckyLab and LuckyEngine and displays the camera feed exported from LuckyEngine.

### 4. Download & Run Piper Block Stacking Demo Model

```bash
# Run the model download script for your OS
# Windows
.\download_demo.bat
.\run_demo.bat

# Linux
./download_demo.sh
./run_demo.sh
```

The download scripts extract the model to the correct nested location automatically; if you download a model manually, place it in its own subfolder under LuckyLab's `runs/` directory.
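
Based on the paths printed by the download script, the expected layout looks like this (folder name shown for the Piper demo; each model gets its own subfolder):

```
luckylab/
└── runs/
    └── piper_blockstacking_act/   # one subfolder per model
        └── final/                 # model files loaded by run_demo
```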

---

## How It Works

```mermaid
graph TD
LE[LuckyEngine]

LE <--> LR[luckyrobots client]

LR --> ENV

subgraph LuckyLab ["&nbsp;&nbsp;&nbsp;&nbsp;LuckyLab&nbsp;&nbsp;&nbsp;&nbsp;"]
ENV[ManagerBasedEnv]
ENV --- OBS[Observations]
ENV --- ACT[Actions]
ENV --- REW[Rewards]
ENV --- TERM[Terminations]
ENV --- CURR[Curriculum]
end

subgraph Backends ["Training Backends"]
SKRL[skrl — RL]
SB3[SB3 — RL]
LEROBOT[LeRobot — IL]
end

ENV --> Backends

style LE fill:#1a1a2e,stroke:#0984e3,stroke-width:2px,color:#74b9ff
style LR fill:#1a1a2e,stroke:#00b894,stroke-width:2px,color:#55efc4

style LuckyLab fill:#16213e,stroke:#6c5ce7,stroke-width:2px,color:#a29bfe
style ENV fill:#1a1a2e,stroke:#6c5ce7,stroke-width:2px,color:#a29bfe
style OBS fill:#1a1a2e,stroke:#636e72,stroke-width:1px,color:#dfe6e9
style ACT fill:#1a1a2e,stroke:#636e72,stroke-width:1px,color:#dfe6e9
style REW fill:#1a1a2e,stroke:#636e72,stroke-width:1px,color:#dfe6e9
style TERM fill:#1a1a2e,stroke:#636e72,stroke-width:1px,color:#dfe6e9
style CURR fill:#1a1a2e,stroke:#636e72,stroke-width:1px,color:#dfe6e9

style Backends fill:#16213e,stroke:#e17055,stroke-width:2px,color:#fab1a0
style SKRL fill:#1a1a2e,stroke:#e17055,stroke-width:2px,color:#fab1a0
style SB3 fill:#1a1a2e,stroke:#e17055,stroke-width:2px,color:#fab1a0
style LEROBOT fill:#1a1a2e,stroke:#fdcb6e,stroke-width:2px,color:#ffeaa7
```

LuckyEngine handles all physics simulation (built on MuJoCo). LuckyLab is purely a training orchestrator — it does not run physics locally. The [luckyrobots](https://github.com/luckyrobots/luckyrobots) package manages the gRPC connection, engine lifecycle, and domain randomization protocol.
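
The manager split shown in the diagram can be illustrated with a toy sketch. This is not luckylab's actual API; it only shows the pattern: a reward manager computes a weighted sum of reward terms, and a termination manager ends the episode when any condition fires (all names here are illustrative):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RewardTerm:
    func: Callable[[dict], float]  # computes one reward component from state
    weight: float


def total_reward(state: dict, terms: dict[str, RewardTerm]) -> float:
    # Reward manager: weighted sum of all configured terms.
    return sum(t.weight * t.func(state) for t in terms.values())


def terminated(state: dict, conditions: list[Callable[[dict], bool]]) -> bool:
    # Termination manager: episode ends if any condition fires.
    return any(cond(state) for cond in conditions)


state = {"tracking_error": 0.2, "action_rate": 0.5, "tilt": 0.3}

terms = {
    "track_velocity": RewardTerm(func=lambda s: 1.0 - s["tracking_error"], weight=2.0),
    "action_rate": RewardTerm(func=lambda s: s["action_rate"], weight=-0.1),
}
conditions = [lambda s: s["tilt"] > 1.2]  # e.g. the robot fell over

reward = total_reward(state, terms)       # 2.0 * 0.8 - 0.1 * 0.5, about 1.55
done = terminated(state, conditions)      # tilt below the limit, so False
```

Observation, action, and curriculum managers follow the same shape: each owns one piece of the MDP and is driven by config rather than hard-coded logic.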

---

## Status

LuckyLab is in **early development (alpha)**. The Piper block-stacking demo above is the current focus. The codebase also includes scaffolding for reinforcement learning (Go2 velocity tracking via [skrl](https://github.com/Toni-SM/skrl) / [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3)) and additional imitation learning policies via [LeRobot](https://github.com/huggingface/lerobot).

---

## Development

```bash
# Manual install with uv (instead of setup scripts)
uv sync --all-groups
uv run pre-commit install

uv run pytest tests -v
uv run ruff check src tests
uv run ruff format src tests
```

---

**download_demo.bat**

@echo off
setlocal

set REPO=luckyrobots/luckylab
set TAG=demo-v0.1.0
set DEMO_NAME=piper_blockstacking_act
set ZIP_NAME=%DEMO_NAME%.zip
set DOWNLOAD_URL=https://github.com/%REPO%/releases/download/%TAG%/%ZIP_NAME%

rem Resolve the directory this script lives in (the luckylab root)
rem %~dp0 has a trailing backslash — remove it so quoted paths don't break
set SCRIPT_DIR=%~dp0
if "%SCRIPT_DIR:~-1%"=="\" set "SCRIPT_DIR=%SCRIPT_DIR:~0,-1%"

echo Downloading demo from %DOWNLOAD_URL% ...
curl -L "%DOWNLOAD_URL%" -o "%SCRIPT_DIR%\%ZIP_NAME%"
if %errorlevel% neq 0 (
echo ERROR: Download failed. Make sure curl is available and the URL is correct.
pause
exit /b 1
)

echo Extracting demo ...
powershell -Command "Expand-Archive -Path '%SCRIPT_DIR%\%ZIP_NAME%' -DestinationPath '%SCRIPT_DIR%' -Force"
if %errorlevel% neq 0 (
echo ERROR: Extraction failed.
pause
exit /b 1
)

del "%SCRIPT_DIR%\%ZIP_NAME%"

echo.
echo Demo installed successfully.
echo Model: runs\%DEMO_NAME%\final\
echo Script: run_demo.bat
echo.
echo Run 'run_demo.bat' to start the demo.
pause