6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -36,26 +36,32 @@ repos:
      - id: check-yaml
      - id: check-toml
      - id: end-of-file-fixer
        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)
      - id: trailing-whitespace
        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.12.4
    hooks:
      - id: ruff-format
        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)

  - repo: https://github.com/adhtruong/mirrors-typos
    rev: v1.34.0
    hooks:
      - id: typos
        args: [--force-exclude]
        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)

  - repo: https://github.com/asottile/pyupgrade
    rev: v3.20.0
    hooks:
      - id: pyupgrade
        args: [--py310-plus]
        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)

  ##### Markdown Quality #####
  - repo: https://github.com/rbubley/mirrors-prettier
105 changes: 105 additions & 0 deletions examples/grid_hil_serl/README.md
@@ -0,0 +1,105 @@
# Grid HIL SERL Environment

This example demonstrates a **simplified HIL-SERL setup** for computer vision-based grid position prediction. Instead of complex robotic manipulation, the algorithm learns to predict which of the 64 grid cells contains a red cube based on camera images, with human feedback during training. Episodes are single prediction attempts: if the guess is correct, the agent receives reward 1; otherwise reward 0.

## Overview

The environment consists of:
- An 8x8 grid world with high-definition visual rendering
- A red cube that randomly spawns at grid cell centers
- A top-left-origin coordinate system: cell (0,0) is the top-left corner
- Automatic high-definition image capture (1920x1080)
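
The coordinate convention above can be sketched as a pair of helper functions. This is a minimal illustration, not code from this example: the names `cell_to_world` and `world_to_cell` are hypothetical, and the sketch assumes unit-sized cells with the grid centred on the world origin, which matches the offsets used in `grid_cube_randomizer.py`.

```python
import math

GRID_SIZE = 8  # the grid is 8x8


def cell_to_world(x_cell: int, y_cell: int, grid_size: int = GRID_SIZE) -> tuple[float, float]:
    """Map a top-left-origin cell index to the world position of the cell centre.

    Assumes unit-sized cells and a grid centred on the world origin.
    """
    if not (0 <= x_cell < grid_size and 0 <= y_cell < grid_size):
        raise ValueError(f"cell ({x_cell}, {y_cell}) is outside the {grid_size}x{grid_size} grid")
    half = grid_size / 2
    x_pos = x_cell - half + 0.5  # column 0 is the leftmost column
    y_pos = half - y_cell - 0.5  # row 0 is the top row, so world y decreases as y_cell grows
    return x_pos, y_pos


def world_to_cell(x_pos: float, y_pos: float, grid_size: int = GRID_SIZE) -> tuple[int, int]:
    """Inverse mapping: find the cell that contains a world position."""
    half = grid_size / 2
    return math.floor(x_pos + half), math.floor(half - y_pos)
```

With these assumptions, cell (0,0) maps to world position (-3.5, 3.5) and cell (7,7) maps to (3.5, -3.5).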


## Environment preview

![Coordinate system](media/coordinate_system.png)

## Files

- `grid_scene.xml` - MuJoCo scene definition with 8x8 grid
- `grid_cube_randomizer.py` - Main script for randomizing cube positions
- `README.md` - This documentation

## Usage

### 1. Install LeRobot (one time)
Follow the main repository instructions (from repo root):
```bash
pip install -e ".[hilserl]"
```

### 2. Record Demonstrations (Optional - this repo already contains a recorded dataset)
```bash
# From the repository root
python examples/grid_hil_serl/record_grid_demo.py \
--config_path examples/grid_hil_serl/record_grid_position_lerobot.json
```

### 3. Train HIL-SERL Policy
```bash
# Terminal 1: Start learner
cd src
python -m lerobot.scripts.rl.learner --config_path ../examples/grid_hil_serl/train_grid_position.json

# Terminal 2: Start actor (with human feedback)
cd src
python -m lerobot.scripts.rl.actor --config_path ../examples/grid_hil_serl/train_grid_position.json
```

The actor prints a rolling accuracy over the last 50 episodes and saves a plot every
10 episodes to `outputs/grid_position/accuracy_plots/` so you can monitor training
progress without attaching a debugger.
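
The rolling accuracy described above can be computed with a fixed-size window. The following is a minimal sketch under stated assumptions, not the actor's actual code: the class name `RollingAccuracy` is hypothetical, and plotting is left out.

```python
from collections import deque


class RollingAccuracy:
    """Track the success rate over the most recent `window` episodes."""

    def __init__(self, window: int = 50):
        # A deque with maxlen drops the oldest outcome automatically.
        self._outcomes = deque(maxlen=window)

    def update(self, success: bool) -> float:
        """Record one episode outcome and return the current rolling accuracy."""
        self._outcomes.append(1 if success else 0)
        return sum(self._outcomes) / len(self._outcomes)


# Example: report the rolling accuracy every 10 episodes.
tracker = RollingAccuracy(window=50)
for episode, success in enumerate([True, False] * 10, start=1):
    accuracy = tracker.update(success)
    if episode % 10 == 0:
        print(f"episode {episode}: rolling accuracy {accuracy:.2f}")
```

Before the window fills, the average is taken over the episodes seen so far, so early values are noisy.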

## Features

This example gives you a fast, single‑step prediction task: every episode the
cube appears in one of the 64 cells of an 8×8 grid and the policy must guess the
cell from a high‑definition overhead image. Episodes are only one step long, so
each prediction immediately becomes a labelled training example. Along the way the
actor logs a rolling 50‑episode success rate and stores matplotlib accuracy plots,
making it easy to gauge progress without additional tooling. Because the cube is
always placed exactly at a cell center and the camera is fixed, the setup stays
repeatable while still exercising the end‑to‑end vision→prediction loop.

![Accuracy curve](media/accuracy_episode_00160.png)

Accuracy typically climbs toward ~95 % after roughly 140 prediction episodes.

## HIL-SERL Workflow

This simplified setup demonstrates the core HIL-SERL concept with minimal complexity:

### Training Phase (Offline)
1. **Automatic Data Collection**: Environment randomly places cube in different grid positions
2. **Supervised Learning**: Algorithm learns to predict grid position from images
3. **Ground Truth Labels**: Exact grid coordinates provided for each image

### Human-in-the-Loop Phase (Online)
1. **Algorithm Prediction**: Model predicts cube position from camera images
2. **Binary Feedback**: Human (or auto-supervision) accepts or corrects the guess
3. **Iterative Learning**: Model improves based on the accepted/corrected outcome
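
The online loop above can be sketched with automatic supervision standing in for the human. This is an illustrative toy, not the LeRobot actor: `give_feedback`, `run_episodes`, and `random_policy` are hypothetical names, the cube spawn is simulated with a random integer, and the policy guesses at random instead of reading an image.

```python
import random

NUM_CELLS = 64  # one class per cell of the 8x8 grid


def give_feedback(guess: int, true_cell: int) -> tuple[int, int]:
    """Auto-supervision: accept a correct guess, otherwise correct it.

    Returns (reward, label), where reward is 1 or 0 and label is the
    cell the model should learn from.
    """
    reward = 1 if guess == true_cell else 0
    return reward, true_cell


def random_policy(rng: random.Random) -> int:
    """Placeholder policy that guesses a cell uniformly at random."""
    return rng.randrange(NUM_CELLS)


def run_episodes(policy, num_episodes: int, seed: int = 0) -> list[tuple[int, int, int]]:
    """Run single-step episodes and collect (guess, reward, label) tuples."""
    rng = random.Random(seed)
    transitions = []
    for _ in range(num_episodes):
        true_cell = rng.randrange(NUM_CELLS)  # stand-in for spawning the cube
        guess = policy(rng)  # stand-in for predicting from a camera image
        reward, label = give_feedback(guess, true_cell)
        transitions.append((guess, reward, label))
    return transitions
```

Each tuple is one complete episode, which is why every prediction can be used directly as a training example.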

### Key Simplifications
- **No Robot Control**: Focus purely on computer vision prediction
- **Single-Step Episodes**: One prediction per episode with immediate success/failure reward
- **Discrete Predictions**: 64 possible outputs (one per grid cell)
- **Perfect Ground Truth**: Exact position labels available
- **Visual Task Only**: No complex motor control or physics
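
One way to expose the 64 discrete outputs is to flatten the `(x_cell, y_cell)` index into a single class ID. The row-major ordering below is an assumption made for illustration; the example's configs may use a different encoding, and the function names are hypothetical.

```python
def cell_to_class(x_cell: int, y_cell: int, grid_size: int = 8) -> int:
    """Flatten a cell index into a class ID in [0, grid_size**2), row by row."""
    return y_cell * grid_size + x_cell


def class_to_cell(class_id: int, grid_size: int = 8) -> tuple[int, int]:
    """Recover (x_cell, y_cell) from a row-major class ID."""
    y_cell, x_cell = divmod(class_id, grid_size)
    return x_cell, y_cell
```

Under this ordering the top-left cell (0,0) is class 0 and the bottom-right cell (7,7) is class 63.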

## Integration with LeRobot

The environment integrates with LeRobot's HIL-SERL framework through:

1. **Custom Gym Environment**: `GridPositionPrediction-v0` registered with gymnasium
2. **LeRobot-Compatible Interface**: Proper observation/action space formatting
3. **Config Files**: `record_grid_position.json` and `train_grid_position.json`
4. **Dataset Collection**: Automated recording of image-position pairs

## Dependencies

- mujoco
- numpy
- PIL (Pillow)
- gymnasium (optional, for integration)
- matplotlib
161 changes: 161 additions & 0 deletions examples/grid_hil_serl/grid_cube_randomizer.py
@@ -0,0 +1,161 @@
#!/usr/bin/env python

"""
Random Grid Cube Spawner

This script loads the 8x8 grid scene and randomly positions a cube
in one of the 64 grid cells. The cube spawns at the center of the
chosen cell, within the grid boundaries.
"""

import argparse
import time

import mujoco
import mujoco.viewer
import numpy as np
from PIL import Image


def save_camera_view(model, data, filename="img.jpg"):
    """
    Save the current camera view to a JPEG image file.

    Args:
        model: MuJoCo model
        data: MuJoCo data
        filename: Output filename (default: img.jpg)
    """
    try:
        # Create a high-definition renderer for the current camera
        renderer = mujoco.Renderer(model, height=1080, width=1920)

        # Update the scene and render
        renderer.update_scene(data, camera="grid_camera")
        img = renderer.render()

        if img is not None:
            # Convert to PIL Image and save
            image = Image.fromarray(img)
            image.save(filename)
            print(f"Camera view saved to: {filename}")
        else:
            print("Warning: Could not capture camera view")

        # Clean up renderer (if close method exists)
        if hasattr(renderer, 'close'):
            renderer.close()

    except Exception as e:
        print(f"Error saving image: {e}")


def randomize_cube_position(model, data, grid_size=8):
    """
    Randomly position the cube in one of the grid cells.

    Args:
        model: MuJoCo model
        data: MuJoCo data
        grid_size: Number of cells along each side of the grid (default: 8)

    Returns:
        The (x, y) world position of the chosen cell center.
    """
    # Pick random cell indices in [0, grid_size) for both x and y
    # (np.random.randint excludes the upper bound), covering every grid cell
    x_cell = np.random.randint(0, grid_size)
    y_cell = np.random.randint(0, grid_size)

    # Convert cell indices to center positions (offset by 0.5 from grid lines)
    # X: left(0) = -3.5, right(7) = 3.5
    x_pos = (x_cell - grid_size // 2) + 0.5
    # Y: top(0) = 3.5, bottom(7) = -3.5 (flipped coordinate system)
    y_pos = (grid_size // 2 - y_cell) - 0.5

    print(f"Spawning cube at grid cell ({x_cell}, {y_cell}) -> position ({x_pos}, {y_pos})")

    # A free joint has 7 qpos entries (3 position + 4 quaternion)
    # and 6 qvel entries (3 linear + 3 angular)
    cube_joint_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_JOINT, "cube_joint")
    qpos_adr = model.jnt_qposadr[cube_joint_id]
    qvel_adr = model.jnt_dofadr[cube_joint_id]

    # Set position (x, y, z) and identity rotation as a unit quaternion (w, x, y, z) = (1, 0, 0, 0)
    data.qpos[qpos_adr:qpos_adr + 7] = [x_pos, y_pos, 0.5, 1, 0, 0, 0]

    # Reset velocity to zero (linear and angular velocities)
    data.qvel[qvel_adr:qvel_adr + 6] = [0, 0, 0, 0, 0, 0]

    return x_pos, y_pos


def run_grid_viewer(xml_path, randomize_interval=2.0, auto_save=True):
    """
    Run the grid viewer with random cube positioning.

    Args:
        xml_path: Path to the XML scene file
        randomize_interval: How often to randomize cube position (seconds)
        auto_save: Whether to automatically save camera view after each repositioning
    """
    print(f"Loading scene: {xml_path}")
    model = mujoco.MjModel.from_xml_path(xml_path)
    data = mujoco.MjData(model)

    print("\n" + "="*50)
    print("8x8 Grid Cube Randomizer")
    print("="*50)
    print("This scene shows an 8x8 grid with a randomly positioned cube.")
    print(f"Cube position randomizes every {randomize_interval} seconds.")
    print()
    print("Controls:")
    print(" R: Manually randomize cube position")
    print(" S: Save current camera view to img.jpg")
    print(" Space: Pause/unpause")
    print(" Esc: Exit")
    print(" Camera: Mouse controls for rotation/zoom")
    print("="*50)

    last_randomize_time = 0

    with mujoco.viewer.launch_passive(model, data) as viewer:
        # Initial randomization
        x, y = randomize_cube_position(model, data)
        mujoco.mj_forward(model, data)

        while viewer.is_running():
            current_time = time.time()

            # Auto-randomize every few seconds
            if current_time - last_randomize_time > randomize_interval:
                x, y = randomize_cube_position(model, data)
                mujoco.mj_forward(model, data)
                # Force viewer to update the scene
                viewer.sync()
                # Save the current camera view if auto_save is enabled
                if auto_save:
                    save_camera_view(model, data, "img.jpg")
                last_randomize_time = current_time

            # Small delay to prevent excessive CPU usage
            time.sleep(0.01)

    print("\nViewer closed.")


def main():
    parser = argparse.ArgumentParser(description="8x8 Grid Cube Randomizer")
    parser.add_argument("--xml", type=str, default="grid_scene.xml",
                        help="Path to XML scene file")
    parser.add_argument("--interval", type=float, default=3.0,
                        help="Randomization interval in seconds")
    parser.add_argument("--no-save", action="store_true",
                        help="Disable automatic saving of camera views")

    args = parser.parse_args()

    try:
        run_grid_viewer(args.xml, args.interval, not args.no_save)
    except FileNotFoundError:
        print(f"Error: Could not find XML file '{args.xml}'")
        print("Make sure the XML file exists in the current directory.")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()