SpesRobotics · spirosperos · Aug 7, 2025 · Aug 7, 2025 · Oct 3, 2025 · Oct 8, 2025
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -36,26 +36,32 @@ repos:
       - id: check-yaml
       - id: check-toml
       - id: end-of-file-fixer
+        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)
       - id: trailing-whitespace
+        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)
 
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.12.4
     hooks:
       - id: ruff-format
+        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
+        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)
 
   - repo: https://github.com/adhtruong/mirrors-typos
     rev: v1.34.0
     hooks:
       - id: typos
         args: [--force-exclude]
+        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)
 
   - repo: https://github.com/asottile/pyupgrade
     rev: v3.20.0
     hooks:
     -   id: pyupgrade
         args: [--py310-plus]
+        exclude: ^(outputs/|examples/hil_serl_simulation_training/outputs/)
 
   ##### Markdown Quality #####
   - repo: https://github.com/rbubley/mirrors-prettier
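
Review note on the `exclude` entries above: pre-commit documents `exclude` as a Python regular expression applied to repository-relative paths with `re.search`. The sketch below checks which paths the added pattern would skip. The sample paths and the helper name `is_excluded` are hypothetical and serve only as illustration.

```python
import re

# Pattern copied verbatim from the hooks in .pre-commit-config.yaml
EXCLUDE = re.compile(r"^(outputs/|examples/hil_serl_simulation_training/outputs/)")


def is_excluded(path: str) -> bool:
    """Return True if a repo-relative path would be skipped by the hook."""
    return EXCLUDE.search(path) is not None


# Hypothetical paths, for illustration only
samples = [
    "outputs/grid_position/accuracy_plots/plot.png",
    "examples/hil_serl_simulation_training/outputs/run/log.txt",
    "examples/grid_hil_serl/README.md",
    "src/outputs/notes.txt",
]
results = {path: is_excluded(path) for path in samples}
for path, skipped in results.items():
    print(f"{'skip' if skipped else 'check'}: {path}")
```

Because the pattern is anchored with `^`, only a top-level `outputs/` directory and the one named example directory are skipped. An `outputs/` directory nested elsewhere (such as the hypothetical `src/outputs/`) would still be checked by the hooks.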

diff --git a/examples/grid_hil_serl/README.md b/examples/grid_hil_serl/README.md
@@ -0,0 +1,105 @@
+# Grid HIL SERL Environment
+
+This example demonstrates a **simplified HIL-SERL setup** for computer vision-based grid position prediction. Instead of complex robotic manipulation, the algorithm learns to predict which of the 64 grid cells contains a red cube based on camera images, with human feedback during training. Episodes are single prediction attempts: if the guess is correct, the agent receives reward 1; otherwise reward 0.
+
+## Overview
+
+The environment consists of:
+- An 8x8 grid world with high-definition visual rendering
+- A red cube that randomly spawns at grid cell centers
+- A top-left-origin coordinate system: cell (0,0) is the top-left corner
+- Automatic high-definition image capture (1920x1080)
+
+
+## Environment preview
+
+![Coordinate system](media/coordinate_system.png)
+
+## Files
+
+- `grid_scene.xml` - MuJoCo scene definition with 8x8 grid
+- `grid_cube_randomizer.py` - Main script for randomizing cube positions
+- `README.md` - This documentation
+
+## Usage
+
+### 1. Install LeRobot (one time)
+Follow the main repository instructions (from repo root):
+```bash
+pip install -e ".[hilserl]"
+```
+
+### 2. Record Demonstrations (Optional - this repo already contains a recorded dataset)
+```bash
+# From the repository root
+python examples/grid_hil_serl/record_grid_demo.py \
+  --config_path examples/grid_hil_serl/record_grid_position_lerobot.json
+```
+
+### 3. Train HIL-SERL Policy
+```bash
+# Terminal 1: Start learner
+cd src
+python -m lerobot.scripts.rl.learner --config_path ../examples/grid_hil_serl/train_grid_position.json
+
+# Terminal 2: Start actor (with human feedback)
+cd src
+python -m lerobot.scripts.rl.actor --config_path ../examples/grid_hil_serl/train_grid_position.json
+```
+
+The actor prints a rolling accuracy over the last 50 episodes and saves a plot every
+10 episodes to `outputs/grid_position/accuracy_plots/` so you can monitor training
+progress without attaching a debugger.
+
+## Features
+
+This example gives you a fast, single‑step prediction task: every episode the
+cube appears in one of the 64 cells of an 8×8 grid and the policy must guess the
+cell from a high‑definition overhead image. Episodes are only one step long, so
+each prediction immediately becomes a labelled training example. Along the way the
+actor logs a rolling 50‑episode success rate and stores matplotlib accuracy plots,
+making it easy to gauge progress without additional tooling. Because the cubes are
+placed exactly at grid centres and the camera is fixed, the setup stays perfectly
+repeatable while still exercising the end‑to‑end vision→prediction loop.
+
+![Accuracy curve](media/accuracy_episode_00160.png)
+
+Accuracy typically climbs toward ~95 % after roughly 140 prediction episodes.
+
+## HIL-SERL Workflow
+
+This simplified setup demonstrates the core HIL-SERL concept with minimal complexity:
+
+### Training Phase (Offline)
+1. **Automatic Data Collection**: Environment randomly places cube in different grid positions
+2. **Supervised Learning**: Algorithm learns to predict grid position from images
+3. **Ground Truth Labels**: Exact grid coordinates provided for each image
+
+### Human-in-the-Loop Phase (Online)
+1. **Algorithm Prediction**: Model predicts cube position from camera images
+2. **Binary Feedback**: Human (or auto-supervision) accepts or corrects the guess
+3. **Iterative Learning**: Model improves based on the accepted/corrected outcome
+
+### Key Simplifications
+- **No Robot Control**: Focus purely on computer vision prediction
+- **Single-Step Episodes**: One prediction per episode with immediate success/failure reward
+- **Discrete Predictions**: 64 possible outputs (one per grid cell)
+- **Perfect Ground Truth**: Exact position labels available
+- **Visual Task Only**: No complex motor control or physics
+
+## Integration with LeRobot
+
+The environment integrates with LeRobot's HIL-SERL framework through:
+
+1. **Custom Gym Environment**: `GridPositionPrediction-v0` registered with gymnasium
+2. **LeRobot-Compatible Interface**: Proper observation/action space formatting
+3. **Config Files**: `record_grid_position_lerobot.json` and `train_grid_position.json`
+4. **Dataset Collection**: Automated recording of image-position pairs
+
+## Dependencies
+
+- mujoco
+- numpy
+- PIL (Pillow)
+- gymnasium (optional, for integration)
+- matplotlib
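
Review note on the README's monitoring claim: the rolling accuracy over the last 50 episodes can be sketched with a bounded deque. This is a minimal illustration, not the actor's actual implementation. The class name `RollingAccuracy` and the simulated reward sequence are hypothetical.

```python
from collections import deque


class RollingAccuracy:
    """Track the success rate over the most recent `window` episodes."""

    def __init__(self, window: int = 50) -> None:
        # A deque with maxlen drops the oldest outcome automatically
        self._outcomes = deque(maxlen=window)

    def update(self, reward: float) -> float:
        """Record one episode (reward 1 = correct, 0 = wrong) and return the accuracy."""
        self._outcomes.append(1.0 if reward > 0 else 0.0)
        return sum(self._outcomes) / len(self._outcomes)


tracker = RollingAccuracy(window=50)
# Hypothetical run: 10 wrong guesses followed by 50 correct ones
for reward in [0] * 10 + [1] * 50:
    accuracy = tracker.update(reward)
print(f"rolling accuracy: {accuracy:.2f}")
```

Since every episode is a single step with a 0/1 reward, the per-episode reward doubles as the success indicator, so no separate bookkeeping is needed.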
diff --git a/examples/grid_hil_serl/grid_cube_randomizer.py b/examples/grid_hil_serl/grid_cube_randomizer.py
@@ -0,0 +1,161 @@
+#!/usr/bin/env python
+
+"""
+Random Grid Cube Spawner
+
+This script loads the 8x8 grid scene and randomly positions a cube
+in one of the 64 grid cells. The cube spawns at a cell center, i.e. at
+half-integer world coordinates within the grid boundaries.
+"""
+
+import numpy as np
+import mujoco
+import mujoco.viewer
+import argparse
+import time
+from PIL import Image
+
+
+def save_camera_view(model, data, filename="img.jpg"):
+    """
+    Save the current camera view to a JPEG image file.
+
+    Args:
+        model: Mujoco model
+        data: Mujoco data
+        filename: Output filename (default: img.jpg)
+    """
+    try:
+        # Create a high-definition renderer for the current camera
+        renderer = mujoco.Renderer(model, height=1080, width=1920)
+
+        # Update the scene and render
+        renderer.update_scene(data, camera="grid_camera")
+        img = renderer.render()
+
+        if img is not None:
+            # Convert to PIL Image and save
+            image = Image.fromarray(img)
+            image.save(filename)
+            print(f"Camera view saved to: {filename}")
+        else:
+            print("Warning: Could not capture camera view")
+
+        # Clean up renderer (if close method exists)
+        if hasattr(renderer, 'close'):
+            renderer.close()
+
+    except Exception as e:
+        print(f"Error saving image: {e}")
+
+
+def randomize_cube_position(model, data, grid_size=8):
+    """
+    Randomly position the cube in one of the grid cells.
+
+    Args:
+        model: Mujoco model
+        data: Mujoco data
+        grid_size: Number of cells per side (default 8, i.e. an 8x8 grid)
+    """
+    # Sample random cell indices in [0, grid_size) for both x and y
+    # (for the default 8x8 grid this covers all 64 cells)
+    x_cell = np.random.randint(0, grid_size)  # upper bound is exclusive
+    y_cell = np.random.randint(0, grid_size)  # upper bound is exclusive
+
+    # Convert cell indices to center positions (offset by 0.5 from grid lines)
+    # X: left(0) = -3.5, right(7) = 3.5
+    x_pos = (x_cell - grid_size // 2) + 0.5
+    # Y: top(0) = 3.5, bottom(7) = -3.5 (flipped coordinate system)
+    y_pos = (grid_size // 2 - y_cell) - 0.5
+
+    print(f"Spawning cube at grid cell ({x_cell}, {y_cell}) -> position ({x_pos}, {y_pos})")
+
+    # Set the cube pose and velocity (free joint: 7 qpos = 3 pos + 4 quat, 6 qvel)
+    cube_joint_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_JOINT, "cube_joint")
+
+    # Set position (x, y, z) and identity rotation as a unit quaternion (w, x, y, z) = (1, 0, 0, 0)
+    data.qpos[model.jnt_qposadr[cube_joint_id]:model.jnt_qposadr[cube_joint_id] + 7] = [x_pos, y_pos, 0.5, 1, 0, 0, 0]
+
+    # Reset velocity to zero (linear and angular velocities)
+    data.qvel[model.jnt_dofadr[cube_joint_id]:model.jnt_dofadr[cube_joint_id] + 6] = [0, 0, 0, 0, 0, 0]
+
+    return x_pos, y_pos
+
+
+def run_grid_viewer(xml_path, randomize_interval=3.0, auto_save=True):
+    """
+    Run the grid viewer with random cube positioning.
+
+    Args:
+        xml_path: Path to the XML scene file
+        randomize_interval: How often to randomize cube position (seconds)
+        auto_save: Whether to automatically save camera view after each repositioning
+    """
+    print(f"Loading scene: {xml_path}")
+    model = mujoco.MjModel.from_xml_path(xml_path)
+    data = mujoco.MjData(model)
+
+    print("\n" + "="*50)
+    print("8x8 Grid Cube Randomizer")
+    print("="*50)
+    print("This scene shows an 8x8 grid with a randomly positioned cube.")
+    print(f"Cube position randomizes every {randomize_interval} seconds.")
+    print()
+    print("Controls:")
+    print("  R: Manually randomize cube position")
+    print("  S: Save current camera view to img.jpg")
+    print("  Space: Pause/unpause")
+    print("  Esc: Exit")
+    print("  Camera: Mouse controls for rotation/zoom")
+    print("="*50)
+
+    last_randomize_time = 0
+
+    with mujoco.viewer.launch_passive(model, data) as viewer:
+        # Initial randomization
+        x, y = randomize_cube_position(model, data)
+        mujoco.mj_forward(model, data)
+
+        while viewer.is_running():
+            current_time = time.time()
+
+            # Auto-randomize every few seconds
+            if current_time - last_randomize_time > randomize_interval:
+                x, y = randomize_cube_position(model, data)
+                mujoco.mj_forward(model, data)
+                # Force viewer to update the scene
+                viewer.sync()
+                # Save the current camera view if auto_save is enabled
+                if auto_save:
+                    save_camera_view(model, data, "img.jpg")
+                last_randomize_time = current_time
+
+            # Small delay to prevent excessive CPU usage
+            time.sleep(0.01)
+
+        print("\nViewer closed.")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="8x8 Grid Cube Randomizer")
+    parser.add_argument("--xml", type=str, default="grid_scene.xml",
+                       help="Path to XML scene file")
+    parser.add_argument("--interval", type=float, default=3.0,
+                       help="Randomization interval in seconds")
+    parser.add_argument("--no-save", action="store_true",
+                       help="Disable automatic saving of camera views")
+
+    args = parser.parse_args()
+
+    try:
+        run_grid_viewer(args.xml, args.interval, not args.no_save)
+    except FileNotFoundError:
+        print(f"Error: Could not find XML file '{args.xml}'")
+        print("Make sure the XML file exists in the current directory.")
+    except Exception as e:
+        print(f"Error: {e}")
+
+
+if __name__ == "__main__":
+    main()
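
Review note on the cell-to-position arithmetic in `randomize_cube_position`: the mapping can be pulled out into pure functions and checked without MuJoCo. The sketch below mirrors the script's formulas. The function names, the inverse mapping, and the row-major flat index for the 64 discrete outputs are assumptions made for illustration, not something this PR defines.

```python
def cell_to_world(x_cell: int, y_cell: int, grid_size: int = 8) -> tuple[float, float]:
    """Map a top-left-origin grid cell to the world position of its center."""
    half = grid_size // 2
    x_pos = (x_cell - half) + 0.5  # column 0 -> -3.5, column 7 -> 3.5
    y_pos = (half - y_cell) - 0.5  # row 0 -> 3.5, row 7 -> -3.5 (y axis flipped)
    return x_pos, y_pos


def world_to_cell(x_pos: float, y_pos: float, grid_size: int = 8) -> tuple[int, int]:
    """Inverse of cell_to_world for positions at cell centers (assumed helper)."""
    half = grid_size // 2
    x_cell = int(round(x_pos - 0.5 + half))
    y_cell = int(round(half - 0.5 - y_pos))
    return x_cell, y_cell


def cell_to_index(x_cell: int, y_cell: int, grid_size: int = 8) -> int:
    """Flatten a cell to a single class index, row-major (an assumed convention)."""
    return y_cell * grid_size + x_cell


# Round-trip every cell of the 8x8 grid and collect the flat indices
indices = []
for y in range(8):
    for x in range(8):
        assert world_to_cell(*cell_to_world(x, y)) == (x, y)
        indices.append(cell_to_index(x, y))
print(cell_to_world(0, 0), cell_to_world(7, 7), len(set(indices)))
```

Keeping the conversion separate from the simulator state makes the flipped y axis easy to unit-test, which matters because the README's coordinate system puts (0,0) at the top-left while MuJoCo's world y axis points the other way.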