Meet HopeJR - A humanoid robot arm and hand for dexterous manipulation!
Control it with exoskeletons and gloves for precise hand movements.
Perfect for advanced manipulation tasks! 🤖
Meet the updated SO100, the SO-101 - just €114 per arm!
Train it in minutes with a few simple moves on your laptop.
Then sit back and watch your creation act autonomously! 🤯
See the full SO-101 tutorial here.
Want to take it to the next level? Make your SO-101 mobile by building LeKiwi!
Check out the LeKiwi tutorial and bring your robot to life on wheels.
human2robot is a hackathon project that bridges the gap between human video demonstrations and robot training data. Instead of requiring expensive teleoperation hardware setups, we enable robot training from simple video recordings of human task demonstrations.
Traditional robot imitation learning requires:
- Expensive leader-follower robot pairs for teleoperation
- Expert operators to demonstrate tasks
- Complex hardware setups and calibration
Our solution: Record a human performing a task → Generate robot training data → Train robots via imitation learning
- Duration: 6-hour hackathon
- Team: 3 people
- Dami + Omar: UCL Robotics students (Computer Vision)
- Jonas: Inverse Kinematics & Robot Training Pipeline
Goal: Extract end-effector motion and object interaction from video
- Input: Video of human hand performing a simple task (e.g., moving a chess rook)
- Output: 2D trajectory of end-effector position and object movement
- Scope: Focus on simple push operations (rook from one square to adjacent square)
- Tech Stack: Computer vision, object tracking, motion analysis
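To make the vision task above concrete, here is a minimal sketch of how a 2D trajectory could be extracted from a demonstration video. It is not the pipeline built during the hackathon: it assumes the hand or rook carries a distinctly coloured marker and simply tracks that marker's centroid frame by frame with OpenCV; the HSV bounds and file names are illustrative placeholders.

```python
# Minimal sketch: track a colour-marked end effector / object in a video and
# dump its 2D pixel trajectory. The HSV bounds below are placeholders and must
# be tuned to whatever marker is actually used.
import json
import cv2
import numpy as np

def extract_trajectory(video_path, hsv_lo=(100, 120, 70), hsv_hi=(130, 255, 255)):
    cap = cv2.VideoCapture(video_path)
    trajectory = []  # (frame_index, x, y) in pixel coordinates
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, np.array(hsv_lo), np.array(hsv_hi))
        m = cv2.moments(mask)
        if m["m00"] > 0:  # marker visible in this frame
            trajectory.append((frame_idx, m["m10"] / m["m00"], m["m01"] / m["m00"]))
        frame_idx += 1
    cap.release()
    return trajectory

if __name__ == "__main__":
    with open("trajectories.json", "w") as f:
        json.dump(extract_trajectory("demo_video.mp4"), f)
```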
Goal: Convert CV outputs to robot joint trajectories
- Input: End-effector trajectories from Task 1
- Output: Time series of joint positions (motor encoder data)
- Challenge: Generate realistic robot motion that achieves the same task
- Result: Training data equivalent to teleoperated demonstrations
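To illustrate this conversion, the sketch below runs a closed-form 2-link planar IK over a list of end-effector waypoints. It is a simplified stand-in for the 5-DOF arm + gripper solver: the link lengths and waypoints are invented placeholders, not SO-101 parameters, and a real implementation would add joint limits, gripper commands, and the pixel-to-workspace calibration.

```python
# Illustrative 2-link planar IK used as a stand-in for the real 5-DOF solver.
# Link lengths are placeholders, not robot parameters.
import numpy as np

L1, L2 = 0.15, 0.15  # assumed link lengths in metres

def ik_2link(x, y):
    """Return (shoulder, elbow) angles in radians for the elbow-down solution."""
    cos_elbow = (x * x + y * y - L1**2 - L2**2) / (2 * L1 * L2)
    cos_elbow = np.clip(cos_elbow, -1.0, 1.0)  # clamp to the reachable workspace
    elbow = np.arccos(cos_elbow)
    shoulder = np.arctan2(y, x) - np.arctan2(L2 * np.sin(elbow), L1 + L2 * np.cos(elbow))
    return shoulder, elbow

def trajectory_to_joints(xy_waypoints):
    """Map (x, y) end-effector waypoints to rows of joint angles."""
    return np.array([ik_2link(x, y) for x, y in xy_waypoints])

if __name__ == "__main__":
    waypoints = [(0.20, 0.05), (0.22, 0.05), (0.24, 0.05)]  # e.g. a short push
    print(trajectory_to_joints(waypoints))
```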
Goal: End-to-end demonstration platform
- Component A: Task specification interface
- Users define desired robot behaviors
- Specify hardware requirements and constraints
- Component B: Data pipeline demonstration
- Human demonstrates task via video
- human2robot converts to training data
- Imitation learning trains the model
Challenge: Real robots need visual input during operation, not just joint trajectories.
- Current state: We generate joint motion data from video
- Missing piece: How does the robot "see" during execution?
- Questions:
- Can we generate synthetic camera views for the robot's perspective?
- How do we bridge human hand demonstrations to robot end-effector views?
- Can we train vision models to translate between human and robot perspectives?
- Domain Transfer: Train vision models to map human→robot viewpoints
- Synthetic Data: Generate robot-perspective videos from human demonstrations
- Multi-modal Training: Combine trajectory data with vision adaptation
- View Synthesis: Use computer graphics to render robot's perspective
# Clone the repository
git clone https://github.com/your-username/human2robot.git
cd human2robot

# Install dependencies
pip install -r requirements.txt

# Run the pipeline
python scripts/video_to_robot_data.py --input demo_video.mp4 --robot_config so101

Human Video → CV Analysis → Inverse Kinematics → Robot Training Data → Imitation Learning
     ↓              ↓                ↓                    ↓                     ↓
  demo.mp4   trajectories.json  joint_data.csv    lerobot_dataset/     trained_policy.pt
- Project setup and architecture design
- Computer vision pipeline for motion extraction
- Inverse kinematics solver implementation
- Integration testing with sample data
- LeRobot dataset format compatibility
- Demo marketplace interface
This is a hackathon exploration project. We welcome:
- Ideas for solving the camera perspective problem
- Improvements to the CV→IK pipeline
- Real-world testing and validation
- Extensions to new robot platforms
Goal: End-to-end pipeline from video → robot policy
Core Tasks:
- Complete Inverse Kinematics Bridge (`hand_to_robot_ik.py`)
  - Hand coordinate → robot workspace mapping
  - 5-DOF arm + gripper IK solver integration
  - Temporal trajectory generation
- Data Pipeline Integration
  - Convert CV+IK outputs to LeRobot dataset format
  - Implement observation-action pair generation
  - Add temporal synchronization and smoothing (see the sketch after this list)
- Demo Implementation
  - Simple chess piece movement task
  - Single camera, fixed workspace setup
  - ACT policy training on generated data
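For the temporal synchronization and smoothing step referenced above, here is a minimal sketch under assumed rates: resample a joint trajectory recorded at the video frame rate onto a hypothetical control rate and apply a moving-average filter. The frame rates, window size, and function name are placeholders, not project constants.

```python
# Sketch: resample joint trajectories from video rate to control rate, then smooth.
# The frame rates and window size are assumptions, not project constants.
import numpy as np

def resample_and_smooth(joints, src_fps=30.0, dst_fps=50.0, window=5):
    """joints: (T, num_joints) array sampled at src_fps -> array sampled at dst_fps."""
    t_src = np.arange(joints.shape[0]) / src_fps
    t_dst = np.arange(0.0, t_src[-1], 1.0 / dst_fps)
    # Linear interpolation per joint onto the target timestamps.
    resampled = np.stack(
        [np.interp(t_dst, t_src, joints[:, j]) for j in range(joints.shape[1])], axis=1
    )
    # Moving-average smoothing along the time axis.
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(resampled[:, j], kernel, mode="same") for j in range(resampled.shape[1])],
        axis=1,
    )
```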
Vision & Perception Research:
- Multi-view Geometry: Camera calibration and 3D reconstruction from human demonstrations
- Domain Adaptation: Learning visual mappings between human and robot perspectives
- Temporal Action Segmentation: Automatic detection of action primitives in demonstrations
Robotics & Control:
- Workspace Scaling: Adaptive mapping between human and robot workspaces
- Trajectory Optimization: Physics-informed smoothing and feasibility constraints
- Multi-robot Coordination: Extending to bimanual or multi-agent scenarios
Machine Learning Foundations:
- Comparative Policy Analysis: Systematic evaluation of ACT vs Diffusion Policy performance
- Data Efficiency: Few-shot learning from minimal human demonstrations
- Uncertainty Quantification: Confidence estimation in generated robot trajectories
Novel Research Directions:
- Synthetic Data Augmentation: Physics simulation for expanding training datasets
- Cross-embodiment Transfer: Learning mappings between different robot morphologies
- Interactive Learning: Real-time feedback and correction mechanisms
Sensor Fusion & Modalities:
- LiDAR Integration: iPhone LiDAR (256x192) depth sensing for 3D workspace understanding
- Multi-sensor Fusion: Combine RGB, depth, IMU, and tactile feedback
- Sensor Modality Transfer: Learn mappings between different sensor types
Beyond Imitation Learning:
- Hybrid IL+RL: Use IL for reliable scene reset, then overnight RL for task optimization
- Comprehensive IL Survey: Implement and compare SQIL, ValueDice, IQ-Learn, f-BRAC
- Advanced IL Methods: Adversarial IL (GAIL), Distribution Matching (PM, BC-O)
- Simulation-to-Reality: Isaac Lab integration for physics-based training
Foundational Understanding:
- Implement core algorithms from scratch (IK solvers, transformers, diffusion models)
- Mathematical foundations: robotics kinematics, probabilistic models, optimization
- Experimental design: hypothesis formation, systematic evaluation, statistical analysis
Innovation Opportunities:
- Novel Loss Functions: Task-specific objectives for trajectory generation
- Architecture Design: Custom neural network components for robotics
- Benchmark Creation: Standardized evaluation protocols for video-to-robot learning
Critical Constraint: Hardware access ends in 5 days, then 2-4 month gap
- Priority: Focus on simulation-based development (Isaac Lab, MuJoCo)
- Data Collection Sprint: Gather comprehensive demonstration videos while hardware available
- Simulation-First Approach: Develop and validate algorithms in simulation for future hardware deployment
This project builds upon the excellent LeRobot framework for the imitation learning components.
🤗 LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch. The goal is to lower the barrier to entry to robotics so that everyone can contribute and benefit from sharing datasets and pretrained models.
🤗 LeRobot contains state-of-the-art approaches that have been shown to transfer to the real world, with a focus on imitation learning and reinforcement learning.
🤗 LeRobot already provides a set of pretrained models, datasets with human-collected demonstrations, and simulation environments to get started without assembling a robot. In the coming weeks, the plan is to add more and more support for real-world robotics on the most affordable and capable robots out there.
🤗 LeRobot hosts pretrained models and datasets on this Hugging Face community page: huggingface.co/lerobot
(Simulation environment examples: ACT policy on ALOHA env, TDMPC policy on SimXArm env, Diffusion policy on PushT env.)
- Thanks to the LeRobot team 🤗 for building SmolVLA (Paper, Blog).
- Thanks to Tony Zhao, Zipeng Fu and colleagues for open sourcing ACT policy, ALOHA environments and datasets. Ours are adapted from ALOHA and Mobile ALOHA.
- Thanks to Cheng Chi, Zhenjia Xu and colleagues for open sourcing Diffusion policy, Pusht environment and datasets, as well as UMI datasets. Ours are adapted from Diffusion Policy and UMI Gripper.
- Thanks to Nicklas Hansen, Yunhai Feng and colleagues for open sourcing TDMPC policy, Simxarm environments and datasets. Ours are adapted from TDMPC and FOWM.
- Thanks to Antonio Loquercio and Ashish Kumar for their early support.
- Thanks to Seungjae (Jay) Lee, Mahi Shafiullah and colleagues for open sourcing VQ-BeT policy and helping us adapt the codebase to our repository. The policy is adapted from VQ-BeT repo.
Download our source code:
git clone https://github.com/huggingface/lerobot.git
cd lerobot

Create a virtual environment with Python 3.10 and activate it, e.g. with miniconda:

conda create -y -n lerobot python=3.10
conda activate lerobot

When using miniconda, install ffmpeg in your environment:

conda install ffmpeg -c conda-forge

NOTE: This usually installs `ffmpeg 7.X` for your platform compiled with the `libsvtav1` encoder. If `libsvtav1` is not supported (check supported encoders with `ffmpeg -encoders`), you can:

- [On any platform] Explicitly install `ffmpeg 7.X` using: `conda install ffmpeg=7.1.1 -c conda-forge`
- [On Linux only] Install ffmpeg build dependencies and compile ffmpeg from source with libsvtav1, and make sure you use the corresponding ffmpeg binary to your install with `which ffmpeg`.

Install 🤗 LeRobot:

pip install -e .

NOTE: If you encounter build errors, you may need to install additional dependencies (`cmake`, `build-essential`, and `ffmpeg libs`). On Linux, run: `sudo apt-get install cmake build-essential python3-dev pkg-config libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev libswscale-dev libswresample-dev libavfilter-dev`. For other systems, see: Compiling PyAV

For simulations, 🤗 LeRobot comes with gymnasium environments that can be installed as extras:

For instance, to install 🤗 LeRobot with aloha and pusht, use:

pip install -e ".[aloha, pusht]"

To use Weights and Biases for experiment tracking, log in with `wandb login` (note: you will also need to enable WandB in the configuration. See below.)
Check out example 1 that illustrates how to use our dataset class which automatically downloads data from the Hugging Face hub.
You can also locally visualize episodes from a dataset on the hub by executing our script from the command line:
python -m lerobot.scripts.visualize_dataset \
    --repo-id lerobot/pusht \
    --episode-index 0

or, for a dataset in a local folder, with the `--root` option and `--local-files-only` (in the following case the dataset will be searched for in `./my_local_data_dir/lerobot/pusht`):

python -m lerobot.scripts.visualize_dataset \
    --repo-id lerobot/pusht \
    --root ./my_local_data_dir \
    --local-files-only 1 \
    --episode-index 0

It will open rerun.io and display the camera streams, robot states and actions, like this:
(Demo video: battery-720p.mov)
Our script can also visualize datasets stored on a remote server. See `python -m lerobot.scripts.visualize_dataset --help` for more instructions.
A dataset in LeRobotDataset format is very simple to use. It can be loaded from a repository on the Hugging Face hub or a local folder simply with e.g. dataset = LeRobotDataset("lerobot/aloha_static_coffee") and can be indexed into like any Hugging Face and PyTorch dataset. For instance dataset[0] will retrieve a single temporal frame from the dataset containing observation(s) and an action as PyTorch tensors ready to be fed to a model.
A specificity of LeRobotDataset is that, rather than retrieving a single frame by its index, we can retrieve several frames based on their temporal relationship with the indexed frame, by setting delta_timestamps to a list of relative times with respect to the indexed frame. For example, with delta_timestamps = {"observation.image": [-1, -0.5, -0.2, 0]} one can retrieve, for a given index, 4 frames: 3 "previous" frames 1 second, 0.5 seconds, and 0.2 seconds before the indexed frame, and the indexed frame itself (corresponding to the 0 entry). See example 1_load_lerobot_dataset.py for more details on delta_timestamps.
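As a usage sketch (the import path and feature keys can differ between LeRobot versions and datasets), retrieving a temporal stack of frames with delta_timestamps looks roughly like this:

```python
# Sketch of delta_timestamps usage; adjust the import path and keys to your LeRobot version.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Retrieve the indexed frame plus the frames 1 s, 0.5 s and 0.2 s before it.
delta_timestamps = {"observation.image": [-1, -0.5, -0.2, 0]}
dataset = LeRobotDataset("lerobot/pusht", delta_timestamps=delta_timestamps)

frame = dataset[0]
print(frame["observation.image"].shape)  # stack of 4 camera frames: (4, c, h, w)
print(frame["action"].shape)
```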
Under the hood, the LeRobotDataset format makes use of several ways to serialize data, which can be useful to understand if you plan to work more closely with this format. We tried to make a flexible yet simple dataset format that covers most types of features and specificities present in reinforcement learning and robotics, in simulation and in the real world, with a focus on cameras and robot states, but easily extended to other types of sensory inputs as long as they can be represented by a tensor.
Here are the important details and internal structure organization of a typical LeRobotDataset instantiated with dataset = LeRobotDataset("lerobot/aloha_static_coffee"). The exact features will change from dataset to dataset but not the main aspects:
dataset attributes:
  ├ hf_dataset: a Hugging Face dataset (backed by Arrow/parquet). Typical features example:
  │  ├ observation.images.cam_high (VideoFrame):
  │  │    VideoFrame = {'path': path to a mp4 video, 'timestamp' (float32): timestamp in the video}
  │  ├ observation.state (list of float32): position of the arm joints (for instance)
  │  ... (more observations)
  │  ├ action (list of float32): goal position of the arm joints (for instance)
  │  ├ episode_index (int64): index of the episode for this sample
  │  ├ frame_index (int64): index of the frame for this sample in the episode; starts at 0 for each episode
  │  ├ timestamp (float32): timestamp in the episode
  │  ├ next.done (bool): indicates the end of an episode; True for the last frame in each episode
  │  ├ index (int64): general index in the whole dataset
  ├ episode_data_index: contains 2 tensors with the start and end indices of each episode
  │  ├ from (1D int64 tensor): first frame index for each episode; shape (num episodes,); starts with 0
  │  ├ to (1D int64 tensor): last frame index for each episode; shape (num episodes,)
  ├ stats: a dictionary of statistics (max, mean, min, std) for each feature in the dataset, for instance
  │  ├ observation.images.cam_high: {'max': tensor with same number of dimensions (e.g. `(c, 1, 1)` for images, `(c,)` for states), etc.}
  │  ...
  ├ info: a dictionary of metadata on the dataset
  │  ├ codebase_version (str): this is to keep track of the codebase version the dataset was created with
  │  ├ fps (float): frames per second the dataset is recorded/synchronized to
  │  ├ video (bool): indicates if frames are encoded in mp4 video files to save space or stored as png files
  │  ├ encoding (dict): if video, this documents the main options that were used with ffmpeg to encode the videos
  ├ videos_dir (Path): where the mp4 videos or png images are stored/accessed
  ├ camera_keys (list of string): the keys to access camera features in the item returned by the dataset (e.g. `["observation.images.cam_high", ...]`)
A LeRobotDataset is serialised using several widespread file formats for each of its parts, namely:
- hf_dataset stored using Hugging Face datasets library serialization to parquet
- videos are stored in mp4 format to save space
- metadata are stored in plain json/jsonl files
Datasets can be uploaded/downloaded from the Hugging Face hub seamlessly. To work on a local dataset, you can specify its location with the `root` argument if it's not in the default `~/.cache/huggingface/lerobot` location.
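For instance, here is a hedged sketch of loading a dataset from a custom local location and batching it with a plain PyTorch DataLoader (argument names as in recent LeRobot releases; check your installed version):

```python
# Sketch: point the dataset class at a local folder and batch it with PyTorch.
# Assumes the dataset has already been downloaded/recorded under that root.
import torch
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("lerobot/pusht", root="./my_local_data_dir")
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

batch = next(iter(loader))
print(batch["observation.state"].shape)  # (32, state_dim)
```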
Check out example 2 that illustrates how to download a pretrained policy from Hugging Face hub, and run an evaluation on its corresponding environment.
We also provide a more capable script to parallelize the evaluation over multiple environments during the same rollout. Here is an example with a pretrained model hosted on lerobot/diffusion_pusht:
python -m lerobot.scripts.eval \
    --policy.path=lerobot/diffusion_pusht \
    --env.type=pusht \
    --eval.batch_size=10 \
    --eval.n_episodes=10 \
    --policy.use_amp=false \
    --policy.device=cuda

Note: After training your own policy, you can re-evaluate the checkpoints with:

python -m lerobot.scripts.eval --policy.path={OUTPUT_DIR}/checkpoints/last/pretrained_model

See `python -m lerobot.scripts.eval --help` for more instructions.
Check out example 3 that illustrates how to train a model using our core library in python, and example 4 that shows how to use our training script from command line.
To use wandb for logging training and evaluation curves, make sure you've run `wandb login` as a one-time setup step. Then, when running the training command above, enable WandB in the configuration by adding `--wandb.enable=true`.
A link to the wandb logs for the run will also show up in yellow in your terminal. Here is an example of what they look like in your browser. Please also check here for the explanation of some commonly used metrics in logs.
Note: For efficiency, during training every checkpoint is evaluated on a low number of episodes. You may use `--eval.n_episodes=500` to evaluate on more episodes than the default. Or, after training, you may want to re-evaluate your best checkpoints on more episodes or change the evaluation settings. See `python -m lerobot.scripts.eval --help` for more instructions.
We provide some pretrained policies on our hub page that can achieve state-of-the-art performances. You can reproduce their training by loading the config from their run. Simply running:
python -m lerobot.scripts.train --config_path=lerobot/diffusion_pusht

reproduces SOTA results for Diffusion Policy on the PushT task.
If you would like to contribute to 🤗 LeRobot, please check out our contribution guide.
Once you have trained a policy you may upload it to the Hugging Face hub using a hub id that looks like ${hf_user}/${repo_name} (e.g. lerobot/diffusion_pusht).
You first need to find the checkpoint folder located inside your experiment directory (e.g. outputs/train/2024-05-05/20-21-12_aloha_act_default/checkpoints/002500). Within that there is a pretrained_model directory which should contain:
- `config.json`: A serialized version of the policy configuration (following the policy's dataclass config).
- `model.safetensors`: A set of `torch.nn.Module` parameters, saved in Hugging Face Safetensors format.
- `train_config.json`: A consolidated configuration containing all parameters used for training. The policy configuration should match `config.json` exactly. This is useful for anyone who wants to evaluate your policy or for reproducibility.
To upload these to the hub, run the following:
huggingface-cli upload ${hf_user}/${repo_name} path/to/pretrained_model

See `eval.py` for an example of how other people may use your policy.
An example of a code snippet to profile the evaluation of a policy:
import torch
from torch.profiler import profile, record_function, ProfilerActivity

def trace_handler(prof):
    prof.export_chrome_trace(f"tmp/trace_schedule_{prof.step_num}.json")

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=torch.profiler.schedule(
        wait=2,
        warmup=2,
        active=3,
    ),
    on_trace_ready=trace_handler,
) as prof:
    with record_function("eval_policy"):
        for i in range(num_episodes):
            prof.step()
            # insert code to profile, potentially whole body of eval_policy function

If you want, you can cite this work with:
@misc{cadene2024lerobot,
    author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascale, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas},
    title = {LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
    howpublished = "\url{https://github.com/huggingface/lerobot}",
    year = {2024}
}

Additionally, if you are using any of the particular policy architectures, pretrained models, or datasets, it is recommended to cite the original authors of the work as they appear below:
@article{shukor2025smolvla,
    title={SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics},
    author={Shukor, Mustafa and Aubakirova, Dana and Capuano, Francesco and Kooijmans, Pepijn and Palma, Steven and Zouitine, Adil and Aractingi, Michel and Pascal, Caroline and Russi, Martino and Marafioti, Andres and Alibert, Simon and Cord, Matthieu and Wolf, Thomas and Cadene, Remi},
    journal={arXiv preprint arXiv:2506.01844},
    year={2025}
}

@article{chi2024diffusionpolicy,
    author = {Cheng Chi and Zhenjia Xu and Siyuan Feng and Eric Cousineau and Yilun Du and Benjamin Burchfiel and Russ Tedrake and Shuran Song},
    title = {Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
    journal = {The International Journal of Robotics Research},
    year = {2024}
}

@article{zhao2023learning,
    title={Learning fine-grained bimanual manipulation with low-cost hardware},
    author={Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
    journal={arXiv preprint arXiv:2304.13705},
    year={2023}
}

@inproceedings{hansen2022tdmpc,
    title={Temporal Difference Learning for Model Predictive Control},
    author={Nicklas Hansen and Xiaolong Wang and Hao Su},
    booktitle={ICML},
    year={2022}
}

@article{lee2024behavior,
    title={Behavior generation with latent actions},
    author={Lee, Seungjae and Wang, Yibin and Etukuru, Haritheja and Kim, H Jin and Shafiullah, Nur Muhammad Mahi and Pinto, Lerrel},
    journal={arXiv preprint arXiv:2403.03181},
    year={2024}
}

@article{luo2024hilserl,
    title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
    author={Jianlan Luo and Charles Xu and Jeffrey Wu and Sergey Levine},
    year={2024},
    eprint={2410.21845},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}




