Code for Phantom and Masquerade



This repository contains the code used to process human videos in Phantom: Training Robots Without Robots Using Only Human Videos and Masquerade: Learning from In-the-wild Human Videos using Data-Editing.

Phantom: Marion Lepert, Jiaying Fang, Jeannette Bohg

[Phantom teaser]

Masquerade: Marion Lepert*, Jiaying Fang*, Jeannette Bohg

[Masquerade teaser]

Both projects use data editing to convert human videos into “robotized” demonstrations. They share much of the same codebase, with some differences in the processing pipeline:

Phantom

  • Input: RGBD videos with a single left hand visible in every frame.
  • Data editing: inpaint the single human arm and overlay a rendered robot arm in the same pose.
  • Action labels: extract the full 3D end-effector pose (position, orientation, gripper).

Masquerade

  • Input: RGB videos from Epic Kitchens; one or both hands may be visible, sometimes occluded.
  • Data editing: segment and inpaint both arms, then overlay a bimanual robot whose end-effectors follow the estimated hand poses (with a 3-4 cm error along the depth direction, since no depth data is available).
  • Action labels: use 2D projected waypoints as auxiliary supervision only, not full 3D actions (see the projection sketch below).
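As a rough illustration of the difference between the two action-label formats, the sketch below projects a 3D end-effector waypoint into 2D pixel coordinates with a pinhole camera model. The function, intrinsics values, and waypoint are hypothetical and not taken from this codebase.

import numpy as np

def project_waypoint(point_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project a 3D waypoint (camera frame, meters) to 2D pixel coordinates.

    Hypothetical helper for illustration only; K is the 3x3 camera intrinsics matrix.
    """
    x, y, z = point_cam
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    return np.array([u, v])

# Example: a waypoint 40 cm in front of the camera, slightly left of and below center.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
print(project_waypoint(np.array([-0.05, 0.02, 0.40]), K))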

Installation

  1. Clone this repo recursively:
git clone --recursive git@github.com:MarionLepert/phantom.git
  2. Run the following script from the root directory to install the required conda environment:
./install.sh
  3. Download the MANO hand models: register on the MANO website, then download the left and right hand models and move MANO_LEFT.pkl and MANO_RIGHT.pkl into the $ROOT_DIR/submodules/phantom-hamer/_DATA/data/mano/ folder. A quick sanity check is sketched below.
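As a quick, optional check that the MANO files ended up where the pipeline expects them, a small script along these lines can be run from the repository root (this script is not part of the repo; the path comes from step 3 above):

from pathlib import Path

# Expected location of the MANO models (see installation step 3).
mano_dir = Path("submodules/phantom-hamer/_DATA/data/mano")

for name in ("MANO_LEFT.pkl", "MANO_RIGHT.pkl"):
    path = mano_dir / name
    status = "found" if path.is_file() else "MISSING"
    print(f"{name}: {status} ({path})")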

Getting Started

Process Phantom sample data (manually collected in-lab videos)

conda activate phantom

python process_data.py demo_name=pick_and_place data_root_dir=../data/raw processed_data_root_dir=../data/processed mode=all

Process Masquerade sample data (Epic Kitchens video)

conda activate phantom

python process_data.py demo_name=epic data_root_dir=../data/raw processed_data_root_dir=../data/processed mode=all --config-name=epic

Codebase Overview

Process data

Each video is processed using the following steps; an example of running individual stages is shown after the list:

  1. Extract human hand bounding boxes: bbox_processor.py

    • mode=bbox
  2. Extract 2D human hand poses: hand_processor.py

    • mode=hand2d: extract the 2D hand pose
  3. Extract hand and arm segmentation masks: segmentation_processor.py

    • mode=hand_segmentation: used for depth alignment during hand pose refinement (only used with hand3d)
    • mode=arm_segmentation: needed in all cases to inpaint the human arm
  4. Extract 3D human hand poses: hand_processor.py

    • mode=hand3d: extract the 3D hand pose (note: requires depth, and was only tested on the left hand)
  5. Retarget human actions to robot actions: action_processor.py

    • mode=action
  6. Smooth human poses: smoothing_processor.py

    • mode=smoothing
  7. Remove the hand from videos using inpainting: handinpaint_processor.py

    • mode=hand_inpaint
    • The E2FGVI inpainting method is used.
  8. Overlay virtual robot on video: robotinpaint_processor.py

    • mode=robot_inpaint: overlay a single-arm robot (default) or a bimanual robot (epic mode) on the image
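The stages can also be run individually or in subsets by overriding mode with a list, using the syntax shown in the config reference below. For example, to run only bounding-box and 2D hand pose extraction on the Phantom sample data, an invocation along these lines should work:

python process_data.py demo_name=pick_and_place data_root_dir=../data/raw processed_data_root_dir=../data/processed mode=[bbox,hand2d]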

Config reference (see configuration files in configs/)

| Flag | Type | Choices | Description |
| --- | --- | --- | --- |
| --demo_name | str | - | Name of the demonstration/dataset to process |
| --mode | str (multiple) | bbox, hand2d, hand3d, hand_segmentation, arm_segmentation, action, smoothing, hand_inpaint, robot_inpaint, all | Processing modes to run (multiple modes can be specified, e.g. mode=[bbox,hand2d]) |
| --robot_name | str | Panda, Kinova3, UR5e, IIWA, Jaco | Type of robot to use for overlays |
| --gripper_name | str | Robotiq85 | Type of gripper to use |
| --data_root_dir | str | - | Root directory containing raw video data |
| --processed_data_root_dir | str | - | Root directory to save processed data |
| --epic | bool | - | Use Epic-Kitchens dataset processing mode |
| --bimanual_setup | str | single_arm, shoulders | Bimanual setup configuration to use (shoulders corresponds to the bimanual hardware configuration used in Masquerade) |
| --target_hand | str | left, right, both | Which hand(s) to target for processing |
| --camera_intrinsics | str | - | Path to camera intrinsics file |
| --camera_extrinsics | str | - | Path to camera extrinsics file |
| --input_resolution | int | - | Resolution of input videos |
| --output_resolution | int | - | Resolution of output videos |
| --depth_for_overlay | bool | - | Use depth information for overlays |
| --demo_num | str | - | Process a single demo number instead of all demos |
| --debug_cameras | str (multiple) | - | Additional camera names to include for debugging |
| --constrained_hand | bool | - | Use constrained hand processing |
| --render | bool | - | Render the robot overlay on the video |

Note: For a single-arm setup, specify --bimanual_setup single_arm along with --target_hand left or --target_hand right. For bimanual setups, use --bimanual_setup shoulders. For example:
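The two commands below combine the Getting Started invocations with these flags (passed as Hydra-style overrides); treat them as illustrative rather than exact, since the epic config may already set some of these values:

python process_data.py demo_name=pick_and_place data_root_dir=../data/raw processed_data_root_dir=../data/processed mode=all bimanual_setup=single_arm target_hand=left

python process_data.py demo_name=epic data_root_dir=../data/raw processed_data_root_dir=../data/processed mode=all --config-name=epic bimanual_setup=shoulders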

Camera details

  • Phantom: A ZED 2 camera was used to capture the sample data at HD1080 resolution.
  • Masquerade: Epic-Kitchens videos were used with the camera intrinsics provided by the dataset. To use videos captured with a different camera or resolution, update the camera intrinsics and extrinsics files in $ROOT_DIR/phantom/camera/.

Train policy

After processing the video data, the edited data can be used to train a policy. The following files should be used (a minimal loading sketch follows the list):

  • Observations

    • Phantom Samples: extract RGB images from data/processed/pick_and_place/*/video_overlay_Panda_single_arm.mkv
    • Epic (In-the-wild Data) Samples: extract RGB images from data/processed/epic/*/video_overlay_Kinova3_shoulders.mkv
  • Actions

    • Phantom Samples: All data stored in data/processed/pick_and_place/*/inpaint_processor/training_data_single_arm.npz
    • Epic (In-the-wild Data) Samples: All data stored in data/processed/epic/*/inpaint_processor/training_data_shoulders.npz
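As a minimal sketch of how one demo's outputs could be loaded (assuming OpenCV and NumPy; the demo directory name demo_0 is hypothetical, standing in for the * wildcard above, and the arrays inside the .npz are simply listed rather than assumed):

import cv2
import numpy as np

# Hypothetical paths for a single demo, following the layout described above.
video_path = "data/processed/pick_and_place/demo_0/video_overlay_Panda_single_arm.mkv"
actions_path = "data/processed/pick_and_place/demo_0/inpaint_processor/training_data_single_arm.npz"

# Observations: decode RGB frames from the robot-overlay video.
cap = cv2.VideoCapture(video_path)
frames = []
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
cap.release()

# Actions: inspect the arrays stored in the training-data archive.
data = np.load(actions_path)
print(f"Loaded {len(frames)} frames; npz keys: {list(data.keys())}")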

In Phantom, Diffusion Policy was used for policy training.

Citation

@article{lepert2025phantomtrainingrobotsrobots,
      title={Phantom: Training Robots Without Robots Using Only Human Videos},
      author={Marion Lepert and Jiaying Fang and Jeannette Bohg},
      year={2025},
      eprint={2503.00779},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2503.00779},
}

@misc{lepert2025masqueradelearninginthewildhuman,
      title={Masquerade: Learning from In-the-wild Human Videos using Data-Editing}, 
      author={Marion Lepert and Jiaying Fang and Jeannette Bohg},
      year={2025},
      eprint={2508.09976},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2508.09976}, 
}
