This repository is the code base for the (rejected) IROS 2024 submission: Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks and a revised RA-L submission (Not yet published) by Jonathan Salfity, Selma Wanna, Minkyu Choi, and Mitch Pryor. The corresponding author is Jonathan Salfity (j [dot] salfity [at] utexas [dot] edu).
The code base is divided into the following sections:
- Data generation through Robosuite simulations and a Finite State Machine (FSM) implementation is in `scripts/`. Upon generation, the data is stored in `data/` as `.txt` and/or `.mp4` files, depending on the config file. For the data used in the original paper submission, contact j [dot] salfity [at] utexas [dot] edu for access.
- Querying a Foundation Model (FM) for sub-task decomposition is in `analysis/query_LLM.py`.
- Analysis of the FM output, comparison with ground-truth data, comparison with human annotations, and plot and table generation are in `analysis/main_metrics_calculations.ipynb`.
- Human annotation data is in `output/`.
- The main metrics (temporal and semantic) calculations are in `analysis/comparisons.py`, specifically the `get_subtask_similarity` function.
- Supporting functions, including API calls, prompt building, in-context learning examples, and the random baseline implementation, are in `/utils`.
Install this package:

```
pip install -e .
```

Download the MuJoCo binaries from here and place them in the `~/.mujoco/mujoco<>/` folder, then install MuJoCo via pip:

```
pip install mujoco
```

Install robosuite:

```
pip install robosuite
```

Configure which environment to run in `/scripts/demo_config.yaml`.
Currently we have 4 environments: "Stack", "Lift", "Door", "PickPlace".
Follow the uncommented lines in /scripts/demo_config.yaml to set the correct fields.
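If you want to double-check what the script will pick up, here is a minimal sketch that simply loads and prints the config; it makes no assumptions about the field names inside `demo_config.yaml`:

```python
import yaml  # provided by the pyyaml package

# Minimal sketch: load and print scripts/demo_config.yaml to confirm which
# environment and output options are currently set.
with open("scripts/demo_config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg)
```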
Run the data generation script:

```
python scripts/run_demo.py
```

(This data is not used in the IROS paper.)
Go to the robomimic site to download data: https://robomimic.github.io/docs/datasets/robomimic_v0.1.html.
(Note that the download currently only seems to work with the Safari browser.)
Place the downloaded hdf5 files in the respective data/robomimic folder.
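Optionally, you can sanity-check a downloaded file before recording. The sketch below assumes the standard robomimic hdf5 layout (a top-level `data` group whose `env_args` attribute is a JSON string containing `env_name`); the file path is a placeholder:

```python
import json
import h5py

# Illustrative sketch: inspect a downloaded robomimic dataset.
# Assumes the standard robomimic layout: a "data" group with an "env_args"
# attribute (JSON) and one sub-group per demonstration.
with h5py.File("path/to/robomimic/demo_v141.hdf5", "r") as f:
    env_meta = json.loads(f["data"].attrs["env_args"])
    print("env_name:", env_meta["env_name"])
    print("num demos:", len(f["data"]))
```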
Run the data generation script /scripts/record_robomimic_data.py with command-line args that specify the path to the demo_v141.hdf5 file, the number of demos to run, and whether to save_txt or save_video.
The script will automatically extract the specific env_name and place the text and videos in the respective data/txt or data/video folders.
Example:

```
python scripts/record_robomimic_data.py --dataset path/to/robomimic/demo_v141.hdf5 --num_demos 1 --save_txt 1 --save_video 1
```

Querying the FM assumes you have installed the `openai` and `google-generativeai` Python packages and set the API keys as environment variables, i.e. `OPENAI_API_KEY` and `GOOGLE_API_KEY`.
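As a quick, illustrative check that both keys are visible to Python before querying:

```python
import os

# Illustrative check: the query script expects both API keys in the environment.
for key in ("OPENAI_API_KEY", "GOOGLE_API_KEY"):
    if key not in os.environ:
        raise RuntimeError(f"{key} is not set; export it before running analysis/query_LLM.py")
```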
The configuration file for the LLM is in config/query_LLM_config.yaml.
The following are options for the FM model:
- `gpt-4-vision-preview`
- `gpt-4-1106-preview`
- `gemini-pro`
- `gemini-pro-vision` (not in this repo; called via the Google Cloud Vertex AI API)
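For reference, here is a minimal sketch of how these two providers are typically called from Python. This is not the repo's query code (which lives in `/utils`); the prompt is illustrative:

```python
import os
from openai import OpenAI               # openai >= 1.0
import google.generativeai as genai     # google-generativeai package

prompt = "Decompose the following robot trajectory into sub-tasks: ..."

# OpenAI: the client reads OPENAI_API_KEY from the environment.
client = OpenAI()
gpt_response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": prompt}],
)
print(gpt_response.choices[0].message.content)

# Google: configure with GOOGLE_API_KEY, then call a Gemini model.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini_response = genai.GenerativeModel("gemini-pro").generate_content(prompt)
print(gemini_response.text)
```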
All states in each environment are in R^3 and represent the x-y-z position of an object in the environment. All actions are in R^7, using Robosuite's OSC_POSE controller (a minimal sketch appears after the environment list below).
The following are options for the environment:

- `Door` - States: `robot0_eef_pos`, `door_pos`, `handle_pos`, `door_to_eef_pos`, `handle_to_eef_pos`
- `Lift` - States: `robot0_eef_pos`, `cube_pos`, `gripper_to_cube`
- `PickPlace` - States: `robot0_eef_pos`, `Can_pos`, `Can_to_robot0_eef_pos`
- `Stack` - States: `robot0_eef_pos`, `cubeA_pos`, `cubeB_pos`, `gripper_to_cubeA`, `gripper_to_cubeB`, `cubeA_to_cubeB`
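The following is a minimal sketch of what the R^3 states and R^7 OSC_POSE actions look like in robosuite. It assumes robosuite 1.4.x and a Panda arm (the robot model is not specified above), and exact observation key names can vary slightly between versions:

```python
import numpy as np
import robosuite
from robosuite import load_controller_config  # available in robosuite 1.4.x

# Build one of the four environments with the OSC_POSE controller.
env = robosuite.make(
    "Lift",
    robots="Panda",  # assumption: the robot model is not specified in this README
    controller_configs=load_controller_config(default_controller="OSC_POSE"),
    has_renderer=False,
    has_offscreen_renderer=False,
    use_camera_obs=False,
)

obs = env.reset()
# Position-style observations (R^3 each); names should line up with the state lists above.
print({k: v for k, v in obs.items() if k.endswith("_pos")})

# OSC_POSE actions are R^7: [dx, dy, dz, d_rot_x, d_rot_y, d_rot_z, gripper].
low, high = env.action_spec
action = np.zeros_like(low)  # len(action) == 7 for a single arm
obs, reward, done, info = env.step(action)

env.close()
```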
The following are options for input modalities and in-context learning examples to include in the LLM prompt query, which can be used in combination with each other:
- `textual_input`: (True or False)
- `video_input`: (True or False)
- `in_context`: (True or False)
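To confirm which combination is active before querying, a small sketch is shown below. It assumes the three flags appear under these exact names at the top level of `config/query_LLM_config.yaml`, which may not match the file's actual layout:

```python
import yaml

# Illustrative: print the modality / in-context flags from the LLM config.
# Assumes flat keys named textual_input, video_input, in_context; the real
# config/query_LLM_config.yaml may nest or name them differently.
with open("config/query_LLM_config.yaml") as f:
    cfg = yaml.safe_load(f)

for flag in ("textual_input", "video_input", "in_context"):
    print(flag, "=", cfg.get(flag))
```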
To query the LLM, run the following command:

```
python analysis/query_LLM.py
```

See `analysis/main_metrics_calculations.ipynb` to generate the plots shown in the paper.