menloresearch/procthor-agent


ProcThor Agent

ProcThor Agent Demo

Agent Architecture

The ProcThor Agent is designed to navigate within simulated environments using a Vision Language Model (VLM). The system employs a structured interaction loop where the model receives visual observations and executes specific tool calls.

Interaction Flow

The interaction follows a strict template where the User acts as the environment interface, providing:

  1. The Goal (e.g., "Visit all rooms").
  2. A history of recent actions and observations.
  3. The current visual observation (RGB image).

The Assistant (LLM) then analyzes the visual input and context to determine the next best action, outputting a structured function call.
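The loop above can be sketched in Python. This is a hypothetical illustration of how the User turn might be assembled; the `Observation` class and `build_user_message` helper are assumptions for clarity, not the repository's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    image: bytes                 # current RGB frame from the simulator
    goal: str                    # e.g. "Visit all rooms"
    history: list = field(default_factory=list)  # recent (action, result) pairs

def build_user_message(obs: Observation) -> dict:
    """Assemble the User turn: goal, recent history, current frame."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": f"Goal: {obs.goal}"},
            {"type": "text", "text": f"Recent history: {obs.history[-5:]}"},
            {"type": "image", "data": obs.image},
        ],
    }
```

The Assistant's reply would then be parsed for a tool call and executed against the simulator, and the resulting observation fed back in the next User turn.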

Chat Template

Available Tools

The agent is equipped with a precise set of tools to manipulate its position and orientation. These tools are defined with specific arguments to ensure deterministic control over the agent's movement.

  • Navigation: Moves the agent in cardinal directions (Ahead, Back, Left, Right) with configurable magnitude (0.1 to 1.0).
  • Rotation: Rotates the view (Left, Right) by fixed degrees (15, 30, 45, 90).
  • Done: Signals the completion of the task.
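The three tools could be declared in OpenAI-style function-calling JSON roughly as follows. The exact function names and schema details here are assumptions for illustration; only the directions, magnitudes, and rotation angles come from the list above.

```python
# Hypothetical tool schemas; names ("navigate", "rotate", "done") are assumed.
NAVIGATE = {
    "type": "function",
    "function": {
        "name": "navigate",
        "description": "Move the agent in a cardinal direction.",
        "parameters": {
            "type": "object",
            "properties": {
                "direction": {"type": "string",
                              "enum": ["Ahead", "Back", "Left", "Right"]},
                "magnitude": {"type": "number",
                              "minimum": 0.1, "maximum": 1.0},
            },
            "required": ["direction", "magnitude"],
        },
    },
}
ROTATE = {
    "type": "function",
    "function": {
        "name": "rotate",
        "description": "Rotate the view by a fixed angle.",
        "parameters": {
            "type": "object",
            "properties": {
                "direction": {"type": "string", "enum": ["Left", "Right"]},
                "degrees": {"type": "integer", "enum": [15, 30, 45, 90]},
            },
            "required": ["direction", "degrees"],
        },
    },
}
DONE = {
    "type": "function",
    "function": {
        "name": "done",
        "description": "Signal that the task is complete.",
        "parameters": {"type": "object", "properties": {}},
    },
}
TOOLS = [NAVIGATE, ROTATE, DONE]
```

Constraining directions and angles to fixed enums keeps the action space discrete and the agent's movement deterministic, as noted above.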

LLM Tools

Setup

Prerequisites

  • Python 3.12

Step by step

  1. Run pip install -r requirements.txt
  2. Run cp .env.example .env to create the environment file, then add your API key as OPENAI_API_KEY=TOKEN

Getting Started

Human Control

python scripts/interactive_wasd.py

Agent Control

python scripts/ai_agent.py

Agent Control - Multiple Actions in One Response (WIP)

python scripts/ai_agent_chunked.py

Benchmark

Evaluate agent navigation performance on ProcTHOR environments.

Create Benchmark Dataset

python scripts/create_benchmark_dataset.py --num 50 --split test --output benchmark_dataset.jsonl

Run Benchmark

python scripts/run_benchmark.py --benchmark benchmark_dataset.jsonl --max_steps 70

Results saved to benchmark_results.jsonl. Trajectory visualizations saved to benchmark_visualizations/.
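Since the results are written as JSON Lines (one JSON object per line), they can be inspected with a few lines of Python. The helper below and the `"success"` field name are assumptions about the record format, not the repo's documented schema.

```python
import json

def load_results(path):
    """Read one JSON object per line, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def success_rate(results, key="success"):
    """Fraction of episodes whose record marks the given key truthy."""
    if not results:
        return 0.0
    return sum(bool(r.get(key)) for r in results) / len(results)
```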

Analyze Benchmark Logs

Analyze execution logs, calculate metrics (redundancy, blocked actions), and generate annotated videos of agent performance.

python scripts/analyze_benchmark_logs.py benchmark_results/<benchmark_run_directory>

This generates result_detailed.json and videos in analysis_videos/ within the run directory.
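One plausible way to compute the redundancy metric mentioned above is the fraction of steps that land on an already-visited (position, heading) state. This is an assumption about the metric's definition, offered only to make the idea concrete.

```python
def redundancy(states):
    """Fraction of steps revisiting a previously seen state.

    `states` is a sequence of hashable snapshots, e.g. (x, z, rotation)
    tuples -- an assumed representation, not the repo's actual one.
    """
    seen, repeats = set(), 0
    for s in states:
        if s in seen:
            repeats += 1
        seen.add(s)
    return repeats / len(states) if states else 0.0
```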
