
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning


Official PyTorch implementation of
“SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models.”

Shun Taguchi, Hideki Deguchi, Takumi Hamazaki, Hiroyuki Sakai

[Figure: qualitative results]

Table of Contents

  1. Overview
  2. Installation
  3. Data Preparation
  4. Quick Start
  5. Citation
  6. License

Overview

SpatialPrompting tackles zero-shot spatial question answering in 3D scenes by

  1. extracting representative keyframes based on spatial and semantic features, and
  2. constructing LLM prompts that embed spatial context without any 3D-specific fine-tuning.
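For intuition, here is a minimal, self-contained sketch of the keyframe-selection idea as greedy farthest-point sampling over concatenated camera positions and image features. This is an illustration only; select_keyframes is a hypothetical name and the repository's actual selection criterion may differ.

# Illustrative keyframe selection: greedy farthest-point sampling over
# spatial (camera position) + semantic (image feature) vectors.
# Hypothetical sketch; not the repository's actual implementation.
import numpy as np

def select_keyframes(positions, features, k=30):
    """positions: (N, 3) camera positions; features: (N, D) image features.
    In practice the two parts should be scale-normalized before concatenation."""
    x = np.concatenate([positions, features], axis=1)
    chosen = [0]                           # seed with the first frame
    dist = np.linalg.norm(x - x[0], axis=1)
    for _ in range(1, min(k, len(x))):
        i = int(dist.argmax())             # frame farthest from the chosen set
        chosen.append(i)
        dist = np.minimum(dist, np.linalg.norm(x - x[i], axis=1))
    return chosen

# Example: keep 30 of 100 frames with random poses/features.
rng = np.random.default_rng(0)
print(select_keyframes(rng.normal(size=(100, 3)), rng.normal(size=(100, 64))))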

This repository contains:

  • Feature Extraction: extract_features.py
  • Interactive Spatial QA: spatialqa.py
  • Benchmark Inference: predict_scanqa.py, predict_sqa3d.py
  • Evaluation: score_scanqa.py, score_sqa3d.py

Installation

1. Clone repo

git clone https://github.com/ToyotaCRDL/SpatialPrompting.git
cd SpatialPrompting

2. Create environment (conda example)

conda create -n spatialprompting python=3.10 -y
conda activate spatialprompting

3. Install PyTorch and other dependencies

Install PyTorch (see https://pytorch.org for the build matching your CUDA version), then the remaining dependencies:

# CUDA 11.8 build
pip install torch==2.5.0+cu118 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

The project is tested on Ubuntu 22.04 + Python 3.10 + CUDA 11.8 + PyTorch 2.5.0.
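A quick way to confirm your stack matches the tested configuration:

# Sanity-check the installed stack.
import torch
print(torch.__version__)          # expect 2.5.0+cu118
print(torch.cuda.is_available())  # expect True on a CUDA 11.8 machine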

4. Set API keys:

export OPENAI_API_KEY="your_openai_key"
export GOOGLE_API_KEY="your_gemini_key"
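The scripts presumably read these from the environment; a quick check that both are visible to Python:

# Confirm the keys are visible to the Python process.
import os
for key in ("OPENAI_API_KEY", "GOOGLE_API_KEY"):
    print(key, "set" if os.environ.get(key) else "MISSING")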

Data Preparation

/path/to/your/data
└── data
    ├── ScanNet
    ├── ScanQA
    └── SQA3D
  • Extract the .sens files of each ScanNet scan before running the scripts (one way to do this is sketched below).
  • When running the scripts, specify the base path using the --base_path argument.
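One option for the extraction is the SensorData reader shipped with the official ScanNet repository (SensReader/python in github.com/ScanNet/ScanNet). The class and method names below are assumed from that repository, and the scans/ layout is ScanNet's standard one; treat this as a sketch and adapt to your setup:

# Extract color/depth/pose/intrinsics from a .sens file with ScanNet's own
# reader (assumed API from github.com/ScanNet/ScanNet, SensReader/python).
from SensorData import SensorData  # provided by the ScanNet repository

scene = "/path/to/your/data/data/ScanNet/scans/scene0050_00"
sd = SensorData(f"{scene}/scene0050_00.sens")
sd.export_color_images(f"{scene}/color")
sd.export_depth_images(f"{scene}/depth")
sd.export_poses(f"{scene}/pose")
sd.export_intrinsics(f"{scene}/intrinsic")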

Quick Start

1. Extract Spatial Features

python extract_features.py \
  --base_path /path/to/your/data \
  --dataset scannet \
  --env scene0050_00 \
  --model vitl336
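This writes a spatial feature archive (.npz). Its exact keys are defined by the script, but a standard NumPy inspection shows what was stored:

# Peek inside the extracted feature archive; key names are script-defined.
import numpy as np
feats = np.load("/path/to/spatial_feature.npz")
for name in feats.files:
    print(name, feats[name].shape)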

2. Interactive Spatial QA

python spatialqa.py \
  --llm gpt-4o-2024-11-20 \
  --feature /path/to/spatial_feature.npz \
  --image_num 30
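For reference, this is roughly what sending keyframes plus a question to a multimodal LLM looks like with the OpenAI Python SDK. The file names here are hypothetical, and the repository's actual prompt (which also embeds spatial context) is built inside spatialqa.py:

# Minimal multimodal request: N keyframes plus a question in one prompt.
# Illustrative only; spatialqa.py builds the real prompt.
import base64
from openai import OpenAI

def image_part(path):
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}

client = OpenAI()  # reads OPENAI_API_KEY from the environment
content = [image_part(f"keyframes/{i:03d}.jpg") for i in range(30)]
content.append({"type": "text", "text": "How many chairs are around the table?"})
resp = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    messages=[{"role": "user", "content": content}],
)
print(resp.choices[0].message.content)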

3. Predict & Evaluate on ScanQA & SQA3D

  • ScanQA

    • Predict:
    python predict_scanqa.py \
      --base_path /path/to/your/data \
      --llm gpt-4o-2024-11-20 \
      --model vitl336 \
      --image_num 30
    • Evaluate:
    python score_scanqa.py \
      --base_path /path/to/your/data \
      --pred /path/to/prediction.jsonl \
      --use_spice # optional
  • SQA3D

    • Predict:
    python predict_sqa3d.py \
      --base_path /path/to/your/data \
      --llm gpt-4o-2024-11-20 \
      --model vitl336 \
      --image_num 30
    • Evaluate:
    python score_sqa3d.py \
      --base_path /path/to/your/data \
      --pred /path/to/prediction.jsonl
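Predictions are written as JSON Lines; the exact fields are defined by the predict scripts, but each line parses as one JSON object:

# Peek at the first few predictions; field names are script-defined.
import json
from itertools import islice
with open("/path/to/prediction.jsonl") as f:
    for line in islice(f, 3):
        print(json.loads(line))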

Citation

If you find this project useful in your research, please consider citing:

@article{taguchi2025spatialprompting,
  title={SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models},
  author={Taguchi, Shun and Deguchi, Hideki and Hamazaki, Takumi and Sakai, Hiroyuki},
  journal={arXiv preprint arXiv:2505.04911},
  year={2025}
}

License

This code is released for non-commercial research use only.
See the full text in the LICENSE file.
