Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization

AMAP-ML/Thinking-with-Map


Yuxiang Ji1,2  Yong Wang2†  Ziyu Ma2  Yiming Hu2  Hailang Huang2 
Xuecai Hu2  Guanhua Chen3  Liaoni Wu1  Xiangxiang Chu2 
1Xiamen University    2AMAP, Alibaba Group    3Southern University of Science and Technology
†Project lead

Page | Arxiv | 🤗 Paper | 🤗 Data | 🤗 Model

Data | Model | X@AK

Note

This project includes the codebase, datasets, and checkpoints for Thinking with Map, a map-augmented agent for geolocalization. Given an in-the-wild image, the agent reasons with a map to infer where the image was taken.

🎬 Demo

Demo of Thinking with Map

Figure: An illustration of a complete Thinking with Map process.

Figure: Comparison with open- and closed-source models.

News

  • [Feb 3, 2026]: 🛠️ Our code and data are now released.
  • [Jan 12, 2026]: 🔥 We are honored to be featured as HuggingFace Daily Paper #1.
  • [Jan 12, 2026]: 📍 Our paper is released on ArXiv and HuggingFace.

💾 Dataset Access

We release two versions of the dataset.

  • V1 contains the training and test data used in the paper (lower resolution, will be deprecated).
  • V2 provides higher-resolution images (recommended): the released training set is a subset of 6k samples, and the test set is the same size as in V1 (~2.5k samples).

V1 (Low Resolution, Deprecated) | V2 (High Resolution, Recommended)
🤗 HuggingFace | 🤗 HuggingFace
ModelScope | ModelScope

📦 Model Zoo

Qwen3-VL-30B-A3B RL-tuned on MAPBench-V2 37k
🤗HuggingFace (Released soon)
ModelScope (Released soon)

🚀 Quick Start

After downloading the dataset, process it into the parquet format.

cd verl/examples/data_preprocess
bash preprocess_thinking_with_map.sh

Installation

If you only want to try the demo without training or evaluating, you can skip installation and go directly to the Inference section.

Please refer to VeRL Installation for more details.

## VeRL Installation 
# use conda for environment management
conda create -n mapagent python==3.12
conda activate mapagent
# install torch and vllm
pip install torch==2.8.0
pip install vllm==0.11.0
# install sglang and other basic packages
cd Thinking-with-Map/verl/scripts
bash install_vllm_sglang_mcore.sh
# install verl (the previous step left us in verl/scripts, so step up to Thinking-with-Map/verl)
cd ..
pip install --no-deps -e .

The tool server can share the same environment with VeRL.

## Tool Server Installation
pip install playwright
pip install "uvicorn[standard]"
pip install json5
pip install fastapi

Inference

After downloading the model, serve it with a vLLM server.

# at least 2 GPUs with 80GB memory each for the trained Qwen3-VL-30B-A3B model
# try more GPUs when OOM
vllm serve /path/to/released/model \
    --tensor-parallel-size 2 \
    --port 8002
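
Before opening the notebook, it can help to confirm that the server has finished loading the model. The sketch below assumes the server started by the command above is reachable at localhost:8002 and relies on the `/health` and `/v1/models` routes of vLLM's OpenAI-compatible server. The helper name `wait_for_http` and the retry values are illustrative and not part of this repo.

```shell
# Poll a URL until it answers with a successful HTTP status.
# wait_for_http is a hypothetical helper, not part of this repo.
# Arguments: URL [RETRIES, default 60] [DELAY in seconds, default 5]
wait_for_http() {
    url="$1"
    retries="${2:-60}"
    delay="${3:-5}"
    attempt=0
    while [ "$attempt" -lt "$retries" ]; do
        # -f: fail on HTTP errors; -s: silent; -o /dev/null: discard the body
        if curl -fs -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (assumes the vllm serve command above, port 8002):
#   wait_for_http http://localhost:8002/health && curl -s http://localhost:8002/v1/models
```

Loading a 30B checkpoint can take several minutes, so a generous retry budget is usually safer than a single request.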

Then try the demo with demo/cookbook_thinking_with_map.ipynb.

Training

Start the tool server with cache on each cluster node.

cd verl/tool_server
# requires a Redis server running on port 6397 for the cache
# set your own API key in the bash script
bash run_api_server.sh $RANK
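
Since the tool server depends on Redis for its cache, a quick reachability check before launching can save a failed start. This is a minimal sketch that assumes `redis-cli` is installed and that Redis runs on the local host. The port is taken from the comment above, and `redis_ready` is a hypothetical helper name.

```shell
# Return success only if a Redis server answers PING with PONG.
# redis_ready is a hypothetical helper; host and port defaults are assumptions.
redis_ready() {
    host="${1:-127.0.0.1}"
    port="${2:-6397}"
    reply="$(redis-cli -h "$host" -p "$port" ping 2>/dev/null)"
    [ "$reply" = "PONG" ]
}

# Usage:
#   redis_ready 127.0.0.1 6397 || echo "Redis is not reachable; start it first" >&2
```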

Then launch distributed RL training on each cluster node.

cd verl/geoagent_scripts
# training on MAPBench
bash train_thinking_with_map.sh

By default, training runs on 8 nodes with 8 GPUs each. To adapt it to your own environment, modify N_NODES, train_batch_size, and ppo_mini_batch_size in the training script.
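
One simple way to pick new values when shrinking the cluster is to scale both batch sizes in proportion to the node count. This is only a common heuristic, not a recommendation from the authors. The helper `scale_batch` is hypothetical, and the reference batch size of 128 below is a placeholder rather than the script's actual default.

```shell
# Scale a batch size linearly with the number of nodes (integer arithmetic).
# scale_batch is a hypothetical helper, not part of this repo.
# Arguments: REF_BATCH REF_NODES NEW_NODES
scale_batch() {
    ref_batch="$1"
    ref_nodes="$2"
    new_nodes="$3"
    echo $(( ref_batch * new_nodes / ref_nodes ))
}

# Example with a placeholder batch size of 128 tuned for 8 nodes, moved to 2 nodes:
scale_batch 128 8 2   # prints 32
```

Apply the same ratio to train_batch_size and ppo_mini_batch_size so that their relationship stays unchanged, then check the values actually set in train_thinking_with_map.sh before editing.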

Evaluation

Evaluation also runs on VeRL.

cd verl/geoagent_scripts
bash test_thinking_with_map.sh

🙏 Acknowledgement

The repo is built upon VeRL, SGLang, vLLM, and Qwen-Agent. We appreciate these open-source communities for their great work.

📌 Citation

@article{ji2026thinking,
  title={Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization},
  author={Ji, Yuxiang and Wang, Yong and Ma, Ziyu and Hu, Yiming and Huang, Hailang and Hu, Xuecai and Chen, Guanhua and Wu, Liaoni and Chu, Xiangxiang},
  journal={arXiv preprint arXiv:2601.05432},
  year={2026}
}
