Yuxiang Ji1,2
Yong Wang2†
Ziyu Ma2
Yiming Hu2
Hailang Huang2
Xuecai Hu2
Guanhua Chen3
Liaoni Wu1
Xiangxiang Chu2
1Xiamen University
2AMAP, Alibaba Group
3Southern University of Science and Technology
†Project lead
Note
This project includes the codebase, datasets, and checkpoints for Thinking with Map: a map-augmented agent for geolocalization. Given an in-the-wild image, the agent reasons with a map to infer the location.
The illustration of a complete Thinking with Map process.
Comparison with open- and closed-source models.
- [Feb 3, 2026]: 🛠️ Our code and data are released now.
- [Jan 12, 2026]: 🔥 We are honored to be featured as HuggingFace Daily Paper #1.
- [Jan 12, 2026]: 📍 Our paper is released on ArXiv and HuggingFace.
We release two versions of the dataset.
- V1 contains the training and test data used in the paper (lower resolution, will be deprecated).
- V2 (recommended) provides higher-resolution images: the released training set is a 6k-sample subset, and the test set is the same size as V1 (~2.5k samples).
| V1 (Low Resolution, Deprecated) | V2 (High Resolution, Recommended) |
|---|---|
| 🤗HuggingFace | 🤗HuggingFace |
| Qwen3-VL-30B-A3B RL-tuned on MAPBench-V2 37k |
|---|
| 🤗HuggingFace (Released soon) |
After downloading the dataset, process it into the parquet format.
```shell
cd verl/examples/data_preprocess
bash preprocess_thinking_with_map.sh
```

If you just want to try the demo without training or evaluation, you can skip the installation part and go to the Inference part directly.
Please refer to VeRL Installation for more details.
## VeRL Installation
```shell
# use conda for environment management
conda create -n mapagent python==3.12
conda activate mapagent

# install torch and vllm
pip install torch==2.8.0
pip install vllm==0.11.0

# install sglang and other basic packages
cd Thinking-with-Map/verl/scripts
bash install_vllm_sglang_mcore.sh

# install verl
cd Thinking-with-Map/verl
pip install --no-deps -e .
```

The tool server can share the same environment with VeRL.
## Tool Server Installation
```shell
pip install playwright
pip install "uvicorn[standard]"
pip install json5
pip install fastapi
```

After downloading the model, serve it with vLLM.
```shell
# at least 2 GPUs with 80GB memory each for the trained Qwen3-VL-30B-A3B model
# try more GPUs if you hit OOM
vllm serve /path/to/released/model \
    --tensor-parallel-size 2 \
    --port 8002
```

Then try the demo with `demo/cookbook_thinking_with_map.ipynb`.
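Once the server is up, it exposes vLLM's OpenAI-compatible API. Below is a hedged sketch of a multimodal chat request payload; the model path, port, and image URL are placeholders, not values from the repo.

```python
import json

# Placeholder payload for vLLM's OpenAI-compatible /v1/chat/completions
# endpoint on port 8002; swap in your real model path and image URL.
payload = {
    "model": "/path/to/released/model",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/street_view.jpg"}},
            {"type": "text", "text": "Where was this photo taken?"},
        ],
    }],
    "max_tokens": 1024,
}

# e.g. requests.post("http://localhost:8002/v1/chat/completions", json=payload)
body = json.dumps(payload)
print(json.loads(body)["messages"][0]["content"][1]["text"])
```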
Start the tool server with cache on each cluster node.
```shell
cd verl/tool_server
# a Redis server must be running on port 6397 for caching
# set your own API key in the bash script
bash run_api_server.sh $RANK
```

Simply run distributed RL training on each cluster node.
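To illustrate why the tool server caches responses, here is a toy sketch of caching tool-call results, with an in-memory dict standing in for Redis. The function and key scheme are invented for illustration and are not the repo's actual implementation.

```python
import hashlib
import json

cache = {}  # stands in for the Redis instance used by the tool server


def cached_call(tool: str, args: dict, fetch):
    """Return a cached tool response, invoking `fetch` only on a cache miss."""
    key = hashlib.sha256(
        json.dumps([tool, args], sort_keys=True).encode()
    ).hexdigest()
    if key not in cache:
        cache[key] = fetch()
    return cache[key]


calls = []


def fake_geocode():
    # Stand-in for a real map/geocoding API request.
    calls.append(1)
    return {"lat": 35.6586, "lon": 139.7454}


r1 = cached_call("geocode", {"q": "Tokyo Tower"}, fake_geocode)
r2 = cached_call("geocode", {"q": "Tokyo Tower"}, fake_geocode)
print(len(calls), r1 == r2)  # backend hit only once; identical results
```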
```shell
cd verl/geoagent_scripts
# training on MAPBench
bash train_thinking_with_map.sh
```

By default, training runs on 8 nodes with 8 GPUs each; to adapt it to your own environment, modify N_NODES, train_batch_size, and ppo_mini_batch_size accordingly.
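When rescaling, the global train_batch_size generally needs to split evenly across GPUs, and ppo_mini_batch_size should divide it. A quick sanity-check sketch under those assumed constraints (VeRL's actual divisibility rules may differ; the example batch sizes are illustrative, not the repo's defaults):

```python
# Hypothetical sanity check for rescaling the default 8-node x 8-GPU setup;
# VeRL's real constraints may be stricter than these.
def check_config(n_nodes, gpus_per_node, train_batch_size, ppo_mini_batch_size):
    world_size = n_nodes * gpus_per_node
    assert train_batch_size % world_size == 0, "batch must split evenly across GPUs"
    assert train_batch_size % ppo_mini_batch_size == 0, "mini-batches must tile the batch"
    return train_batch_size // world_size  # samples per GPU per step


per_gpu = check_config(n_nodes=8, gpus_per_node=8,
                       train_batch_size=512, ppo_mini_batch_size=64)
print(per_gpu)  # 8
```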
Evaluation is also conducted with VeRL.

```shell
cd verl/geoagent_scripts
bash test_thinking_with_map.sh
```

The repo is built upon VeRL, SGLang, vLLM, and Qwen-Agent. We appreciate these open-source communities for their great work.
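For reference, geolocalization accuracy is commonly scored by the great-circle distance between predicted and ground-truth coordinates. A self-contained haversine sketch follows; this is a standard formula, not necessarily the exact metric computed by `test_thinking_with_map.sh`.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points (haversine)."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Paris (48.8566, 2.3522) to London (51.5074, -0.1278): roughly 344 km
dist = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
print(round(dist))
```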
```bibtex
@article{ji2026thinking,
  title={Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization},
  author={Ji, Yuxiang and Wang, Yong and Ma, Ziyu and Hu, Yiming and Huang, Hailang and Hu, Xuecai and Chen, Guanhua and Wu, Liaoni and Chu, Xiangxiang},
  journal={arXiv preprint arXiv:2601.05432},
  year={2026}
}
```
