Thorin215/GRE

🔍 About GRE Suite

GRE Suite is designed to augment VLMs with structured reasoning chains for accurate and interpretable location inference. It consists of three primary components:

  • Dataset (GRE30K)

GRE30K is a geo-localization reasoning dataset designed to enhance the visual reasoning capability of MLLMs. Specifically, GRE30K consists of GRE30K-CoT for cold-start initialization and GRE30K-Judge for reinforcement learning.

  • Model (GRE)

GRE is a reasoning MLLM that employs a multi-stage reasoning strategy to progressively infer scene attributes, local details, and semantic features, thereby narrowing down candidate geographic regions with enhanced precision.

  • Benchmark (GREval-Bench)

GREval-Bench is a geographical reasoning benchmark. It employs a semi-automated pipeline to curate geographically informative images with explicit and implicit location indicators, and provides annotated Chain-of-Thought steps and reference GPS coordinates for comprehensive evaluation of models' geo-localization capabilities.

🛠️ Requirements and Installation

Basic Dependencies:

  • Python >= 3.8
  • PyTorch >= 2.5.0
  • CUDA Version >= 11.8
  • transformers == 4.40.0
  • tokenizers == 0.19.1
git clone https://github.com/Thorin215/GRE.git
cd GRE
conda create -n GRE python=3.10
conda activate GRE
bash environment.sh

🌟 Getting started

Step 1: Download GRE-7b and set model_name_or_path in infer.ipynb to the path of GRE-7b.

Step 2: Refer to the examples in infer.ipynb for detailed instructions on how to use our model for image geo-localization.
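For illustration, the kind of request infer.ipynb sends can be sketched as below, assuming GRE-7b follows the Qwen2.5-VL chat-message format of its base model (the instruction text and function name here are illustrative, not the repo's actual prompt):

```python
def build_geoloc_messages(image_path: str) -> list:
    """Build a chat-template message list asking the model to
    geo-localize an image with step-by-step reasoning (sketch)."""
    instruction = (
        "Analyze the scene attributes, local details, and semantic "
        "features of this image, then infer its GPS coordinates."
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]
```

A message list like this would typically be passed through the processor's apply_chat_template before generation; see infer.ipynb for the exact usage.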

🚀 Main Results

We perform a comparative analysis of GRE on worldwide geo-localization benchmarks, Im2GPS3k and GWS15k. Our method surpasses the previous state-of-the-art model on Im2GPS3k across all metrics, achieving improvements of +0.5%, +4.2%, +3.0%, +1.7%, and +2.5% at the 1km, 25km, 200km, 750km, and 2500km thresholds, respectively.
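The threshold metric used above (fraction of predictions whose great-circle distance to the ground truth falls within each radius) can be sketched as follows; this is a minimal reference implementation, and the function names are ours, not the repo's:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two GPS points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def threshold_accuracy(preds, gts, thresholds=(1, 25, 200, 750, 2500)):
    """Fraction of (lat, lon) predictions within each km threshold."""
    dists = [
        haversine_km(pa, po, ga, go)
        for (pa, po), (ga, go) in zip(preds, gts)
    ]
    return {t: sum(d <= t for d in dists) / len(dists) for t in thresholds}
```

A prediction counted as correct at 25km is automatically correct at every larger threshold, which is why the reported accuracies are non-decreasing across radii.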

We compare our approach on GREval-Bench with previous generalist models, including the InternVL2.5, InternVL3, and Qwen2.5-VL series. We conduct comprehensive evaluations, analyzing accuracy across different distance thresholds and scenarios, while also assessing the quality of each model's reasoning chains.

🗝️ Training & Evaluation

Training

All datasets for training can be found in Dataset preparation.

The training pipeline of our model is structured into three distinct stages.

  • Stage 1: Cold-start Initialization

    • Download Qwen2.5-VL-7B-Instruct.
    • Set model_name_or_path in stage1.sh to the path of Qwen2.5-VL-7B-Instruct.
    • Prepare GRE30K for cold-start initialization.
    • Run bash scripts/train/stage1.sh.
  • Stage 2: RL stage I

    • Set model_name_or_path in stage2.sh to the path of the Stage 1 checkpoint.
    • Prepare the datasets used for Stage 2.
    • Run bash scripts/train/stage2.sh.
  • Stage 3: RL stage II

    • Set model_name_or_path in stage3.sh to the path of the Stage 2 checkpoint.
    • Prepare the datasets used for Stage 3.
    • Run bash scripts/train/stage3.sh.

Evaluation

For model evaluation, please refer to eval.

📰 Coming Soon

🌏 Checkpoints

Model Name | Base Model | # Training Epochs

🖨️ GRE30K

The dataset can be accessed on 🤗dataset.

GRE30K-CoT Data format:

[
    {
        "image": "images/xxx.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\n{CoT Instruction}?"
            },
            {
                "from": "gpt",
                "value": "..."
            }
        ],
        "gt_lat": {gt_lat},
        "gt_lon": {gt_lon}
    },
    ...
]

GRE30K-Judge Data format:

[
    {
        "image": "images/xxx.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\n{Judge Instruction}?"
            },
            {
                "from": "gpt",
                "value": "True/False"
            }
        ],
        "predicted_cot": "{predicted_cot}",
        "predicted_answer": "{predicted_answer}",
        "gt_lat": {gt_lat},
        "gt_lon": {gt_lon}
    },
    ...
]

GRE30K-Seed Data format:

[
    {
        "image": "images/xxx.jpg",
        "instructions": "{Seed Instruction}",
        "gt_lat": {gt_lat},
        "gt_lon": {gt_lon}
    },
    ...
]
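The three record formats above can be told apart by their fields. A minimal helper for routing records when loading the dataset (a sketch; the field names are taken from the formats documented above):

```python
def classify_record(rec: dict) -> str:
    """Classify a GRE30K record as 'seed', 'judge', or 'cot' by its keys."""
    if "instructions" in rec:
        # Only GRE30K-Seed records carry a raw instruction string.
        return "seed"
    if "predicted_cot" in rec and "predicted_answer" in rec:
        # GRE30K-Judge records attach a candidate CoT and answer to verify.
        return "judge"
    if "conversations" in rec:
        # GRE30K-CoT records are plain conversation pairs with GT coordinates.
        return "cot"
    raise ValueError("unrecognized GRE30K record")
```

Note that Judge records also contain a conversations field, so the judge-specific keys must be checked before falling through to CoT.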

🕹️ GREval-Bench

GREval-Bench assesses models in two key areas: localization performance and Chain-of-Thought quality.

  • The annotations of the benchmark can be found in 🤗benchmark.

  • The usage of GREval-Bench is detailed in doc.

📑 Citation

If you find GRE Suite useful for your research and applications, please cite using this BibTeX:
