
UniPR-3D: Towards Universal Visual Place Recognition with 3D Visual Geometry Grounded Transformer

Tianchen Deng1 · Xun Chen2 · Ziming Li1 · Hongming Shen2 · Danwei Wang2 · Javier Civera3 · Hesheng Wang1

1Shanghai Jiao Tong University· 2Nanyang Technological University· 3University of Zaragoza

📃 Description

UniPR-3D is a universal visual place recognition framework that supports both frame-to-frame and sequence-to-sequence matching. Our model is capable of predicting visual descriptors for both individual frames and entire sequences. It leverages 3D and 2D tokens with tailored aggregation strategies for robust single-frame and variable-length sequence matching, achieving state-of-the-art performance on benchmarks like MSLS, Pittsburgh, NordLand, and SPED.
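With such descriptors, place recognition reduces to nearest-neighbor retrieval: a query descriptor (per frame or per sequence) is ranked against a database of reference descriptors by cosine similarity. A minimal stdlib-only sketch of this retrieval step, using made-up toy descriptors (UniPR-3D's actual descriptors come from its transformer, and real systems use an ANN index rather than a linear scan):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query, database, k=1):
    # Rank database descriptors by similarity to the query; return top-k indices.
    scores = [(cosine(query, d), i) for i, d in enumerate(database)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

# Toy 2-D descriptors: index 1 is the closest match to the query below.
db = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(retrieve([0.92, 0.08], db, k=2))  # → [1, 0]
```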

🛠️ Setup

The code has been tested on:

  • Ubuntu 22.04 LTS, Python 3.11.10, CUDA 12.1, GeForce RTX 4090

📦 Repository

Clone the repo:

git clone https://github.com/dtc111111/UniPR-3D.git
cd UniPR-3D

💻 Installation

We provide a Dockerfile for easy setup. To build the Docker image, run:

docker build -t unipr3d -f DOCKERFILE .

To run the docker container, use:

docker run --gpus all -it unipr3d /bin/bash

You may need to mount your data directory to access the datasets, e.g., add -v /path/to/your/data:/data to the above command.

🚀 Usage

Downloading Pretrained Models

To achieve higher performance, we train single-frame and multi-frame models separately. You may download our pretrained models from Hugging Face or from the GitHub release and place them anywhere you like.

Downloading the Datasets

You will need to download the following datasets to evaluate our method or reproduce our results.

For training: We use the GSV-Cities dataset (github repo) to train our single-frame model and the Mapillary (MSLS) dataset (github repo) to train our multi-frame model.

For evaluation:

  • Single frame evaluation:
    • MSLS Challenge, where you upload your predictions to their server for evaluation.
    • Single-frame MSLS Validation set
    • The NordLand, Pittsburgh, and SPED datasets; you may download them from here, aligned with DINOv2 SALAD.
  • Multi-frame evaluation:
    • Multi-frame MSLS Validation set
    • Two sequences from Oxford RobotCar; you may download them here.
      • 2014-12-16-18-44-24 (winter night) query to 2014-11-18-13-20-12 (fall day) db
      • 2014-11-14-16-34-33 (fall night) query to 2015-11-13-10-28-08 (fall day) db
    • Nordland (filtered) dataset

Before training or evaluation, please download the datasets and update the dataset paths in /dataloaders/* to your own.
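Since the dataloaders will fail if the configured directories are missing, it can help to sanity-check your paths before launching a run. A small stdlib sketch (the example paths below are placeholders, not the repo's actual configuration values):

```python
from pathlib import Path

def check_dataset_paths(paths):
    # Return the subset of configured dataset paths that do not exist on disk.
    return [p for p in paths if not Path(p).is_dir()]

# Placeholder paths -- substitute the values you set in /dataloaders/*.
datasets = ["/data/gsv-cities", "/data/msls"]
missing = check_dataset_paths(datasets)
if missing:
    print("Missing dataset directories:", missing)
```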

Training

To reproduce our results and train the model, run:

# For single-frame model training
python3 main_ft.py
# For multi-frame model training
python3 main_lora_multiframe.py

Make sure to set the correct paths in the python file before running.

Evaluating

To evaluate the model on datasets mentioned above, run:

# For both single frame and multi-frame evaluation
python3 eval_lora.py

Make sure to set the correct paths in the Python file before running. If you are evaluating our pretrained models directly, also set the path to the pretrained checkpoint in the Python file.

Results

Our method achieves significantly higher recall than competing approaches, setting a new state of the art on both single-frame and multi-frame benchmarks.

Single-frame matching results

| Method | Latency (ms) | MSLS Challenge R@1 / R@5 | MSLS Val R@1 / R@5 | NordLand R@1 / R@5 | Pitts250k-test R@1 / R@5 | SPED R@1 / R@5 |
|---|---|---|---|---|---|---|
| MixVPR | 1.37 | 64.0 / 75.9 | 88.0 / 92.7 | 58.4 / 74.6 | 94.6 / 98.3 | 85.2 / 92.1 |
| EigenPlaces | 2.65 | 67.4 / 77.1 | 89.3 / 93.7 | 54.4 / 68.8 | 94.1 / 98.0 | 69.9 / 82.9 |
| DINOv2 SALAD | 2.41 | 73.0 / 86.8 | 91.2 / 95.3 | 69.6 / 84.4 | 94.5 / 98.7 | 89.5 / 94.4 |
| UniPR-3D (ours) | 8.23 | 74.3 / 87.5 | 91.4 / 96.0 | 76.2 / 87.3 | 94.9 / 98.1 | 89.6 / 94.5 |

Sequence matching results

| Method | MSLS Val R@1 / R@5 / R@10 | NordLand R@1 / R@5 / R@10 | Oxford1 R@1 / R@5 / R@10 | Oxford2 R@1 / R@5 / R@10 |
|---|---|---|---|---|
| SeqMatchNet | 65.5 / 77.5 / 80.3 | 56.1 / 71.4 / 76.9 | 36.8 / 43.3 / 48.3 | 27.9 / 38.5 / 45.3 |
| SeqVLAD | 89.9 / 92.4 / 94.1 | 65.5 / 75.2 / 80.0 | 58.4 / 72.8 / 80.8 | 19.1 / 29.9 / 37.3 |
| CaseVPR | 91.2 / 94.1 / 95.0 | 84.1 / 89.9 / 92.2 | 90.5 / 95.2 / 96.5 | 72.8 / 85.8 / 89.9 |
| UniPR-3D (ours) | 93.7 / 95.7 / 96.9 | 86.8 / 91.7 / 93.8 | 95.4 / 98.1 / 98.7 | 80.6 / 90.3 / 93.9 |
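Recall@K, the metric in both tables, is the fraction of queries for which at least one of the top-K retrieved database entries is a true match. A minimal sketch with made-up rankings and ground truth (this is not our evaluation code):

```python
def recall_at_k(predictions, ground_truth, k):
    # predictions[q]: database indices ranked by similarity for query q.
    # ground_truth[q]: set of database indices that are true matches for q.
    hits = sum(
        1 for preds, gt in zip(predictions, ground_truth)
        if any(p in gt for p in preds[:k])
    )
    return 100.0 * hits / len(predictions)

preds = [[3, 1, 7], [2, 0, 5], [4, 6, 8]]   # toy rankings per query
gt    = [{1}, {9}, {4}]                     # toy ground-truth matches
print(recall_at_k(preds, gt, 1))  # → 33.33...
print(recall_at_k(preds, gt, 2))  # → 66.66...
```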

📧 Contact

If you have any questions regarding this project, please contact Tianchen Deng (dengtianchen@sjtu.edu.cn). If you want to use our intermediate results for qualitative comparisons, please reach out to the same email.

✏️ Acknowledgement

Our implementation builds heavily on SALAD and VGGT. We thank the authors for their open-source contributions. If you use code based on their work, please cite them as well.

🎓 Citation

If you find our paper and code useful, please cite us:

@inproceedings{deng2026_unipr3d,
  title     = {UniPR-3D: Towards Universal Visual Place Recognition with 3D Visual Geometry Grounded Transformer},
  author    = {Tianchen Deng and Xun Chen and Ziming Li and Hongming Shen and Danwei Wang and Javier Civera and Hesheng Wang},
  booktitle = {arXiv},
  year      = {2026},
}
