Tianchen Deng1 · Xun Chen2 · Ziming Li1 · Hongming Shen2 · Danwei Wang2 · Javier Civera3 · Hesheng Wang1
1Shanghai Jiao Tong University · 2Nanyang Technological University · 3University of Zaragoza
UniPR-3D is a universal visual place recognition framework that supports both frame-to-frame and sequence-to-sequence matching. Our model is capable of predicting visual descriptors for both individual frames and entire sequences. It leverages 3D and 2D tokens with tailored aggregation strategies for robust single-frame and variable-length sequence matching, achieving state-of-the-art performance on benchmarks like MSLS, Pittsburgh, NordLand, and SPED.
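For intuition on what descriptor-based place recognition does at retrieval time (this is a generic sketch, not our model or codebase): each place is represented by a descriptor vector, and recognition is nearest-neighbor search under cosine similarity. The descriptors and the `retrieve_topk` helper below are purely illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_topk(query, database, k=5):
    """Rank database descriptors by similarity to the query; return top-k indices."""
    order = sorted(range(len(database)),
                   key=lambda i: cosine(query, database[i]),
                   reverse=True)
    return order[:k]

# Toy 4-D descriptors: entries 0 and 2 point roughly the same way as the query.
db = [[1, 0, 0, 0], [0, 1, 0, 0], [0.9, 0.1, 0, 0], [0, 0, 1, 0]]
query = [1.0, 0.05, 0, 0]
print(retrieve_topk(query, db, k=2))  # [0, 2]
```

In practice the same interface covers both matching modes: a query is either a single-frame descriptor or a sequence-level descriptor, compared against a database of the same kind.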
The code has been tested on:
- Ubuntu 22.04 LTS, Python 3.11.10, CUDA 12.1, GeForce RTX 4090
Clone the repo:
git clone https://github.com/dtc111111/UniPR-3D.git
cd UniPR-3D
We provide a Dockerfile for easy setup. To build the Docker image, run:
docker build -t unipr3d -f DOCKERFILE .
To run the docker container, use:
docker run --gpus all -it unipr3d /bin/bash
You may need to mount your data directory to access the datasets, e.g., add `-v /path/to/your/data:/data` to the above command.
To achieve higher performance, we train the single-frame and multi-frame models separately. You may download our pretrained models from Hugging Face or from the releases page and place them anywhere you like.
You have to download the following datasets to evaluate our method or reproduce our results.
For training: We use GSV-Cities (github repo) dataset for training our single-frame model and Mapillary (MSLS) (github repo) dataset for training our multi-frame model.
For evaluation:
- Single frame evaluation:
- MSLS Challenge, where you upload your predictions to their server for evaluation.
- Single-frame MSLS Validation set
- Nordland, Pittsburgh, and SPED datasets; you may download them from here, aligned with DINOv2 SALAD.
- Multi-frame evaluation:
- Multi-frame MSLS Validation set
- Two sequences from Oxford RobotCar, which you may download here.
- 2014-12-16-18-44-24 (winter night) query to 2014-11-18-13-20-12 (fall day) db
- 2014-11-14-16-34-33 (fall night) query to 2015-11-13-10-28-08 (fall day) db
- Nordland (filtered) dataset
Before training or evaluation, please download the datasets and replace the paths in `/dataloaders/*` with your own.
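A common pattern for this kind of path wiring is a single map of dataset roots near the top of each dataloader. The sketch below is illustrative only; the actual variable names in `/dataloaders/*` may differ.

```python
from pathlib import Path

# Illustrative path map -- edit these to match where you placed the datasets.
# The real dataloaders may use differently named module-level variables.
DATASET_ROOTS = {
    "gsv_cities": Path("/data/GSV-Cities"),
    "msls": Path("/data/MSLS"),
    "nordland": Path("/data/Nordland"),
    "pittsburgh": Path("/data/Pittsburgh"),
    "sped": Path("/data/SPED"),
}

def dataset_root(name: str) -> Path:
    """Look up a dataset root and fail early with a clear message if it is missing."""
    root = DATASET_ROOTS[name]
    if not root.exists():
        raise FileNotFoundError(
            f"Dataset '{name}' not found at {root}; edit DATASET_ROOTS to point at your copy."
        )
    return root
```

Failing early with an explicit message makes misconfigured paths easy to spot before a long training run starts.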
To reproduce our results and train the model, run:
# For single-frame model training
python3 main_ft.py
# For multi-frame model training
python3 main_lora_multiframe.py
Make sure to set the correct paths in the python file before running.
To evaluate the model on datasets mentioned above, run:
# For both single frame and multi-frame evaluation
python3 eval_lora.py
Make sure to set the correct paths in the python file before running. If you are evaluating with our pretrained models, you also need to set the path to the pretrained model in the python file.
Our method achieves significantly higher recall than competing approaches, setting a new state of the art on both single-frame and multi-frame benchmarks.
<style> table, th, td { border-collapse: collapse; text-align: center; } </style>

| | | MSLS Challenge | | MSLS Val | | NordLand | | Pitts250k-test | | SPED | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | Latency (ms) | R@1 | R@5 | R@1 | R@5 | R@1 | R@5 | R@1 | R@5 | R@1 | R@5 |
| MixVPR | 1.37 | 64.0 | 75.9 | 88.0 | 92.7 | 58.4 | 74.6 | 94.6 | 98.3 | 85.2 | 92.1 |
| EigenPlaces | 2.65 | 67.4 | 77.1 | 89.3 | 93.7 | 54.4 | 68.8 | 94.1 | 98.0 | 69.9 | 82.9 |
| DINOv2 SALAD | 2.41 | 73.0 | 86.8 | 91.2 | 95.3 | 69.6 | 84.4 | 94.5 | 98.7 | 89.5 | 94.4 |
| UniPR-3D (ours) | 8.23 | 74.3 | 87.5 | 91.4 | 96.0 | 76.2 | 87.3 | 94.9 | 98.1 | 89.6 | 94.5 |
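R@K in the tables above counts a query as correct if any of its top-K retrieved database items is a true match, averaged over all queries. A minimal sketch, independent of our codebase:

```python
def recall_at_k(predictions, ground_truth, k):
    """
    predictions: per-query list of database indices, ranked best-first.
    ground_truth: per-query set of correct database indices.
    Returns the percentage of queries with at least one hit in the top k.
    """
    hits = sum(1 for preds, gt in zip(predictions, ground_truth)
               if any(p in gt for p in preds[:k]))
    return 100.0 * hits / len(predictions)

# Toy example with 3 queries.
preds = [[3, 7, 1], [5, 2, 9], [4, 0, 8]]
gts = [{3}, {9}, {6}]
print(recall_at_k(preds, gts, 1))  # 33.33...: only query 0 hits at rank 1
print(recall_at_k(preds, gts, 3))  # 66.66...: query 1 also hits within the top 3
```

Since the hit set only grows with K, R@5 is always at least R@1, as the numbers in the tables reflect.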
| | MSLS Val | | | NordLand | | | Oxford1 | | | Oxford2 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | R@1 | R@5 | R@10 | R@1 | R@5 | R@10 | R@1 | R@5 | R@10 | R@1 | R@5 | R@10 |
| SeqMatchNet | 65.5 | 77.5 | 80.3 | 56.1 | 71.4 | 76.9 | 36.8 | 43.3 | 48.3 | 27.9 | 38.5 | 45.3 |
| SeqVLAD | 89.9 | 92.4 | 94.1 | 65.5 | 75.2 | 80.0 | 58.4 | 72.8 | 80.8 | 19.1 | 29.9 | 37.3 |
| CaseVPR | 91.2 | 94.1 | 95.0 | 84.1 | 89.9 | 92.2 | 90.5 | 95.2 | 96.5 | 72.8 | 85.8 | 89.9 |
| UniPR-3D (ours) | 93.7 | 95.7 | 96.9 | 86.8 | 91.7 | 93.8 | 95.4 | 98.1 | 98.7 | 80.6 | 90.3 | 93.9 |
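For intuition on sequence-level matching: one simple baseline (not the paper's tailored aggregation strategy) pools per-frame descriptors into a single sequence descriptor, e.g. by L2-normalized mean pooling, so variable-length sequences compare with the same similarity as single frames.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (leave zero vectors unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def sequence_descriptor(frames):
    """Mean-pool per-frame descriptors, then renormalize to unit length."""
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    return l2_normalize(mean)

# Two orthogonal 2-D frame descriptors pool to the diagonal direction.
print(sequence_descriptor([[1.0, 0.0], [0.0, 1.0]]))  # ~[0.707, 0.707]
```

Because the output has unit length regardless of sequence length, the same retrieval machinery works for 1-frame and N-frame queries alike.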
If you have any questions regarding this project, please contact Tianchen Deng (dengtianchen@sjtu.edu.cn). If you want to use our intermediate results for qualitative comparisons, please reach out to the same email.
Our implementation is heavily based on SALAD and VGGT. We thank the authors for their open-source contributions. If you use code that builds on their contributions, please cite them as well.
If you find our paper and code useful, please cite us:
@inproceedings{deng2026_unipr3d,
title = {UniPR-3D: Towards Universal Visual Place Recognition with 3D Visual Geometry Grounded Transformer},
author = {Tianchen Deng and Xun Chen and Ziming Li and Hongming Shen and Danwei Wang and Javier Civera and Hesheng Wang},
booktitle = {arXiv},
year = {2026},
}