
UniPR-3D: Towards Universal Visual Place Recognition with 3D Visual Geometry Grounded Transformer

Tianchen Deng1 · Xun Chen2 · Ziming Li1 · Hongming Shen2 · Danwei Wang2 · Javier Civera3 · Hesheng Wang1

1Shanghai Jiao Tong University· 2Nanyang Technological University· 3University of Zaragoza

📃 Description

UniPR-3D is a universal visual place recognition framework that supports both frame-to-frame and sequence-to-sequence matching. Our model is capable of predicting visual descriptors for both individual frames and entire sequences. It leverages 3D and 2D tokens with tailored aggregation strategies for robust single-frame and variable-length sequence matching, achieving state-of-the-art performance on benchmarks like MSLS, Pittsburgh, NordLand, and SPED.
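With such descriptors, place recognition reduces to nearest-neighbor retrieval: a query descriptor (per frame or per sequence) is ranked against a database of reference descriptors by cosine similarity. A minimal stdlib-only sketch of this retrieval step, using made-up toy descriptors (UniPR-3D's actual descriptors come from its transformer, and real systems use an ANN index rather than a linear scan):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query, database, k=1):
    # Rank database descriptors by similarity to the query; return top-k indices.
    scores = [(cosine(query, d), i) for i, d in enumerate(database)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

# Toy 2-D descriptors: index 1 is the closest match to the query below.
db = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(retrieve([0.92, 0.08], db, k=2))  # → [1, 0]
```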

🛠️ Setup

The code has been tested on:

  • Ubuntu 22.04 LTS, Python 3.11.10, CUDA 12.1, GeForce RTX 4090

📦 Repository

Clone the repo:

git clone https://github.com/dtc111111/UniPR-3D.git
cd UniPR-3D

💻 Installation

We provide a Dockerfile for easy setup. To build the Docker image, run:

docker build -t unipr3d -f DOCKERFILE .

To run the docker container, use:

docker run --gpus all -it unipr3d /bin/bash

You may need to mount your data directory to access the datasets, e.g., add -v /path/to/your/data:/data to the above command.

🚀 Usage

Downloading Pretrained Models

To achieve higher performance, we train single-frame and multi-frame models separately. You may download our pretrained models from Hugging Face or from the GitHub release and place them anywhere you like.

Downloading the Datasets

You will need to download the following datasets to evaluate our method or reproduce our results.

For training: We use the GSV-Cities dataset (github repo) to train our single-frame model and the Mapillary (MSLS) dataset (github repo) to train our multi-frame model.

For evaluation:

  • Single frame evaluation:
    • MSLS Challenge, where you upload your predictions to their server for evaluation.
    • Single-frame MSLS Validation set
    • The NordLand, Pittsburgh, and SPED datasets; you may download them from here, aligned with DINOv2 SALAD.
  • Multi-frame evaluation:
    • Multi-frame MSLS Validation set
    • Two sequences from Oxford RobotCar; you may download them here.
      • 2014-12-16-18-44-24 (winter night) query to 2014-11-18-13-20-12 (fall day) db
      • 2014-11-14-16-34-33 (fall night) query to 2015-11-13-10-28-08 (fall day) db
    • Nordland (filtered) dataset

Before training or evaluation, please download the datasets and update the dataset paths in /dataloaders/* to your own.
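Since the dataloaders will fail if the configured directories are missing, it can help to sanity-check your paths before launching a run. A small stdlib sketch (the example paths below are placeholders, not the repo's actual configuration values):

```python
from pathlib import Path

def check_dataset_paths(paths):
    # Return the subset of configured dataset paths that do not exist on disk.
    return [p for p in paths if not Path(p).is_dir()]

# Placeholder paths -- substitute the values you set in /dataloaders/*.
datasets = ["/data/gsv-cities", "/data/msls"]
missing = check_dataset_paths(datasets)
if missing:
    print("Missing dataset directories:", missing)
```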

Training

To reproduce our results and train the model, run:

# For single-frame model training
python3 main_ft.py
# For multi-frame model training
python3 main_lora_multiframe.py

Make sure to set the correct paths in the python file before running.

Evaluating

To evaluate the model on datasets mentioned above, run:

# For both single frame and multi-frame evaluation
python3 eval_lora.py

Make sure to set the correct paths in the Python file before running. If you are evaluating our pretrained models directly, also set the path to the pretrained checkpoint in the Python file.

Results

Our method achieves significantly higher recall than competing approaches, setting a new state of the art on both single-frame and multi-frame benchmarks.

Single-frame matching results

| Method | Latency (ms) | MSLS Challenge R@1 / R@5 | MSLS Val R@1 / R@5 | NordLand R@1 / R@5 | Pitts250k-test R@1 / R@5 | SPED R@1 / R@5 |
|---|---|---|---|---|---|---|
| MixVPR | 1.37 | 64.0 / 75.9 | 88.0 / 92.7 | 58.4 / 74.6 | 94.6 / 98.3 | 85.2 / 92.1 |
| EigenPlaces | 2.65 | 67.4 / 77.1 | 89.3 / 93.7 | 54.4 / 68.8 | 94.1 / 98.0 | 69.9 / 82.9 |
| DINOv2 SALAD | 2.41 | 73.0 / 86.8 | 91.2 / 95.3 | 69.6 / 84.4 | 94.5 / 98.7 | 89.5 / 94.4 |
| UniPR-3D (ours) | 8.23 | 74.3 / 87.5 | 91.4 / 96.0 | 76.2 / 87.3 | 94.9 / 98.1 | 89.6 / 94.5 |

Sequence matching results

| Method | MSLS Val R@1 / R@5 / R@10 | NordLand R@1 / R@5 / R@10 | Oxford1 R@1 / R@5 / R@10 | Oxford2 R@1 / R@5 / R@10 |
|---|---|---|---|---|
| SeqMatchNet | 65.5 / 77.5 / 80.3 | 56.1 / 71.4 / 76.9 | 36.8 / 43.3 / 48.3 | 27.9 / 38.5 / 45.3 |
| SeqVLAD | 89.9 / 92.4 / 94.1 | 65.5 / 75.2 / 80.0 | 58.4 / 72.8 / 80.8 | 19.1 / 29.9 / 37.3 |
| CaseVPR | 91.2 / 94.1 / 95.0 | 84.1 / 89.9 / 92.2 | 90.5 / 95.2 / 96.5 | 72.8 / 85.8 / 89.9 |
| UniPR-3D (ours) | 93.7 / 95.7 / 96.9 | 86.8 / 91.7 / 93.8 | 95.4 / 98.1 / 98.7 | 80.6 / 90.3 / 93.9 |
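Recall@K, the metric in both tables, is the fraction of queries for which at least one of the top-K retrieved database entries is a true match. A minimal sketch with made-up rankings and ground truth (this is not our evaluation code):

```python
def recall_at_k(predictions, ground_truth, k):
    # predictions[q]: database indices ranked by similarity for query q.
    # ground_truth[q]: set of database indices that are true matches for q.
    hits = sum(
        1 for preds, gt in zip(predictions, ground_truth)
        if any(p in gt for p in preds[:k])
    )
    return 100.0 * hits / len(predictions)

preds = [[3, 1, 7], [2, 0, 5], [4, 6, 8]]   # toy rankings per query
gt    = [{1}, {9}, {4}]                     # toy ground-truth matches
print(recall_at_k(preds, gt, 1))  # → 33.33...
print(recall_at_k(preds, gt, 2))  # → 66.66...
```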

📧 Contact

If you have any questions regarding this project, please contact Tianchen Deng (dengtianchen@sjtu.edu.cn). If you want to use our intermediate results for qualitative comparisons, please reach out to the same email.

✏️ Acknowledgement

Our implementation builds heavily on SALAD and VGGT. We thank the authors for their open-source contributions. If you use code based on their work, please cite them as well.

🎓 Citation

If you find our paper and code useful, please cite us:

@inproceedings{deng2026_unipr3d,
  title     = {UniPR-3D: Towards Universal Visual Place Recognition with 3D Visual Geometry Grounded Transformer},
  author    = {Tianchen Deng and Xun Chen and Ziming Li and Hongming Shen and Danwei Wang and Javier Civera and Hesheng Wang},
  booktitle = {arXiv},
  year      = {2026},
}
