[CVPR 2025] UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
Created by Ziyi Wang*, Yanran Zhang*, Jie Zhou, Jiwen Lu (* indicates equal contribution)
This repository is an official implementation of UniPre3D (CVPR 2025).
Paper | arXiv | Project Page
UniPre3D is the first unified pre-training method for 3D point clouds that effectively handles both object- and scene-level data through cross-modal Gaussian splatting.
Our pre-training task is to predict Gaussian parameters from the input point cloud. The 3D backbone network extracts representative features, and 3D Gaussian splatting renders images from the predicted Gaussians for direct image-level supervision. To incorporate additional texture information and to adjust task complexity, we introduce a pre-trained image model and propose a scale-adaptive fusion block that accommodates varying data scales.
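To make the pipeline concrete, below is a minimal sketch of one pre-training step. All component names (`backbone`, `image_encoder`, `fusion`, `gaussian_head`, `render_gaussians`) are hypothetical stand-ins for the modules described above, not the repository's actual API, and the real loss may combine several image-space terms.

```python
import torch.nn.functional as F

# Illustrative sketch of one UniPre3D pre-training step. All module names
# are hypothetical stand-ins for the components described in the paper.
def pretrain_step(points, ref_image, gt_image, camera,
                  backbone, image_encoder, fusion, gaussian_head,
                  render_gaussians, optimizer):
    point_feats = backbone(points)             # per-point geometric features
    image_feats = image_encoder(ref_image)     # texture cues from the pre-trained 2D model
    fused_feats = fusion(point_feats, image_feats)  # scale-adaptive fusion block
    gaussians = gaussian_head(fused_feats)     # per-point Gaussian parameters
                                               # (offset, scale, rotation, opacity, color)
    rendered = render_gaussians(gaussians, camera)  # differentiable 3DGS rendering
    loss = F.mse_loss(rendered, gt_image)      # direct supervision from ground-truth views
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```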
- [2025-07-02] Our scene-level pretraining code is released.
- [2025-06-12] Our arXiv paper is released.
- [2025-06-11] Our object-level pretraining code is released.
- [2025-02-27] Our paper is accepted by CVPR 2025.
- Release datasets.
- Release object-level pretraining code.
- Release object-level logs and checkpoints.
- Add more details about diverse downstream tasks.
- Release scene-level pretraining code.
- Release scene-level logs and checkpoints.
Below is a visualization of UniPre3D pre-training outputs. The first row shows the input point clouds, the second row the reference-view images, and the third row the rendered images, which are supervised by the ground-truth images in the fourth row. The rightmost column illustrates the view-selection principle for both object- and scene-level samples.
Getting Started
Software requirements:
- Python 3.11
- PyTorch 2.2
- CUDA 12.0 or higher
- Linux or Windows operating system
Please follow docs/INSTALLATION.md for detailed installation instructions.
Hardware requirements:
- CUDA-capable GPU with compute capability 6.0 or higher
- Minimum 8GB GPU memory (16GB+ recommended for large-scale experiments)
- 16GB+ RAM
Please follow docs/DATA_PREPARATION.md for detailed data preparation instructions.
Object-level pre-training is a technique where we train a 3D model on a large collection of individual 3D objects before fine-tuning it for specific downstream tasks. This approach helps the model learn fundamental geometric patterns and structural representations that can be transferred to various 3D understanding tasks.
Key Characteristics:
- Focuses on learning from individual objects (e.g., chairs, airplanes, cars)
- Captures fine-grained local geometric structures
- Enables knowledge transfer to tasks like object classification and part segmentation
PointMLP pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name pointmlp_pretraining
```
Standard Transformer pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name transformer_pretraining
```
Mamba3D pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name mamba3d_pretraining
```
Point Cloud Mamba pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name pcm_pretraining
```
We cache dataset images in memory to accelerate data loading. If you encounter memory constraints, disable this feature by setting `opt.record_img` to `false` in `configs/settings.yaml`.
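For reference, the change might look like the excerpt below; the exact key nesting inside `configs/settings.yaml` is an assumption and may differ in the actual file.

```yaml
# configs/settings.yaml -- illustrative excerpt, surrounding keys omitted;
# the exact nesting is an assumption and may differ in the actual file
opt:
  record_img: false  # disable in-memory image caching to reduce RAM usage
```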
We evaluate the effectiveness of UniPre3D on various object-level downstream tasks, including:
- Object Classification
- Part Segmentation
- Object Detection
We provide pretrained models and checkpoints for object-level tasks in the following table:
| Model | Pretrained Checkpoint | Downstream Task | Performance | Finetuning Logs |
|---|---|---|---|---|
| Standard Transformer | Baidu Disk / Google Drive | Classification | 87.93% Acc (+10.69%) | Logs |
| PointMLP | Baidu Disk / Google Drive | Classification | 89.5% Acc (+2.1%) | Logs |
| Point Cloud Mamba | Baidu Disk / Google Drive | Classification | 89.0% Acc (+0.9%) | Logs |
| Mamba3D | Baidu Disk / Google Drive | Classification | 93.4% Acc (+0.8%) | Logs |
| PointMLP | Baidu Disk / Google Drive | Part Segmentation | 85.5% (+0.9%) | Logs |
For more details on the usage of downstream tasks, please refer to the docs/OBJECT_LEVEL_DOWNSTREAM_TASKS.md file.
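As a usage sketch, a released checkpoint could be loaded to initialize a downstream model before fine-tuning. The file name, checkpoint key layout, and the `DownstreamModel` placeholder below are hypothetical, not the repository's actual API, so adapt them to the checkpoint you actually download.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: initialize a downstream model from a released
# pretrained checkpoint before fine-tuning. File name, key layout, and the
# placeholder model below are illustrative, not the repository's actual API.
class DownstreamModel(nn.Module):
    def __init__(self, backbone: nn.Module, num_classes: int = 40):
        super().__init__()
        self.backbone = backbone                 # pretrained 3D feature extractor
        self.head = nn.Linear(256, num_classes)  # fresh task head, trained from scratch

model = DownstreamModel(backbone=nn.Identity())  # stand-in for the real backbone

ckpt = torch.load("checkpoints/pointmlp_pretrained.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # some checkpoints nest weights under a "model" key
# strict=False loads matching backbone weights and leaves the new head untouched
missing, unexpected = model.load_state_dict(state, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```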
Scene-level pretraining focuses on learning representations from complex 3D environments containing multiple objects and spatial relationships. This approach helps models understand large-scale geometric structures and spatial contexts that are crucial for scene understanding tasks.
Key Characteristics:
- Processes complete indoor/outdoor scenes rather than individual objects
- Captures long-range spatial relationships and contextual information
- Optimized for tasks like semantic segmentation and instance segmentation
Sparse UNet pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name sparseunet_pretraining
```
PTv3 pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name ptv3_pretraining
```
We cache dataset images in memory to accelerate data loading. If you encounter memory constraints, disable this feature by setting `opt.record_img` to `false` in `configs/settings.yaml` (see the excerpt above).
We evaluate the effectiveness of UniPre3D on various scene-level downstream tasks, including:
- Semantic Segmentation
- Instance Segmentation
- 3D Object Detection
We provide pretrained models and checkpoints for scene-level tasks in the following table:
| Model | Pretrained Checkpoint | Downstream Task | Dataset | Performance | Finetuning Logs |
|---|---|---|---|---|---|
| Sparse UNet | Baidu Disk / Google Drive | Semantic Segmentation | ScanNet20 | 75.8% mIoU (+3.6%) | Logs |
| Sparse UNet | Baidu Disk / Google Drive | Semantic Segmentation | ScanNet200 | 33.0% mIoU (+8.0%) | Logs |
| Sparse UNet | Baidu Disk / Google Drive | Semantic Segmentation | S3DIS | 71.5% mIoU (+6.1%) | Logs |
| Sparse UNet | Baidu Disk / Google Drive | Instance Segmentation | ScanNet20 | 75.9% mAP@25 (+1.2%) | Logs |
| Sparse UNet | Baidu Disk / Google Drive | Instance Segmentation | ScanNet200 | 37.1% mAP@25 (+2.8%) | Logs |
| Point Transformer v3 | Baidu Disk / Google Drive | Semantic Segmentation | ScanNet20 | 76.6% mIoU (+0.1%) | Logs |
| Point Transformer v3 | Baidu Disk / Google Drive | Semantic Segmentation | ScanNet200 | 36.0% mIoU (+0.8%) | Logs |
For more details on the usage of downstream tasks, please refer to the docs/SCENE_LEVEL_DOWNSTREAM_TASKS.md file.
We would like to express our gratitude to the following projects and datasets:
- Gaussian Splatting
- Openpoints
- Pointcept
- ShapenetRender_more_variation
- Splatter Image
- ShapeNet
- ScanNet
- PointCloudMamba
- Mamba3D
For any questions about data preparation, please feel free to open an issue in our repository or send an email to 1302821779@qq.com.
If you find this work useful in your research, please consider citing:
```bibtex
@inproceedings{wang2025unipre3d,
  title={UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting},
  author={Wang, Ziyi and Zhang, Yanran and Zhou, Jie and Lu, Jiwen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={1319--1329},
  year={2025}
}
```



