[CVPR 2025] UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
Created by Ziyi Wang*, Yanran Zhang*, Jie Zhou, Jiwen Lu (* indicates equal contribution)
This repository is an official implementation of UniPre3D (CVPR 2025).
Paper | arXiv | Project Page
UniPre3D is the first unified pre-training method for 3D point clouds that effectively handles both object- and scene-level data through cross-modal Gaussian splatting.
Our pre-training task is to predict Gaussian parameters from the input point cloud. The 3D backbone network extracts representative features, and 3D Gaussian splatting renders images from the predicted Gaussians for direct image-level supervision. To incorporate additional texture information and to adjust task complexity, we introduce a pre-trained image model and propose a scale-adaptive fusion block that accommodates varying data scales.
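To make the pipeline concrete, below is a minimal sketch of one pre-training step. All component names (`backbone`, `image_encoder`, `fusion`, `gaussian_head`, `render_gaussians`) are hypothetical stand-ins for the modules described above, not the repository's actual API, and the real loss may combine several image-space terms.

```python
import torch.nn.functional as F

# Illustrative sketch of one UniPre3D pre-training step. All module names
# are hypothetical stand-ins for the components described in the paper.
def pretrain_step(points, ref_image, gt_image, camera,
                  backbone, image_encoder, fusion, gaussian_head,
                  render_gaussians, optimizer):
    point_feats = backbone(points)             # per-point geometric features
    image_feats = image_encoder(ref_image)     # texture cues from the pre-trained 2D model
    fused_feats = fusion(point_feats, image_feats)  # scale-adaptive fusion block
    gaussians = gaussian_head(fused_feats)     # per-point Gaussian parameters
                                               # (offset, scale, rotation, opacity, color)
    rendered = render_gaussians(gaussians, camera)  # differentiable 3DGS rendering
    loss = F.mse_loss(rendered, gt_image)      # direct supervision from ground-truth views
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```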
- [2025-07-02] Our scene-level pretraining code is released.
- [2025-06-12] Our arXiv paper is released.
- [2025-06-11] Our object-level pretraining code is released.
- [2025-02-27] Our paper is accepted by CVPR 2025.
- Release datasets.
- Release object-level pretraining code.
- Release object-level logs and checkpoints.
- Add more details about diverse downstream tasks.
- Release scene-level pretraining code.
- Release scene-level logs and checkpoints.
Below is a visualization of UniPre3D pre-training outputs. The first row shows the input point clouds, the second row the reference-view images, and the third row the rendered images, which are supervised by the ground-truth images in the fourth row. The rightmost column illustrates the view-selection principle for both object- and scene-level samples.
Getting Started
Software requirements:
- Python 3.11
- PyTorch 2.2
- CUDA 12.0 or higher
- Linux or Windows operating system
Please follow docs/INSTALLATION.md for detailed installation instructions.
Hardware requirements:
- CUDA-capable GPU with compute capability 6.0 or higher
- Minimum 8GB GPU memory (16GB+ recommended for large-scale experiments)
- 16GB+ RAM
Please follow docs/DATA_PREPARATION.md for detailed data preparation instructions.
Object-level pre-training is a technique where we train a 3D model on a large collection of individual 3D objects before fine-tuning it for specific downstream tasks. This approach helps the model learn fundamental geometric patterns and structural representations that can be transferred to various 3D understanding tasks.
Key Characteristics:
- Focuses on learning from individual objects (e.g., chairs, airplanes, cars)
- Captures fine-grained local geometric structures
- Enables knowledge transfer to tasks like object classification and part segmentation
PointMLP pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name pointmlp_pretraining
```
Standard Transformer pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name transformer_pretraining
```
Mamba3D pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name mamba3d_pretraining
```
Point Cloud Mamba pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name pcm_pretraining
```
We cache dataset images in memory to accelerate data loading. If you encounter memory constraints, disable this feature by setting `opt.record_img` to `false` in `configs/settings.yaml`.
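For reference, the change might look like the excerpt below; the exact key nesting inside `configs/settings.yaml` is an assumption and may differ in the actual file.

```yaml
# configs/settings.yaml -- illustrative excerpt, surrounding keys omitted;
# the exact nesting is an assumption and may differ in the actual file
opt:
  record_img: false  # disable in-memory image caching to reduce RAM usage
```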
We evaluate the effectiveness of UniPre3D on various object-level downstream tasks, including:
- Object Classification
- Part Segmentation
- Object Detection
We provide pretrained models and checkpoints for object-level tasks in the following table:
| Model | Pretrained Checkpoint | Downstream Task | Performance | Finetuning Logs |
|---|---|---|---|---|
| Standard Transformer | Baidu Disk / Google Drive | Classification | 87.93% Acc (+10.69%) | Logs |
| PointMLP | Baidu Disk / Google Drive | Classification | 89.5% Acc (+2.1%) | Logs |
| Point Cloud Mamba | Baidu Disk / Google Drive | Classification | 89.0% Acc (+0.9%) | Logs |
| Mamba3D | Baidu Disk / Google Drive | Classification | 93.4% Acc (+0.8%) | Logs |
| PointMLP | Baidu Disk / Google Drive | Part Segmentation | 85.5% (+0.9%) | Logs |
For more details on the usage of downstream tasks, please refer to the docs/OBJECT_LEVEL_DOWNSTREAM_TASKS.md file.
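As a usage sketch, a released checkpoint could be loaded to initialize a downstream model before fine-tuning. The file name, checkpoint key layout, and the `DownstreamModel` placeholder below are hypothetical, not the repository's actual API, so adapt them to the checkpoint you actually download.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: initialize a downstream model from a released
# pretrained checkpoint before fine-tuning. File name, key layout, and the
# placeholder model below are illustrative, not the repository's actual API.
class DownstreamModel(nn.Module):
    def __init__(self, backbone: nn.Module, num_classes: int = 40):
        super().__init__()
        self.backbone = backbone                 # pretrained 3D feature extractor
        self.head = nn.Linear(256, num_classes)  # fresh task head, trained from scratch

model = DownstreamModel(backbone=nn.Identity())  # stand-in for the real backbone

ckpt = torch.load("checkpoints/pointmlp_pretrained.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # some checkpoints nest weights under a "model" key
# strict=False loads matching backbone weights and leaves the new head untouched
missing, unexpected = model.load_state_dict(state, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```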
Scene-level pretraining focuses on learning representations from complex 3D environments containing multiple objects and spatial relationships. This approach helps models understand large-scale geometric structures and spatial contexts that are crucial for scene understanding tasks.
Key Characteristics:
- Processes complete indoor/outdoor scenes rather than individual objects
- Captures long-range spatial relationships and contextual information
- Optimized for tasks like semantic segmentation and instance segmentation
Sparse UNet pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name sparseunet_pretraining
```
PTv3 pretraining:
```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name ptv3_pretraining
```
We cache dataset images in memory to accelerate data loading. If you encounter memory constraints, disable this feature by setting `opt.record_img` to `false` in `configs/settings.yaml` (see the excerpt above).
We evaluate the effectiveness of UniPre3D on various scene-level downstream tasks, including:
- Semantic Segmentation
- Instance Segmentation
- 3D Object Detection
We provide pretrained models and checkpoints for scene-level tasks in the following table:
| Model | Pretrained Checkpoint | Downstream Task | Dataset | Performance | Finetuning Logs |
|---|---|---|---|---|---|
| Sparse UNet | Baidu Disk / Google Drive | Semantic Segmentation | ScanNet20 | 75.8% mIoU (+3.6%) | Logs |
| Sparse UNet | Baidu Disk / Google Drive | Semantic Segmentation | ScanNet200 | 33.0% mIoU (+8.0%) | Logs |
| Sparse UNet | Baidu Disk / Google Drive | Semantic Segmentation | S3DIS | 71.5% mIoU (+6.1%) | Logs |
| Sparse UNet | Baidu Disk / Google Drive | Instance Segmentation | ScanNet20 | 75.9% mAP@25 (+1.2%) | Logs |
| Sparse UNet | Baidu Disk / Google Drive | Instance Segmentation | ScanNet200 | 37.1% mAP@25 (+2.8%) | Logs |
| Point Transformer v3 | Baidu Disk / Google Drive | Semantic Segmentation | ScanNet20 | 76.6% mIoU (+0.1%) | Logs |
| Point Transformer v3 | Baidu Disk / Google Drive | Semantic Segmentation | ScanNet200 | 36.0% mIoU (+0.8%) | Logs |
For more details on the usage of downstream tasks, please refer to the docs/SCENE_LEVEL_DOWNSTREAM_TASKS.md file.
We would like to express our gratitude to the following projects and datasets:
- Gaussian Splatting
- Openpoints
- Pointcept
- ShapenetRender_more_variation
- Splatter Image
- ShapeNet
- ScanNet
- PointCloudMamba
- Mamba3D
For any questions about data preparation, please feel free to open an issue in our repository or send an email to 1302821779@qq.com.
If you find this work useful in your research, please consider citing:
```bibtex
@inproceedings{wang2025unipre3d,
  title={UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting},
  author={Wang, Ziyi and Zhang, Yanran and Zhou, Jie and Lu, Jiwen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={1319--1329},
  year={2025}
}
```



