Ranran Huang · Krystian Mikolajczyk
SPFSplatV2 efficiently leverages masked attention to predict target poses while simultaneously predicting 3D Gaussians from unposed sparse images, without requiring ground-truth poses during either training or inference.
🔧 Built on our ICCV version SPFSplat, this work introduces improved performance, higher training efficiency, and an extended design supporting VGGT. For more details, check out our paper! 📄✨
- Clone SPFSplatV2:

```bash
git clone git@github.com:ranrhuang/SPFSplatV2.git
cd SPFSplatV2
```

- Create the environment; here we show an example using conda:

```bash
conda create -n spfsplatv2 python=3.11
conda activate spfsplatv2
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
```

- Optionally, compile the CUDA kernels for RoPE (as in CroCo v2):

```bash
cd src/model/encoder/backbone/croco/curope/
python setup.py build_ext --inplace
cd ../../../../../..
```

Our models are hosted on Hugging Face 🤗
| Model name | Training resolutions | Training data | Training settings |
|---|---|---|---|
| re10k_spfsplatv2.ckpt | 256x256 | re10k | RE10K, 2 views, MASt3R-based |
| acid_spfsplatv2.ckpt | 256x256 | acid | ACID, 2 views, MASt3R-based |
| re10k_spfsplatv2l.ckpt | 256x256 | re10k | RE10K, 2 views, VGGT-based |
| acid_spfsplatv2l.ckpt | 256x256 | acid | ACID, 2 views, VGGT-based |
| re10k_spfsplatv2_10view.ckpt | 256x256 | re10k | RE10K, 10 views, MASt3R-based |
| re10k_spfsplatv2l_10view.ckpt | 256x256 | re10k | RE10K, 10 views, VGGT-based |
We assume the downloaded weights are located in the pretrained_weights directory.
Please refer to DATASETS.md for dataset preparation.
- If using the MASt3R-based architecture, download the MASt3R pretrained model and put it in the `./pretrained_weights` directory.
- Train with:
```bash
# 2 view on SPFSplatV2 (MASt3R-based architecture)
python -m src.main +experiment=spfsplatv2/re10k wandb.mode=online wandb.name=re10k_spfsplatv2

# 2 view on SPFSplatV2-L (VGGT-based architecture)
python -m src.main +experiment=spfsplatv2-l/re10k wandb.mode=online wandb.name=re10k_spfsplatv2l

# Multi-view training on SPFSplatV2. We set train.random_drop_context_views=true.
# You can finetune from the 2-view checkpoint and use multiple GPUs to reduce the training time.
python -m src.main +experiment=spfsplatv2/re10k_10view wandb.mode=online wandb.name=re10k_spfsplatv2_10view

# Multi-view training on SPFSplatV2-L. We set train.random_drop_context_views=true.
python -m src.main +experiment=spfsplatv2-l/re10k_10view wandb.mode=online wandb.name=re10k_spfsplatv2-l_10view
```
```bash
# RealEstate10K on MASt3R-based architecture (enable test.align_pose=true if using evaluation-time pose alignment)
python -m src.main +experiment=spfsplatv2/re10k mode=test wandb.name=re10k \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json \
checkpointing.load=./pretrained_weights/re10k_spfsplatv2.ckpt \
test.save_image=true test.align_pose=false

# ACID on MASt3R-based architecture (enable test.align_pose=true if using evaluation-time pose alignment)
python -m src.main +experiment=spfsplatv2/acid mode=test wandb.name=acid \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_acid.json \
checkpointing.load=./pretrained_weights/acid_spfsplatv2.ckpt \
test.save_image=false test.align_pose=false

# 10 view
python -m src.main +experiment=spfsplatv2/re10k mode=test wandb.name=re10k_10view \
dataset.re10k.view_sampler.num_context_views=10 \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json \
checkpointing.load=./pretrained_weights/re10k_spfsplatv2_10view.ckpt \
test.save_image=false test.align_pose=false

# RealEstate10K on VGGT-based architecture (enable test.align_pose=true if using evaluation-time pose alignment)
python -m src.main +experiment=spfsplatv2-l/re10k mode=test wandb.name=re10k \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json \
checkpointing.load=./pretrained_weights/re10k_spfsplatv2l.ckpt \
test.save_image=true test.align_pose=false

# ACID on VGGT-based architecture (enable test.align_pose=true if using evaluation-time pose alignment)
python -m src.main +experiment=spfsplatv2-l/acid mode=test wandb.name=acid \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_acid.json \
checkpointing.load=./pretrained_weights/acid_spfsplatv2l.ckpt \
test.save_image=false test.align_pose=false

# 10 view
python -m src.main +experiment=spfsplatv2-l/re10k mode=test wandb.name=re10k_10view \
dataset.re10k.view_sampler.num_context_views=10 \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json \
checkpointing.load=./pretrained_weights/re10k_spfsplatv2l_10view.ckpt \
test.save_image=false test.align_pose=false
```
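The `test.align_pose` flag refers to evaluation-time pose alignment, i.e. registering predicted camera poses to ground truth before computing metrics. As a minimal sketch of what such an alignment typically involves (this is an illustration using a standard Umeyama similarity alignment of camera centers, not the repository's exact implementation; the function name and use of NumPy are assumptions):

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding camera centers
    (predicted and ground-truth, respectively).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                 # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1, -1] = -1.0                       # guard against reflections
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```

After solving for `(s, R, t)` on the camera centers, the transform is applied to the predicted trajectory before measuring pose error, so that metrics are invariant to the global gauge freedom of pose-free reconstruction.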
We follow the pixelSplat camera system. The camera intrinsic matrices are normalized (the first row is divided by the image width, and the second row by the image height). The camera extrinsic matrices are OpenCV-style camera-to-world matrices (+X right, +Y down, +Z pointing into the screen, along the camera's viewing direction).
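To make these conventions concrete, here is a small sketch (the image size and focal values are made up for illustration; only the normalization rule and axis conventions come from the text above):

```python
import numpy as np

# Pixel-space intrinsics for a hypothetical 256x256 image.
W = H = 256
K_pix = np.array([
    [288.0,   0.0, 128.0],   # fx,  0, cx
    [  0.0, 288.0, 128.0],   #  0, fy, cy
    [  0.0,   0.0,   1.0],
])

# pixelSplat-style normalization: first row / width, second row / height.
K_norm = K_pix.copy()
K_norm[0] /= W
K_norm[1] /= H

# OpenCV-style camera-to-world extrinsic: the rotation columns are the
# camera's +X (right), +Y (down), +Z (viewing direction) axes expressed in
# world coordinates; the last column is the camera center.
c2w = np.eye(4)
forward_in_world = c2w[:3, 2]   # camera +Z axis in the world frame
```

With a centered principal point, the normalized `cx` and `cy` both become 0.5 regardless of resolution, which is what makes the normalized intrinsics resolution-independent.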
This project is built upon these excellent repositories: SPFSplat, NoPoSplat, pixelSplat, DUSt3R, and CroCo. We thank the original authors for their work.
```bibtex
@article{huang2025spfsplatv2,
  title={SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views},
  author={Huang, Ranran and Mikolajczyk, Krystian},
  journal={arXiv preprint arXiv:2509.17246},
  year={2025}
}
```