MVDiff is a state-of-the-art diffusion model that generates consistent multi-view images from single-view inputs for high-quality 3D reconstruction.
## Features

- State-of-the-art performance: outperforms previous methods by 5-17% across metrics
- Multiple datasets: support for ShapeNet, CO3D, and GSO
- Modular design: components are easy to extend and customize
- Comprehensive evaluation: full metric suite (PSNR, SSIM, LPIPS, FID)
- 3D reconstruction: multiple methods, including MVS and NeRF-style
## Benchmark Results

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FID ↓ | Runtime (ms) |
|---|---|---|---|---|---|
| PixelNeRF | 23.17 | 0.800 | 0.165 | 58.3 | 450 |
| SRN | 22.89 | 0.780 | 0.178 | 62.1 | 380 |
| Zero-1-to-3 | 25.43 | 0.840 | 0.142 | 45.7 | 320 |
| One-2-3-45 | 26.91 | 0.860 | 0.128 | 38.2 | 280 |
| MVDiff (Ours) | 28.42 | 0.889 | 0.106 | 31.5 | 250 |
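For reference, the PSNR column follows the standard definition, 10·log10(MAX²/MSE). A minimal NumPy sketch (illustrative only, not the repository's evaluation code):

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images with values in [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.ones((8, 8))
b = a + 0.1  # uniform error of 0.1 -> MSE = 0.01 -> PSNR = 20 dB
print(round(psnr(a, b), 2))  # → 20.0
```

Higher is better: halving the pixel-wise error raises PSNR by about 6 dB.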
## Requirements

- Python 3.8+
- CUDA 11.7+ (recommended)
- 16GB+ GPU memory (training)
- 8GB+ GPU memory (inference)
## Installation

```bash
# Clone repository
git clone https://github.com/EmmanuelleB985/mvdiff.git
cd mvdiff

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run tests to verify installation
pytest tests/
```

### Docker

```bash
# Build Docker image
docker build -t mvdiff:latest .

# Run container with GPU support
docker run --gpus all -it -p 7860:7860 mvdiff:latest
```

### Development Setup

```bash
# Install in development mode with extras
pip install -e ".[dev,test,docs]"

# Install pre-commit hooks
pre-commit install

# Run code quality checks
make lint
make test
```
## Quick Start

### Web Demo

Launch the interactive web demo:

```bash
python -m mvdiff.app
```

### Python API

```python
from mvdiff import MVDiffPipeline

# Initialize pipeline
pipeline = MVDiffPipeline.from_pretrained("mvdiff/shapenet-base")

# Generate multi-view images
views = pipeline(
    image="path/to/image.jpg",
    num_views=16,
    num_inference_steps=50,
    guidance_scale=3.0,
)

# Reconstruct a 3D mesh
mesh = pipeline.reconstruct_3d(views, method="mvs")
mesh.export("output.obj")
```
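The `guidance_scale` argument follows the usual classifier-free guidance convention in diffusion pipelines: at each denoising step, the unconditional prediction is pushed toward the conditional one. A minimal sketch of that blend (names are illustrative, not MVDiff internals):

```python
import numpy as np

def apply_guidance(eps_uncond: np.ndarray, eps_cond: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: extrapolate from the unconditional toward the conditional prediction."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_u = np.zeros(4)  # toy unconditional noise prediction
eps_c = np.ones(4)   # toy image-conditioned noise prediction
print(apply_guidance(eps_u, eps_c, 3.0))  # [3. 3. 3. 3.]
print(apply_guidance(eps_u, eps_c, 1.0))  # [1. 1. 1. 1.] — scale 1.0 recovers the conditional prediction
```

Larger scales enforce the input image more strongly at some cost in diversity, which is why moderate values such as 3.0 are common defaults.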
## Datasets

First, download the datasets:

```bash
# ShapeNet
wget [shapenet_download_link]
unzip ShapeNetCore.v2.zip
unzip ShapeNetRendering.zip

# CO3D
wget https://dl.fbaipublicfiles.com/co3d/co3d_v2.zip
unzip co3d_v2.zip

# GSO
git clone https://github.com/googleinterns/gso-dataset.git
cd gso-dataset && bash download_gso.sh
```
Then, prepare the datasets for training:

```bash
# Prepare ShapeNet
python scripts/prepare_shapenet.py \
    --shapenet_dir /path/to/ShapeNetCore.v2 \
    --rendering_dir /path/to/ShapeNetRendering \
    --output_dir data/shapenet_processed

# Prepare CO3D
python scripts/prepare_co3d.py \
    --co3d_dir /path/to/co3d \
    --output_dir data/co3d_processed
```

## Training

```bash
# Single-GPU training
python -m train \
    --config configs/train/shapenet_base.yaml \
    --data_dir data/shapenet_processed \
    --output_dir experiments/run_001

# Multi-GPU training on 4 GPUs
torchrun --nproc_per_node=4 -m train \
    --config configs/train/shapenet_large.yaml \
    --distributed

# Resume from a checkpoint
python -m train \
    --config configs/train/shapenet_base.yaml \
    --resume_from checkpoints/last.ckpt
```
### Configuration

```yaml
# configs/train/shapenet_base.yaml
model:
  architecture: mvdiff_base
  img_size: 256
  num_diffusion_steps: 1000

training:
  batch_size: 16
  learning_rate: 1e-4
  num_epochs: 300
  gradient_clip: 1.0

  # Advanced options
  mixed_precision: true
  gradient_checkpointing: true
  ema_decay: 0.9999

optimizer:
  type: AdamW
  weight_decay: 0.01
  betas: [0.9, 0.999]

scheduler:
  type: cosine
  warmup_steps: 10000
```
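A `cosine` scheduler with `warmup_steps: 10000` conventionally means a linear warmup followed by cosine decay of the learning rate. A sketch of that schedule (the `total_steps` value here is a made-up example; the real value depends on dataset size and `num_epochs`):

```python
import math

def lr_at_step(step: int, base_lr: float = 1e-4, warmup_steps: int = 10_000,
               total_steps: int = 300_000) -> float:
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(5_000))    # halfway through warmup → 5e-05
print(lr_at_step(10_000))   # warmup finished → 0.0001
print(lr_at_step(300_000))  # end of training: decayed to (numerically) zero
```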
### Monitoring

```bash
# TensorBoard
tensorboard --logdir experiments/

# Weights & Biases (set wandb.enabled: true in the config)
wandb login
```
## Evaluation

```bash
# Full evaluation on the test set
python -m evaluate \
    --checkpoint checkpoints/best_model.ckpt \
    --data_dir data/shapenet_processed \
    --split test \
    --metrics all
```
Or from Python:

```python
from mvdiff.evaluation import Evaluator

evaluator = Evaluator(checkpoint="path/to/model.ckpt")
metrics = evaluator.evaluate(
    data_loader=test_loader,  # your test-set DataLoader
    metrics=["psnr", "ssim", "lpips", "fid"],
    save_outputs=True,
)
```
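Of these metrics, FID is the Fréchet distance between Gaussians fitted to Inception features of real and generated images; in the univariate case it reduces to (μ₁ − μ₂)² + (σ₁ − σ₂)². A toy sketch of that reduced form (illustrative, not the evaluator's implementation):

```python
import numpy as np

def frechet_distance_1d(x: np.ndarray, y: np.ndarray) -> float:
    """Fréchet distance between univariate Gaussians fit to two samples.
    FID applies the multivariate analogue to Inception feature vectors."""
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    return (mu_x - mu_y) ** 2 + (sigma_x - sigma_y) ** 2

rng = np.random.default_rng(0)
same = frechet_distance_1d(rng.normal(0, 1, 10_000), rng.normal(0, 1, 10_000))
shifted = frechet_distance_1d(rng.normal(0, 1, 10_000), rng.normal(2, 1, 10_000))
print(same < shifted)  # True — closer distributions score lower, which is why FID ↓ in the table
```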
## Extending MVDiff

```python
from mvdiff.models import MVDiffModel

class CustomMVDiff(MVDiffModel):
    def __init__(self, config):
        super().__init__(config)
        # Add custom layers (CustomAttention is a user-defined module)
        self.custom_module = CustomAttention()

    def forward(self, x, timesteps, context):
        # Custom forward pass
        return super().forward(x, timesteps, context)
```
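`CustomAttention` above stands in for whatever module you plug in. Most attention variants modify scaled dot-product attention, softmax(QKᵀ/√d)·V, shown here as a minimal NumPy sketch for reference (a real module would be a `torch.nn.Module` with learned projections):

```python
import numpy as np

def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """softmax(Q K^T / sqrt(d)) V — the core operation attention variants build on."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

q = k = v = np.random.default_rng(0).standard_normal((4, 8))  # 4 tokens, dim 8
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 8)
```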
## Hyperparameter Sweeps

We use Hydra + W&B for experiment management:

```bash
# Run a hyperparameter sweep with the Optuna sweeper
# (sweep overrides are quoted so the shell does not interpret the parentheses)
python -m train --multirun \
    hydra/sweeper=optuna \
    hydra.sweeper.n_trials=50 \
    'training.learning_rate=interval(1e-5,1e-3)' \
    'model.num_heads=choice(8,12,16)'
```
## Contributing

We welcome contributions! Please see our Contributing Guide for details.

```bash
# Fork and clone the repository
git clone https://github.com/EmmanuelleB985/mvdiff.git
cd mvdiff

# Create a branch
git checkout -b feature/new-feature

# Make changes and test
make test
make lint

# Then submit a PR
```
### Code Style

We use:

- Black for code formatting
- isort for import sorting
- flake8 for linting
- mypy for type checking

```bash
# Auto-format code
make format

# Check code quality
make lint
```
## Citation

If you use MVDiff in your research, please cite:

```bibtex
@article{Bourigault2024MVDiffSA,
  title={MVDiff: Scalable and Flexible Multi-view Diffusion for 3D Object Reconstruction from Single-View},
  author={Emmanuelle Bourigault and Pauline Bourigault},
  journal={2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2024},
  pages={7579-7586},
  url={https://api.semanticscholar.org/CorpusID:269614322}
}
```
## Acknowledgments

We thank:

- The ShapeNet, CO3D, and GSO teams for providing datasets

## License

This project is licensed under the MIT License; see LICENSE for details.
