
MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction [CVPRW2024]

Overview

MVDiff is a state-of-the-art diffusion model that generates consistent multi-view images from single-view inputs for high-quality 3D reconstruction.

Key Features

  • State-of-the-art Performance: Outperforms previous methods by 5-17% across metrics
  • Multiple Datasets: Support for ShapeNet, CO3D, and GSO datasets
  • Modular Design: Easy to extend and customize components
  • Comprehensive Evaluation: Full metric suite (PSNR, SSIM, LPIPS, FID)
  • 3D Reconstruction: Multiple methods including MVS and NeRF-style

Performance

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FID ↓ | Runtime (ms) |
|---------------|-------|-------|-------|------|-----|
| PixelNeRF | 23.17 | 0.800 | 0.165 | 58.3 | 450 |
| SRN | 22.89 | 0.780 | 0.178 | 62.1 | 380 |
| Zero-1-to-3 | 25.43 | 0.840 | 0.142 | 45.7 | 320 |
| One-2-3-45 | 26.91 | 0.860 | 0.128 | 38.2 | 280 |
| MVDiff (Ours) | 28.42 | 0.889 | 0.106 | 31.5 | 250 |

Installation

Prerequisites

  • Python 3.8+
  • CUDA 11.7+ (recommended)
  • 16GB+ GPU memory (training)
  • 8GB+ GPU memory (inference)

Quick Install

# Clone repository
git clone https://github.com/EmmanuelleB985/mvdiff.git
cd mvdiff

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run tests to verify installation
pytest tests/
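As a quick sanity check beyond the test suite, you can verify that the core dependencies resolve in your environment. A minimal sketch, where the package names are assumptions on our part — check `requirements.txt` for the actual pinned list:

```python
import importlib.util

def find_missing(packages):
    """Return the subset of packages that cannot be imported."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

# Package names below are illustrative assumptions, not the pinned list.
missing = find_missing(["torch", "torchvision", "gradio", "yaml"])
if missing:
    print(f"Missing dependencies: {missing}")
else:
    print("All expected dependencies are importable.")
```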

Docker Installation

# Build Docker image
docker build -t mvdiff:latest .

# Run container with GPU support
docker run --gpus all -it -p 7860:7860 mvdiff:latest

Development Installation

# Install in development mode with extras
pip install -e ".[dev,test,docs]"

# Install pre-commit hooks
pre-commit install

# Run code quality checks
make lint
make test

Gradio Web Interface

Launch the interactive Gradio demo:

python -m mvdiff.app

Python API

from mvdiff import MVDiffPipeline

# Initialize pipeline
pipeline = MVDiffPipeline.from_pretrained("mvdiff/shapenet-base")

# Generate multi-view images
views = pipeline(
    image="path/to/image.jpg",
    num_views=16,
    num_inference_steps=50,
    guidance_scale=3.0
)

# Reconstruct 3D
mesh = pipeline.reconstruct_3d(views, method="mvs")
mesh.export("output.obj")
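For quick visual inspection, the generated views can be tiled into a single grid image. A minimal NumPy sketch, assuming each view is an HxWx3 array of equal size (the pipeline's actual return type may differ, e.g. PIL images you would first convert with `np.asarray`):

```python
import numpy as np

def tile_views(views, cols=4):
    """Tile a list of equally sized HxWxC arrays into a (rows x cols) grid."""
    h, w, c = views[0].shape
    rows = (len(views) + cols - 1) // cols
    grid = np.zeros((rows * h, cols * w, c), dtype=views[0].dtype)
    for i, view in enumerate(views):
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = view
    return grid

# With num_views=16 as above, cols=4 yields a 4x4 contact sheet:
# grid = tile_views([np.asarray(v) for v in views], cols=4)
```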

Training

Data Preparation

First, download the datasets:

# ShapeNet (requires registration at shapenet.org)
wget [shapenet_download_link]
unzip ShapeNetCore.v2.zip
unzip ShapeNetRendering.zip

# CO3D (from Facebook Research)
wget https://dl.fbaipublicfiles.com/co3d/co3d_v2.zip
unzip co3d_v2.zip

# GSO (Google Scanned Objects)
git clone https://github.com/googleinterns/gso-dataset.git
cd gso-dataset && bash download_gso.sh

Then, prepare the datasets for training:

ShapeNet Dataset

# Prepare ShapeNet
python scripts/prepare_shapenet.py \
    --shapenet_dir /path/to/ShapeNetCore.v2 \
    --rendering_dir /path/to/ShapeNetRendering \
    --output_dir data/shapenet_processed

CO3D Dataset

# Prepare CO3D
python scripts/prepare_co3d.py \
    --co3d_dir /path/to/co3d \
    --output_dir data/co3d_processed
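After preprocessing, a quick structural check can catch an incomplete run before you launch training. A minimal sketch, assuming (this layout is our assumption, not documented behavior) that the prepare scripts write train/val/test subdirectories under the output directory:

```python
from pathlib import Path

def missing_splits(processed_dir, splits=("train", "val", "test")):
    """Return the expected split subdirectories that do not exist."""
    root = Path(processed_dir)
    return [s for s in splits if not (root / s).is_dir()]

# The split layout is an assumed convention; adjust to what the
# prepare scripts actually emit.
# missing = missing_splits("data/shapenet_processed")
```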

Training Commands

Single GPU Training

python -m train \
    --config configs/train/shapenet_base.yaml \
    --data_dir data/shapenet_processed \
    --output_dir experiments/run_001

Multi-GPU Training (DDP)

torchrun --nproc_per_node=4 -m train \
    --config configs/train/shapenet_large.yaml \
    --distributed

Resume Training

python -m train \
    --config configs/train/shapenet_base.yaml \
    --resume_from checkpoints/last.ckpt

Training Configuration

# configs/train/shapenet_base.yaml
model:
  architecture: mvdiff_base
  img_size: 256
  num_diffusion_steps: 1000
  
training:
  batch_size: 16
  learning_rate: 1e-4
  num_epochs: 300
  gradient_clip: 1.0
  
  # Advanced options
  mixed_precision: true
  gradient_checkpointing: true
  ema_decay: 0.9999
  
optimizer:
  type: AdamW
  weight_decay: 0.01
  betas: [0.9, 0.999]
  
scheduler:
  type: cosine
  warmup_steps: 10000
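The `scheduler` section corresponds to linear warmup followed by cosine decay. A minimal sketch of that schedule, assuming decay to zero over a `total_steps` horizon (the value here is illustrative; the repository's actual scheduler implementation may differ):

```python
import math

def lr_at(step, base_lr=1e-4, warmup_steps=10_000, total_steps=100_000):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```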

Monitoring

# TensorBoard
tensorboard --logdir experiments/

# Weights & Biases
wandb login
# Set wandb.enabled: true in config

Evaluation

Benchmarking

# Full evaluation on test set
python -m evaluate \
    --checkpoint checkpoints/best_model.ckpt \
    --data_dir data/shapenet_processed \
    --split test \
    --metrics all

Custom Evaluation

from mvdiff.evaluation import Evaluator

evaluator = Evaluator(checkpoint="path/to/model.ckpt")
metrics = evaluator.evaluate(
    data_loader=test_loader,
    metrics=["psnr", "ssim", "lpips", "fid"],
    save_outputs=True
)
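Of the metrics above, PSNR has the simplest definition: 10 · log10(MAX² / MSE). A minimal NumPy sketch for reference (the built-in Evaluator presumably computes this internally):

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```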

Advanced Usage

Custom Model Architecture

from mvdiff.models import MVDiffModel

class CustomMVDiff(MVDiffModel):
    def __init__(self, config):
        super().__init__(config)
        # Add custom layers (CustomAttention is a user-defined
        # module, not provided by mvdiff)
        self.custom_module = CustomAttention()
    
    def forward(self, x, timesteps, context):
        # Custom forward pass
        return super().forward(x, timesteps, context)

Research and Development

Experiment Tracking

We use Hydra + W&B for experiment management:

# Run hyperparameter sweep
python -m train --multirun \
    hydra/sweeper=optuna \
    hydra.sweeper.n_trials=50 \
    training.learning_rate=interval(1e-5,1e-3) \
    model.num_heads=choice(8,12,16)

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

# Fork and clone the repository
git clone https://github.com/EmmanuelleB985/mvdiff.git
cd mvdiff

# Create a branch
git checkout -b feature/new-feature

# Make changes and test
make test
make lint

# Submit PR

Code Style

We use:

  • Black for code formatting
  • isort for import sorting
  • flake8 for linting
  • mypy for type checking

# Auto-format code
make format

# Check code quality
make lint

Citation

If you use MVDiff in your research, please cite:

@inproceedings{Bourigault2024MVDiffSA,
  title={MVDiff: Scalable and Flexible Multi-view Diffusion for 3D Object Reconstruction from Single-View},
  author={Emmanuelle Bourigault and Pauline Bourigault},
  booktitle={2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2024},
  pages={7579-7586},
  url={https://api.semanticscholar.org/CorpusID:269614322}
}

Acknowledgments

We thank:

  • ShapeNet, CO3D, and GSO teams for providing datasets

License

This project is licensed under the MIT License - see LICENSE for details.

About

Scalable multi-view diffusion model with flexible viewpoint generation using epipolar geometry to improve multi-view consistency in 3D reconstruction (CVPRW2024).
