This repository implements a Denoising Diffusion Probabilistic Model (DDPM) with a U-Net backbone for image generation. The model is trained on the MNIST dataset and implemented in PyTorch, with modular components for easier maintenance, scalability, and future enhancements. Essentially, it's a diffusion model built from scratch.
- Implements DDPM for image generation.
- Uses a U-Net-based denoising model for high-quality image reconstruction.
- Supports training and inference workflows.
- Modularized for scalability and customizability.
- CUDA-enabled for efficient training.
Ensure the following dependencies are installed before running the project:
```bash
pip install torch torchvision numpy einops tqdm timm matplotlib
```

For CUDA support, install the appropriate PyTorch version following the official PyTorch Installation Guide.
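To confirm that the CUDA build is working, a quick sanity check from Python:

```python
import torch

print(torch.cuda.is_available())  # True if a CUDA-capable GPU is usable
```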
- Training the Model: To train the model from scratch, run:

  ```bash
  python train.py
  ```

  To resume training from a checkpoint, specify the checkpoint path:

  ```bash
  python train.py --checkpoint_path checkpoints/ddpm_checkpoint
  ```

- Running Inference: To generate images using a pre-trained model, run:

  ```bash
  python inference.py --checkpoint_path checkpoints/ddpm_checkpoint
  ```

  This will generate denoised images from random noise and display the intermediate denoising steps.
- U-Net: Used as the backbone for denoising.
- Sinusoidal Time Embeddings: Encode the current diffusion timestep so the network knows how much noise it is expected to remove (see the sketch after this list).
- Residual Blocks & Attention Layers: Enhance feature extraction at multiple scales.
- Diffusion Scheduler: Handles noise addition and denoising step calculations.
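As a reference for how the time embedding works, here is a minimal sketch of the standard sinusoidal formulation (the function name and dimensions are illustrative, not taken from this repository's code):

```python
import math
import torch

def sinusoidal_time_embedding(timesteps: torch.Tensor, dim: int) -> torch.Tensor:
    """Encode integer diffusion timesteps as sinusoidal vectors (Transformer-style).

    timesteps: shape (batch,), integer step indices.
    dim: embedding dimension (assumed even here).
    """
    half = dim // 2
    # Frequencies spaced geometrically from 1 down to 1/10000.
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = timesteps.float()[:, None] * freqs[None, :]   # (batch, half)
    # Concatenate sine and cosine components -> (batch, dim).
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```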
- Sample a batch of images from MNIST.
- Add noise to the batch via the forward diffusion process at a randomly chosen timestep.
- Pass the noisy images through the U-Net to predict the added noise.
- Compute the loss between predicted and true noise, then update the model parameters (see the sketch below).
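Putting those steps together, here is a minimal sketch of one training step, assuming a linear beta schedule and a `model(noisy_images, t)` call that returns the predicted noise (names and hyperparameters here are illustrative, not this repository's exact API):

```python
import torch
import torch.nn.functional as F

T = 1000  # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t

def train_step(model, optimizer, images):
    """One DDPM training step: noise a batch, predict the noise, update weights."""
    t = torch.randint(0, T, (images.shape[0],), device=images.device)
    noise = torch.randn_like(images)
    # Closed-form forward diffusion: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
    abar = alphas_cumprod.to(images.device)[t].view(-1, 1, 1, 1)
    noisy = abar.sqrt() * images + (1.0 - abar).sqrt() * noise
    loss = F.mse_loss(model(noisy, t), noise)  # simple DDPM noise-prediction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```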
- Start from a pure Gaussian noise image.
- Iteratively denoise over T time steps, each step removing part of the predicted noise.
- Return the final clean image (see the sketch below).
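A sketch of that reverse process, using the standard DDPM update rule and the same hypothetical `model(x, t)` signature as above (not necessarily the exact code in `inference.py`):

```python
import torch

T = 1000  # number of diffusion steps, matching training

@torch.no_grad()
def sample(model, shape, device="cuda"):
    """Run the reverse diffusion chain from pure noise to an image batch."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)
    alphas = 1.0 - betas
    abars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)  # start from pure Gaussian noise
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = model(x, t_batch)  # U-Net's noise prediction at step t
        # Posterior mean of x_{t-1}: subtract the scaled predicted noise.
        mean = (x - betas[t] / (1.0 - abars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = mean + betas[t].sqrt() * torch.randn_like(x)  # stochastic step
        else:
            x = mean  # final step is deterministic
    return x
```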