In this repository, we explore different concepts from the current state of the art in diffusion models for controlled image generation. We train from scratch a custom-made U-Net to predict the noise at a given timestep, and apply it to the CelebA-HQ, CIFAR10 and MNIST datasets to generate samples from their corresponding distributions.
🚩 Disclaimer: All faces in this repository are generated (i.e., fictional and for educational purposes only).
In particular, we follow the denoising diffusion probabilistic model (DDPM) paradigm of Ho et al.
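As a quick refresher on that paradigm, the forward process can be sampled in closed form at any timestep, and the network is trained to predict the injected noise. The sketch below illustrates this with a toy linear schedule (values of `T` and `betas` are illustrative, not the repo's actual config):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule (illustrative)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # \bar{alpha}_t

def q_sample(x0, t, eps):
    """Closed-form sample x_t ~ q(x_t | x_0); the model learns to predict eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))                 # toy "image"
eps = rng.standard_normal((4, 4))    # Gaussian noise
x_t = q_sample(x0, T - 1, eps)       # near t = T, almost pure noise
```

At the last timestep `alpha_bars[-1]` is tiny (around 4e-5 for this schedule), so `x_t` is essentially pure Gaussian noise, which is what lets sampling start from noise at inference time.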
The U-Nets used for CelebA-HQ
All results are from EMA checkpoints. For calculating the EMA weights, I use the `ema_pytorch` class from lucidrains' repo, which is also available via `pip install ema-pytorch`. All models were trained with mixed precision, using `bfloat16`.
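For readers unfamiliar with EMA checkpoints: the idea is to keep a slowly-moving average copy of the weights and sample from that copy. A minimal sketch of the update rule (the `decay` value and names here are illustrative, not the repo's settings):

```python
decay = 0.995  # illustrative; ema_pytorch exposes this as a hyperparameter

def ema_update(ema_params, online_params):
    # ema <- decay * ema + (1 - decay) * online, applied parameter-wise
    return [decay * e + (1.0 - decay) * p
            for e, p in zip(ema_params, online_params)]

ema_w = [0.0]      # EMA copy starts at 0 for illustration
online_w = [1.0]   # pretend the trained weight is fixed at 1
for _ in range(3):
    ema_w = ema_update(ema_w, online_w)
```

After `n` updates with a fixed online weight of 1, the EMA weight equals `1 - decay**n`, i.e., it converges smoothly toward the online value, which is why EMA checkpoints tend to give cleaner samples.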
- Python >= 3.8
- PyTorch
- Torchvision
- Einops
- NumPy
- Pandas
- Matplotlib
- Other libraries: PIL, PyYAML
- Datasets used: CelebA-HQ $$(1024^2\rightarrow256^2)$$, CIFAR10, MNIST
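The $$(1024^2\rightarrow256^2)$$ note means CelebA-HQ images are downsampled by a factor of 4 per side before training. The repo's exact preprocessing isn't shown here (it likely goes through Torchvision transforms); as a rough NumPy-only illustration, the `downsample` helper below is mine:

```python
import numpy as np

def downsample(img, factor=4):
    # Average non-overlapping factor x factor blocks (a simple box filter).
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

img = np.ones((1024, 1024))
small = downsample(img)   # 1024^2 -> 256^2
```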
- Clone the repository: `git clone https://github.com/ntat/Class-Conditional-Diffusion.git`
- Install dependencies via `pip`: `pip install -r requirements.txt`
- Make sure you have the datasets and adjust `config.yaml`.
- Run the script with `python`: `python main.py`
- For inference, look into the `notebooks` section to see how to interact with the code.
Each row corresponds to one of the ten CIFAR10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
- Conditioning Vector: `[Wearing_Lipstick, Young, Attractive, No_Beard]`
- Conditioning Vector: `[Young, Attractive, Male, Smiling]`
- Conditioning Vector: `[Mouth_Slightly_Open, Wearing_Lipstick, Young, Smiling, Attractive, No_Beard]`
- Conditioning Vector: `[Bald, No_Beard, Male]` (very difficult: the Bald attribute appears in <2.5% of the dataset)
- Conditioning Vector: `[Eyeglasses, Attractive, Young]` (very difficult: the Eyeglasses attribute appears in <5% of the dataset)
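Conditioning vectors like the ones above are built from CelebA's binary attribute labels. A hypothetical sketch of the encoding (the `ATTRS` list is a small illustrative subset of CelebA's 40 attributes, and `make_condition` is my name, not necessarily the repo's API):

```python
# Illustrative subset of CelebA's 40 binary attributes.
ATTRS = ["Attractive", "Bald", "Eyeglasses", "Male", "Mouth_Slightly_Open",
         "No_Beard", "Smiling", "Wearing_Lipstick", "Young"]

def make_condition(active_names):
    # Multi-hot encoding: 1 where the attribute is requested, 0 elsewhere.
    vec = [0] * len(ATTRS)
    for name in active_names:
        vec[ATTRS.index(name)] = 1
    return vec

cond = make_condition(["Bald", "No_Beard", "Male"])
```

Rare attributes such as `Bald` or `Eyeglasses` occupy only a few percent of the training distribution, which is why the corresponding conditioning vectors are flagged as very difficult above.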