Goal: compare different architectures on generating images from the MNIST dataset using flow matching
architectures to compare:
- UNet model (from the course) - unet.py
- Diffusion Transformer - dit.py
- Mamba - mamba.py
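All three models plug into the same training loop, so presumably they share the ConditionalVectorField interface mentioned under code organization. A minimal sketch of what that interface might look like (the actual signature in common.py may differ; the argument names here are assumptions):

```python
from abc import ABC, abstractmethod


class ConditionalVectorField(ABC):
    """Sketch of the shared interface (assumed signature) that the UNet,
    DiT, and Mamba models in unet.py / dit.py / mamba.py would each
    implement so the trainer can swap architectures freely."""

    @abstractmethod
    def forward(self, x, t, y):
        """Predict the velocity at noised image x, time t, with class label y."""
        ...
```

Any architecture satisfying this contract can then be dropped into the trainer without changes elsewhere.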
training run and example generations can be seen in experiment.ipynb
- upon visual inspection, the Mamba-generated samples are noticeably worse than the other models': they have more artifacts and the digits are less clear
- DiT and UNet seem pretty comparable in generation quality
- interesting to note that the DiT (16.8M params) has 14x more parameters than the UNet (1.2M params), yet its training speed is only 5% slower and its generation quality is not noticeably better
avg loss of final 500 steps:
- UNet: 131.15
- DiT: 123.94
- Mamba: 154.31
the avg loss values are pretty consistent when aggregating over the final 1000, 500, 100, and 50 steps
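The window-averaging described above is simple to reproduce from a per-step loss history; a small sketch (`loss_history` is a hypothetical list of per-step losses; the real values come from the run in experiment.ipynb):

```python
def avg_final(loss_history, n):
    """Mean of the last n entries of a per-step loss curve."""
    tail = loss_history[-n:]
    return sum(tail) / len(tail)


# checking stability across window sizes, as in the notes:
# for n in (1000, 500, 100, 50):
#     print(n, avg_final(loss_history, n))
```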
code organization:
- common.py contains shared abstract classes Sampleable and ConditionalVectorField
- gaussian_probability_path.py contains code for GaussianConditionalProbabilityPath, which is used to add noise to MNIST images during training
- simulator_utils.py contains the definitions for ODE (ordinary differential equation) and the ODE simulator
- mnist_sampler.py contains the implementation of the MNIST dataloader / sampler
- CFGTrainer.py contains code for the CFGTrainer class
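For intuition on how the probability path adds noise during training: a scalar sketch of a Gaussian conditional probability path, assuming the linear (optimal-transport) schedule common in flow matching. The actual schedule lives in gaussian_probability_path.py and may differ; this helper is hypothetical:

```python
import random


def sample_conditional_path(x1, t, rng=random):
    """Scalar sketch of a Gaussian conditional probability path.

    x1: clean data value (e.g. an MNIST pixel), t: time in [0, 1].
    Assumes the linear interpolant x_t = (1 - t) * x0 + t * x1 with
    Gaussian source x0; returns the noised sample and the conditional
    velocity target the model regresses onto."""
    x0 = rng.gauss(0.0, 1.0)      # sample from the Gaussian source
    xt = (1.0 - t) * x0 + t * x1  # interpolate noise -> data
    u = x1 - x0                   # conditional velocity d(x_t)/dt
    return xt, u
```

At t=0 the sample is pure noise, at t=1 it is the clean image; training regresses the model's output onto u at random t.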
much of the scaffolding/utility code was based on assignments from MIT's Introduction to Flow Matching and Diffusion Models course
- credit to Peter E. Holderrieth and Ezra Erives
next steps:
- try a harder dataset than MNIST to see if the performance gap between DiT and UNet widens
- try training the model to do ε-prediction (diffusion) rather than velocity prediction (flow matching)
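The two targets are closely related: under the linear path x_t = (1 - t) * ε + t * x1, the velocity is u = x1 - ε, which rearranges to u = (x_t - ε) / t. A hedged sketch of the conversion (hypothetical helper, assuming that linear schedule):

```python
def velocity_from_eps(xt, eps, t):
    """Convert an eps-prediction into a velocity estimate, assuming the
    linear path xt = (1 - t) * eps + t * x1 (hypothetical helper).
    From that identity: u = x1 - eps = (xt - eps) / t."""
    return (xt - eps) / t
```

So an ε-prediction model can be sampled with the same ODE simulator by converting its output at each step (modulo numerical care near t = 0).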
