This project is a simplified implementation of the Imagen text-to-image model; more specifically, it is an example of a diffusion model for image generation. Building such a project is an effective way to learn the difficult topic of modern generative AI by doing, and by exploring how a functional implementation works.
When given an image caption, the Imagen text-to-image model generates an image that depicts the described scene. The model employs a cascading diffusion architecture: a T5 text encoder produces a caption encoding, which conditions a base image generator, followed by a series of super-resolution models that refine the base image.
Other interesting aspects of this model worth noting are the concepts of noise conditioning augmentation and dynamic thresholding.
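To make these two ideas concrete, here is a hypothetical, simplified sketch of both techniques. It is not this repository's actual code; the function names and parameters are illustrative, and the math follows the descriptions in the Imagen paper (noise conditioning augmentation corrupts the low-resolution input to a super-resolution stage and conditions the model on the noise level; dynamic thresholding clips predicted pixels using a percentile of their absolute values).

```python
import numpy as np

def noise_condition_augment(image, aug_level, rng=None):
    """Sketch of noise conditioning augmentation: corrupt the low-resolution
    input to a super-resolution stage with Gaussian noise at a chosen level,
    and return that level so the model can be conditioned on it."""
    rng = rng or np.random.default_rng(0)
    noisy = (np.sqrt(1.0 - aug_level) * image
             + np.sqrt(aug_level) * rng.standard_normal(image.shape))
    return noisy, aug_level

def dynamic_threshold(x0, percentile=0.9):
    """Sketch of dynamic thresholding: pick s as a percentile of |x0|;
    if s > 1, clip x0 to [-s, s] and rescale by s so pixels stay in [-1, 1]."""
    s = max(np.quantile(np.abs(x0), percentile), 1.0)
    return np.clip(x0, -s, s) / s
```

With a static threshold, predictions outside [-1, 1] would simply be clipped, washing out saturated pixels; the dynamic version rescales instead, which is what lets Imagen use large classifier-free guidance weights.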
This implementation is based on Phil Wang's implementation and was made possible by the large collection of learning materials, both theoretical and practical, available online, such as articles, tutorials, and blog posts.
If you want to read the original Imagen paper, you can find it here.
First, clone this repository:

```bash
$ git clone [PUT REPO HERE]
```

After that, create a virtual environment:

```bash
$ pip install virtualenv
$ virtualenv venv
```

Then activate the virtual environment and install all dependencies:

```bash
$ .\venv\Scripts\activate.bat  # for Windows
$ source venv/bin/activate     # for MacOS/Linux
$ pip install -r requirements.txt
```

To use main.py for the most basic functionality, navigate to the project directory and run:

```bash
$ python main.py
```

This command will create a small "Imagen" instance, train it on a minimal dataset, and then generate an image using the trained instance.
After execution, two directories will be created:

- `training_<TIMESTAMP>`. This Training Directory is created during training and includes:
  - A `parameters` subdirectory with configuration details.
  - `state_dicts` and `tmp` directories containing model checkpoints.
  - A `training_progress.txt` file that logs the training progress.
- `generated_images_<TIMESTAMP>`, which contains:
  - A `generated_images` folder with the images generated by the model.
  - A `captions.txt` file documenting the input captions, where each line index corresponds to an image number in the `generated_images` folder.
  - An `imagen_training_directory.txt` file specifying the Training Directory used to load the MinImagen instance and generate images.
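Because each run produces a new timestamped Training Directory, a script that wants to pick up the most recent run has to locate it first. The helper below is a hypothetical sketch (not part of this repository) that assumes the `training_<TIMESTAMP>` naming described above and that the timestamps sort lexicographically:

```python
from pathlib import Path

def latest_training_dir(root="."):
    """Return the most recent training_<TIMESTAMP> directory under `root`,
    or None if no such directory exists. Assumes timestamps in the names
    sort lexicographically (e.g. YYYYMMDD_HHMMSS)."""
    dirs = sorted(p for p in Path(root).glob("training_*") if p.is_dir())
    return dirs[-1] if dirs else None
```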
`main.py` runs both `train.py` and `inference.py` in sequence, with the former training the model and the latter generating the image.
To train a model, execute train.py with the appropriate command line arguments. The arguments include:

- `--PARAMETERS` or `-p`: Directory specifying the MinImagen configuration, structured like a `parameters` subdirectory within a Training Directory.
- `--NUM_WORKERS` or `-n`: Number of workers for the DataLoaders.
- `--BATCH_SIZE` or `-b`: Batch size for training.
- `--MAX_NUM_WORDS` or `-mw`: Maximum number of words allowed in a caption.
- `--IMG_SIDE_LEN` or `-s`: Final side length of the square images output by MinImagen.
- `--EPOCHS` or `-e`: Number of training epochs.
- `--T5_NAME` or `-t5`: Name of the T5 encoder to use.
- `--TRAIN_VALID_FRAC` or `-f`: Fraction of the dataset to use for training versus validation.
- `--TIMESTEPS` or `-t`: Number of timesteps in the Diffusion Process.
- `--OPTIM_LR` or `-lr`: Learning rate for the Adam optimizer.
- `--ACCUM_ITER` or `-ai`: Number of batches to accumulate for gradient accumulation.
- `--CHCKPT_NUM` or `-cn`: Interval of batches at which to create a temporary model checkpoint during training.
- `--VALID_NUM` or `-vn`: Number of validation images to use.
- `--RESTART_DIRECTORY` or `-rd`: Training Directory to load the MinImagen instance from when resuming training.
- `--TESTING` or `-test`: Run the script with a small MinImagen instance and a small dataset for testing.
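For readers writing their own scripts, a few of the flags above could be declared with `argparse` roughly as follows. This is a sketch, not the repository's actual parser, and the defaults here are placeholders rather than the project's real defaults:

```python
import argparse

def build_parser():
    # Sketch of a parser for a subset of the flags listed above.
    # Defaults are illustrative placeholders, not the project's real defaults.
    parser = argparse.ArgumentParser(description="MinImagen training (sketch)")
    parser.add_argument("--PARAMETERS", "-p", type=str, default=None,
                        help="MinImagen configuration directory")
    parser.add_argument("--BATCH_SIZE", "-b", type=int, default=2,
                        help="Batch size for training")
    parser.add_argument("--TIMESTEPS", "-t", type=int, default=25,
                        help="Number of timesteps in the Diffusion Process")
    parser.add_argument("--TESTING", "-test", action="store_true",
                        help="Run with a small instance and dataset")
    return parser
```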
For example:

```bash
python train.py --PARAMETERS ./parameters --BATCH_SIZE 2 --TIMESTEPS 25 --TESTING
```

To generate images using a model from a Training Directory, use inference.py with the following command line arguments:
- `--TRAINING_DIRECTORY` or `-d`: Training Directory from which to load the model for inference.
- `--CAPTIONS` or `-c`: Either a single caption to generate an image for, or a filepath to a `.txt` file containing a list of captions, each on a new line.
For example:

```bash
python inference.py --CAPTIONS captions.txt --TRAINING_DIRECTORY training_<TIMESTAMP>
```

If you wish to create your own training and inference scripts, take a look at the files train.py and inference.py to get some inspiration.
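As a starting point for a custom script, the sketch below reads a captions file in the format inference.py expects (one caption per line) and pairs each caption with its image number, following the convention described earlier that line index i in `captions.txt` corresponds to image i in the `generated_images` folder. The helper name is hypothetical, not part of the repository:

```python
def read_captions(path):
    """Read a captions file (one caption per line) and return a list where
    index i holds the caption for image i in the generated_images folder."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]
```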