Germinari1/DiffuseIt

Generating images with a diffusion model

What even is this?

This project is a simplified implementation of the Imagen text-to-image model - more specifically, an example of a diffusion model for image generation. Building a project like this is an effective way to learn the difficult topic of modern generative AI by doing: exploring how a functional implementation actually works.

Main ideas

When given an image caption, the Imagen text-to-image model generates an image that depicts the scene described - well, basically what you would expect from a generative model. Imagen uses a cascade of diffusion models: a T5 text encoder produces a caption encoding, which conditions a base image generator, followed by a series of super-resolution models that refine the base image.
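As a rough illustration of that cascade, the sketch below wires together three stand-ins: a "text encoder" producing a caption embedding, a "base generator" producing a low-resolution image conditioned on it, and a "super-resolution" stage that upscales the result. Every function here is a hypothetical placeholder, not the project's actual API - the real components are neural networks:

```python
import numpy as np

def encode_caption(caption, dim=8):
    # Stand-in for the T5 text encoder: a deterministic pseudo-embedding.
    rng = np.random.default_rng(abs(hash(caption)) % (2**32))
    return rng.standard_normal(dim)

def base_generator(text_emb, side=64):
    # Stand-in for the base diffusion model: a low-resolution "image"
    # conditioned (here, only via the seed) on the caption embedding.
    rng = np.random.default_rng(int(abs(text_emb[0]) * 1e6) % (2**32))
    return rng.standard_normal((side, side, 3))

def super_resolution(image, text_emb, factor=4):
    # Stand-in for a super-resolution model: nearest-neighbour upsampling.
    # The real model runs another diffusion process conditioned on both
    # the low-res image and the caption embedding.
    return image.repeat(factor, axis=0).repeat(factor, axis=1)

emb = encode_caption("a corgi riding a skateboard")
base = base_generator(emb)            # 64x64x3 base image
final = super_resolution(base, emb)   # 256x256x3 refined image
print(base.shape, final.shape)
```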

Other interesting aspects of this model worth noting are the concepts of noise conditioning augmentation and dynamic thresholding.
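Dynamic thresholding, for instance, clamps each predicted sample to [-s, s], where s is a high percentile of the sample's absolute pixel values, and then rescales by s - this prevents pixel saturation at high guidance weights. A minimal NumPy sketch (the function name is mine; see the Imagen paper for the exact formulation):

```python
import numpy as np

def dynamic_threshold(x0, p=0.995):
    # s = p-th percentile of |x0|, floored at 1 so that samples already
    # in [-1, 1] pass through unchanged; clamp to [-s, s] and rescale.
    s = np.quantile(np.abs(x0), p)
    s = max(s, 1.0)
    return np.clip(x0, -s, s) / s

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(dynamic_threshold(x, p=0.8))
```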

It is also important to note that this implementation is based on Phil Wang's implementation and was made possible by the large collection of learning materials - both theoretical and practical - available online, such as articles, tutorials, and blog posts.

If you want to read the original Imagen paper, see "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding" (Saharia et al., 2022), available on arXiv.

Usage and Getting Started

First, clone this repository:

$ git clone https://github.com/Germinari1/DiffuseIt.git

After that, create a virtual environment:

$ pip install virtualenv
$ virtualenv venv

Then activate the virtual environment and install all dependencies:

$ .\venv\Scripts\activate.bat  # for Windows
$ source venv/bin/activate  # for macOS/Linux
$ pip install -r requirements.txt

main.py

To use main.py for the most basic functionality, navigate to the project directory and run:

$ python main.py

This command will create a small "Imagen" instance, train it on a minimal dataset, and then generate an image using the trained instance.

After execution, two directories will be created:

  1. training_<TIMESTAMP>. This Training Directory is created during the training and includes:

    • A parameters subdirectory with configuration details.
    • state_dicts and tmp directories containing model checkpoints.
    • A training_progress.txt file that logs the training progress.
  2. generated_images_<TIMESTAMP>, which contains:

    • A generated_images folder with the images generated by the model.
    • A captions.txt file documenting the input captions, where each line index corresponds to an image number in the generated_images folder.
    • An imagen_training_directory.txt file specifying the Training Directory used to load the MinImagen instance and generate images.
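Given the convention above that line i of captions.txt corresponds to image i, a small helper can pair each caption with its generated image. This is a hypothetical utility, not part of the repository, and the image filename pattern is an assumption - check the generated_images folder for the actual naming scheme:

```python
from pathlib import Path

def pair_captions_with_images(run_dir):
    """Map image index -> (caption, image path) for one generation run.

    Assumes the layout described above: a captions.txt whose line i
    corresponds to image i, and a generated_images folder whose files
    are (assumed!) to be named <i>.png.
    """
    run_dir = Path(run_dir)
    captions = (run_dir / "captions.txt").read_text().splitlines()
    return {i: (cap, run_dir / "generated_images" / f"{i}.png")
            for i, cap in enumerate(captions)}
```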

train.py

main.py runs both train.py and inference.py in sequence, with the former training the model and the latter generating the image.

To train a model, execute train.py with the appropriate command line arguments. The arguments include:

  • --PARAMETERS or -p: Directory specifying the MinImagen configuration, structured like the parameters subdirectory of a Training Directory.
  • --NUM_WORKERS or -n: Number of workers for the DataLoaders.
  • --BATCH_SIZE or -b: Batch size for training.
  • --MAX_NUM_WORDS or -mw: Maximum number of words allowed in a caption.
  • --IMG_SIDE_LEN or -s: Final side length of the square images output by MinImagen.
  • --EPOCHS or -e: Number of training epochs.
  • --T5_NAME or -t5: Name of the T5 encoder to use.
  • --TRAIN_VALID_FRAC or -f: Fraction of the dataset to use for training versus validation.
  • --TIMESTEPS or -t: Number of timesteps in the Diffusion Process.
  • --OPTIM_LR or -lr: Learning rate for the Adam optimizer.
  • --ACCUM_ITER or -ai: Number of batches to accumulate for gradient accumulation.
  • --CHCKPT_NUM or -cn: Interval (in batches) at which to save a temporary model checkpoint during training.
  • --VALID_NUM or -vn: Number of validation images to use.
  • --RESTART_DIRECTORY or -rd: Training directory to load the MinImagen instance from if resuming training.
  • --TESTING or -test: Used to run the script with a small MinImagen instance and a small dataset for testing.

For example:

python train.py --PARAMETERS ./parameters --BATCH_SIZE 2 --TIMESTEPS 25 --TESTING

inference.py

To generate images using a model from a training directory, use inference.py with the following command line arguments:

  • --TRAINING_DIRECTORY or -d: Specifies the training directory from which to load for inference.
  • --CAPTIONS or -c: Specifies either a single caption to generate an image for, or a filepath to a .txt file containing a list of captions, each on a new line.

For example:

python inference.py --CAPTIONS captions.txt --TRAINING_DIRECTORY training_<TIMESTAMP>
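The captions file passed to --CAPTIONS is plain text with one caption per line; a hypothetical captions.txt might look like:

```
a red apple on a wooden table
an astronaut riding a horse on the moon
```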

If you wish to create your own training and inference scripts, take a look at train.py and inference.py for inspiration.

About

A from-scratch implementation of a diffusion model for image generation, inspired by Imagen. Includes model definition, training, and sampling.
