# Text-to-Image generation using Diffusion models

This project was developed as part of our Convolutional Neural Networks course. It aims to build a Text-to-Image model that generates images from textual input.

## Project Setup

To get started with the project, follow the instructions below to create a virtual environment and install the required dependencies. The dependencies are listed in pyproject.toml.

### Using pip (via venv)

```bash
# Create a virtual environment
python -m venv venv

# Activate the environment (Linux/macOS)
source venv/bin/activate

# Install dependencies
pip install .
```

### Using uv

```bash
# Create a virtual environment
uv venv

# Activate it (Linux/macOS)
source .venv/bin/activate

# Install dependencies from pyproject.toml
uv pip install -r pyproject.toml
```

## Running the Scripts

You can run training or evaluation using the provided scripts; from inside the `scripts/` directory:

```bash
python train.py
python eval.py
```

If you'd like to use a custom configuration file, you can specify the path with the `--config` flag:

```bash
python train.py --config configs/your_config.yaml
```

The configuration file should follow the same structure as the example configs provided in the `configs/` directory.
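For illustration only, a training config in this style might look like the sketch below. All key names here are hypothetical; consult the example files in `configs/` for the actual schema:

```yaml
# Hypothetical config sketch — field names are illustrative,
# not the project's real schema; see configs/ for real examples.
model:
  image_size: 64
  timesteps: 1000
training:
  batch_size: 32
  learning_rate: 1.0e-4
  epochs: 100
```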

Alternatively, you can run inference to sample an image:

```bash
python infer.py --model_path ../models/pretrained --prompt "A man in a suit and a tie."
```
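The inference entry point presumably parses these flags with something like `argparse`. A minimal sketch of such a command-line interface, assuming only the `--model_path` and `--prompt` flags shown above (the help strings and any default values are illustrative):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Sketch of the flags infer.py accepts; mirrors the usage shown above.
    parser = argparse.ArgumentParser(
        description="Sample an image from a text prompt with a diffusion model"
    )
    parser.add_argument(
        "--model_path", required=True,
        help="Path to a pretrained diffusion model directory",
    )
    parser.add_argument(
        "--prompt", required=True,
        help="Text prompt to condition image generation on",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Sampling image for prompt {args.prompt!r} using model at {args.model_path}")
```

This keeps the CLI surface small; additional options (sampling steps, output path, and so on) can be added as further `add_argument` calls.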