hu-zijing/AsynDM

Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation

arXiv  code 

Introduction

Diffusion models have achieved impressive results in generating high-quality images. Yet, they often struggle to faithfully align the generated images with the input prompts. This limitation arises from synchronous denoising, where all pixels simultaneously evolve from random noise to clear images. As a result, during generation, the prompt-related regions can only reference the unrelated regions at the same noise level, failing to obtain clear context and ultimately impairing text-to-image alignment. To address this issue, we propose asynchronous diffusion models, a novel framework that allocates distinct timesteps to different pixels and reformulates the pixel-wise denoising process. By dynamically modulating the timestep schedules of individual pixels, prompt-related regions are denoised more gradually than unrelated regions, thereby allowing them to leverage clearer inter-pixel context. Consequently, these prompt-related regions achieve better alignment in the final images. Extensive experiments demonstrate that our asynchronous diffusion models can significantly improve text-to-image alignment across diverse prompts.
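The core idea above — per-pixel timestep schedules in which prompt-related regions lag behind the rest of the image — can be sketched numerically. The snippet below is an illustrative toy, not the repository's implementation: the mask, the concave schedule shape, and the blending rule are all assumptions made for demonstration.

```python
import numpy as np

def asynchronous_timesteps(mask, num_steps=50, total_t=1000):
    """Toy per-pixel timestep schedules (illustrative only).

    mask: (H, W) array in [0, 1]; 1 marks prompt-related pixels.
    Returns (num_steps, H, W): masked pixels follow a concave schedule
    that keeps them at higher noise levels for longer, so they can read
    context from already-clear unrelated regions.
    """
    s = np.linspace(1.0, 0.0, num_steps)      # denoising progress, 1 -> 0
    fast = total_t * s                        # linear schedule (unrelated pixels)
    slow = total_t * (2 * s - s**2)           # concave: stays noisy longer
    # Blend per pixel so every pixel still reaches timestep 0 at the end.
    return fast[:, None, None] * (1 - mask) + slow[:, None, None] * mask

mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0                          # pretend this region is prompt-related
ts = asynchronous_timesteps(mask)
# Mid-generation, the masked region sits at a higher (noisier) timestep
# than the background, mirroring the asynchronous denoising described above.
```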

Run

You can start by running a quick test:

python3 asyn_sample.py --config.dev_id 0 --config.pretrained.model path/to/your/model

This command uses the pretrained diffusion model (e.g., sd2.1-base) for sampling and saves the generated images under a subdirectory of ./data/ named with the current timestamp. The saved images follow the naming convention {image_idx}_{postfix}.png, where postfix can be one of base, base2, or tgt. These correspond to images generated by the vanilla diffusion model (DM), the diffusion model with a concave scheduler (DMconcave), and the asynchronous diffusion model (AsynDM), respectively.

The commonly used arguments are listed below. Please note that the default arguments are not tuned for best performance.

# config.exp_name: the storage path under `./data` (default: current timestamp)
# config.prompt_file: path of the prompt file
# config.item_idx_file: path of the prompt index file
# config.static_mask: 0 for using dynamic mask, and 1 for using fixed mask
# config.curve_type: "bin"-quadratic scheduler, "lin"-piecewise linear scheduler, "exp"-exponential scheduler
# config.sample.num_steps: total timesteps
# config.sample.batch_size: batch size
# config.sample.num_batches_per_epoch: number of batches for each prompt
python3 asyn_sample.py \
--config.exp_name test_animal \
--config.dev_id 0 \
--config.pretrained.model path/to/your/model \
--config.seed 1234 \
--config.prompt_file config/prompt/animal.json \
--config.item_idx_file config/prompt/animal_item.json \
--config.static_mask 0 \
--config.curve_type bin \
--config.sample.num_steps 50 \
--config.sample.batch_size 8 \
--config.sample.num_batches_per_epoch 4
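The `--config.curve_type` flag names three schedule shapes for the prompt-related pixels. The exact formulas live in the repository; the functions below are only plausible guesses inferred from the flag names (quadratic, piecewise linear, exponential), all sharing the property that progress stays below the diagonal, so masked pixels denoise more gradually.

```python
import math

def warp(u, curve_type="bin"):
    # u in [0, 1]: fraction of sampling steps completed.
    # Returns the (slower) denoising progress of a prompt-related pixel.
    # Shapes are assumptions inferred from the flag names, not the repo's code.
    if curve_type == "bin":    # quadratic scheduler
        return u ** 2
    if curve_type == "lin":    # piecewise linear: slow start, fast finish
        return 0.5 * u if u < 0.8 else 0.4 + 3.0 * (u - 0.8)
    if curve_type == "exp":    # exponential scheduler
        return (math.exp(3.0 * u) - 1.0) / (math.exp(3.0) - 1.0)
    raise ValueError(f"unknown curve_type: {curve_type}")
```

All three curves start at 0, end at 1, and stay below `u` in between, so the masked region lags throughout sampling yet still finishes fully denoised.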

About

[ICLR 26] Asynchronous diffusion models allocate individual pixels with varying timestep schedules, yielding improved text-to-image alignment.
