hu-zijing/AsynDM

Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation

arXiv  code 

Introduction

Diffusion models have achieved impressive results in generating high-quality images. Yet, they often struggle to faithfully align the generated images with the input prompts. This limitation arises from synchronous denoising, where all pixels simultaneously evolve from random noise to clear images. As a result, during generation, the prompt-related regions can only reference the unrelated regions at the same noise level, failing to obtain clear context and ultimately impairing text-to-image alignment. To address this issue, we propose asynchronous diffusion models, a novel framework that allocates distinct timesteps to different pixels and reformulates the pixel-wise denoising process. By dynamically modulating the timestep schedules of individual pixels, prompt-related regions are denoised more gradually than unrelated regions, thereby allowing them to leverage clearer inter-pixel context. Consequently, these prompt-related regions achieve better alignment in the final images. Extensive experiments demonstrate that our asynchronous diffusion models can significantly improve text-to-image alignment across diverse prompts.
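The core idea above — per-pixel timestep schedules in which prompt-related regions lag behind the rest of the image — can be sketched numerically. The snippet below is an illustrative toy, not the repository's implementation: the mask, the concave schedule shape, and the blending rule are all assumptions made for demonstration.

```python
import numpy as np

def asynchronous_timesteps(mask, num_steps=50, total_t=1000):
    """Toy per-pixel timestep schedules (illustrative only).

    mask: (H, W) array in [0, 1]; 1 marks prompt-related pixels.
    Returns (num_steps, H, W): masked pixels follow a concave schedule
    that keeps them at higher noise levels for longer, so they can read
    context from already-clear unrelated regions.
    """
    s = np.linspace(1.0, 0.0, num_steps)      # denoising progress, 1 -> 0
    fast = total_t * s                        # linear schedule (unrelated pixels)
    slow = total_t * (2 * s - s**2)           # concave: stays noisy longer
    # Blend per pixel so every pixel still reaches timestep 0 at the end.
    return fast[:, None, None] * (1 - mask) + slow[:, None, None] * mask

mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0                          # pretend this region is prompt-related
ts = asynchronous_timesteps(mask)
# Mid-generation, the masked region sits at a higher (noisier) timestep
# than the background, mirroring the asynchronous denoising described above.
```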

Run

You can start by running a quick test:

python3 asyn_sample.py --config.dev_id 0 --config.pretrained.model path/to/your/model

This command uses the pretrained diffusion model (e.g., sd2.1-base) for sampling and saves the generated images under a subdirectory of ./data/ named with the current timestamp. The saved images follow the naming convention {image_idx}_{postfix}.png, where postfix can be one of base, base2, or tgt. These correspond to images generated by the vanilla diffusion model (DM), the diffusion model with a concave scheduler (DMconcave), and the asynchronous diffusion model (AsynDM), respectively.

The commonly used arguments are listed below. Please note that the default arguments are not tuned for best performance.

# config.exp_name: the storage path under `./data` (default: current timestamp)
# config.prompt_file: path of the prompt file
# config.item_idx_file: path of the prompt index file
# config.static_mask: 0 for using dynamic mask, and 1 for using fixed mask
# config.curve_type: "bin"-quadratic scheduler, "lin"-piecewise linear scheduler, "exp"-exponential scheduler
# config.sample.num_steps: total timesteps
# config.sample.batch_size: batch size
# config.sample.num_batches_per_epoch: number of batches for each prompt
python3 asyn_sample.py \
--config.exp_name test_animal \
--config.dev_id 0 \
--config.pretrained.model path/to/your/model \
--config.seed 1234 \
--config.prompt_file config/prompt/animal.json \
--config.item_idx_file config/prompt/animal_item.json \
--config.static_mask 0 \
--config.curve_type bin \
--config.sample.num_steps 50 \
--config.sample.batch_size 8 \
--config.sample.num_batches_per_epoch 4
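The `--config.curve_type` flag names three schedule shapes for the prompt-related pixels. The exact formulas live in the repository; the functions below are only plausible guesses inferred from the flag names (quadratic, piecewise linear, exponential), all sharing the property that progress stays below the diagonal, so masked pixels denoise more gradually.

```python
import math

def warp(u, curve_type="bin"):
    # u in [0, 1]: fraction of sampling steps completed.
    # Returns the (slower) denoising progress of a prompt-related pixel.
    # Shapes are assumptions inferred from the flag names, not the repo's code.
    if curve_type == "bin":    # quadratic scheduler
        return u ** 2
    if curve_type == "lin":    # piecewise linear: slow start, fast finish
        return 0.5 * u if u < 0.8 else 0.4 + 3.0 * (u - 0.8)
    if curve_type == "exp":    # exponential scheduler
        return (math.exp(3.0 * u) - 1.0) / (math.exp(3.0) - 1.0)
    raise ValueError(f"unknown curve_type: {curve_type}")
```

All three curves start at 0, end at 1, and stay below `u` in between, so the masked region lags throughout sampling yet still finishes fully denoised.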

About

[ICLR 26] Asynchronous diffusion models allocate individual pixels with varying timestep schedules, yielding improved text-to-image alignment.
