Diffusion models have achieved impressive results in generating high-quality images. Yet, they often struggle to faithfully align the generated images with the input prompts. This limitation arises from synchronous denoising, where all pixels simultaneously evolve from random noise to clear images. As a result, during generation, the prompt-related regions can only reference the unrelated regions at the same noise level, failing to obtain clear context and ultimately impairing text-to-image alignment. To address this issue, we propose asynchronous diffusion models, a novel framework that allocates distinct timesteps to different pixels and reformulates the pixel-wise denoising process. By dynamically modulating the timestep schedules of individual pixels, prompt-related regions are denoised more gradually than unrelated regions, thereby allowing them to leverage clearer inter-pixel context. Consequently, these prompt-related regions achieve better alignment in the final images. Extensive experiments demonstrate that our asynchronous diffusion models can significantly improve text-to-image alignment across diverse prompts.
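To make the idea of pixel-wise timesteps concrete, here is a minimal sketch, not the repository's implementation. It assumes a normalized quadratic curve as the concave schedule and a binary mask marking prompt-related pixels; the function names (`linear_schedule`, `concave_schedule`, `pixel_timesteps`) and the `t_max` value are illustrative assumptions.

```python
def linear_schedule(num_steps, t_max=1000.0):
    """Timesteps falling linearly from t_max to 0 over num_steps steps."""
    return [t_max * (1.0 - k / num_steps) for k in range(num_steps + 1)]


def concave_schedule(num_steps, t_max=1000.0):
    """Assumed quadratic concave curve: stays noisy early, drops quickly late."""
    return [t_max * (1.0 - (k / num_steps) ** 2) for k in range(num_steps + 1)]


def pixel_timesteps(mask, step, num_steps, t_max=1000.0):
    """Per-pixel timestep map at a given sampling step.

    mask[i][j] is True for prompt-related pixels (concave schedule)
    and False for all other pixels (linear schedule).
    """
    t_lin = linear_schedule(num_steps, t_max)[step]
    t_con = concave_schedule(num_steps, t_max)[step]
    return [[t_con if m else t_lin for m in row] for row in mask]


# Hypothetical 2x2 image in which only the top-left pixel is prompt-related.
mask = [[True, False], [False, False]]
timestep_map = pixel_timesteps(mask, step=25, num_steps=50)
print(timestep_map)  # [[750.0, 500.0], [500.0, 500.0]]
```

Halfway through sampling, the prompt-related pixel is still at a higher noise level than its neighbours, so it is denoised while the surrounding context is already clearer. Both schedules start at `t_max` and end at 0, so every pixel is fully denoised in the final image.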
You can start by running a quick test:
python3 asyn_sample.py --config.dev_id 0 --config.pretrained.model path/to/your/model

This command uses the pretrained diffusion model (e.g., sd2.1-base) for sampling and saves the generated images under a subdirectory of ./data/ named with the current timestamp. The saved images follow the naming convention {image_idx}_{postfix}.png, where postfix can be one of base, base2, or tgt. These correspond to images generated by the vanilla diffusion model (DM), the diffusion model with a concave scheduler (DMconcave), and the asynchronous diffusion model (AsynDM), respectively.
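For side-by-side comparison, the three variants of each image can be grouped by index. The helper below is a sketch that relies only on the naming convention above; `group_outputs` is a hypothetical name, and it assumes that `image_idx` is an integer and that the images sit directly inside the run directory.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches {image_idx}_{postfix}.png with postfix in {base, base2, tgt}.
NAME_PATTERN = re.compile(r"^(\d+)_(base|base2|tgt)\.png$")


def group_outputs(run_dir):
    """Map image_idx -> {postfix: path} for all matching files in run_dir."""
    groups = defaultdict(dict)
    for path in sorted(Path(run_dir).glob("*.png")):
        match = NAME_PATTERN.match(path.name)
        if match:  # silently skip files that do not follow the convention
            groups[int(match.group(1))][match.group(2)] = path
    return dict(groups)
```

Calling `group_outputs` on the timestamped run directory returns a dictionary whose entries hold the DM (`base`), DMconcave (`base2`), and AsynDM (`tgt`) outputs for the same image index.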
The commonly used arguments are listed below. Please note that the default values are not tuned for the best performance.
# config.exp_name: the storage path under `./data` (default: current timestamp)
# config.prompt_file: path of the prompt file
# config.item_idx_file: path of the prompt index file
# config.static_mask: 0 for using dynamic mask, and 1 for using fixed mask
# config.curve_type: "bin"-quadratic scheduler, "lin"-piecewise linear scheduler, "exp"-exponential scheduler
# config.sample.num_steps: total timesteps
# config.sample.batch_size: batch size
# config.sample.num_batches_per_epoch: number of batches for each prompt
python3 asyn_sample.py \
--config.exp_name test_animal \
--config.dev_id 0 \
--config.pretrained.model path/to/your/model \
--config.seed 1234 \
--config.prompt_file config/prompt/animal.json \
--config.item_idx_file config/prompt/animal_item.json \
--config.static_mask 0 \
--config.curve_type bin \
--config.sample.num_steps 50 \
--config.sample.batch_size 8 \
--config.sample.num_batches_per_epoch 4
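
To compare the three scheduler curves under otherwise identical settings, the command above can be generated programmatically. The following is a sketch only: `build_command` and the `test_animal_{curve}` experiment names are illustrative assumptions, while the flags and values are the ones listed above. It prints the commands and does not run them.

```python
import shlex

# Arguments taken from the example command above.
BASE_ARGS = {
    "config.exp_name": "test_animal",
    "config.dev_id": 0,
    "config.pretrained.model": "path/to/your/model",
    "config.seed": 1234,
    "config.prompt_file": "config/prompt/animal.json",
    "config.item_idx_file": "config/prompt/animal_item.json",
    "config.static_mask": 0,
    "config.curve_type": "bin",
    "config.sample.num_steps": 50,
    "config.sample.batch_size": 8,
    "config.sample.num_batches_per_epoch": 4,
}


def build_command(overrides=None):
    """Return the asyn_sample.py invocation as an argument list."""
    args = dict(BASE_ARGS)
    args.update(overrides or {})
    cmd = ["python3", "asyn_sample.py"]
    for key, value in args.items():
        cmd += [f"--{key}", str(value)]
    return cmd


# One command per scheduler curve, each stored under its own experiment name.
for curve in ("bin", "lin", "exp"):
    cmd = build_command({
        "config.curve_type": curve,
        "config.exp_name": f"test_animal_{curve}",  # hypothetical naming
    })
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run` to launch the runs in sequence.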