Qin Ren1, Yufei Wang2,6, Lanqing Guo3, Wen Zhang4, Zhiwen Fan5, Chenyu You1
1Stony Brook University 2Nanyang Technological University 3UT Austin
4Johns Hopkins University 5Texas A&M University 6SparcAI Research
Where should extra inference compute go? Typical test-time scaling (TTS) perturbs or resamples the whole image, even when only a small region is wrong. LoTTS uses quality-aware attention to find those weak regions and runs test-time scaling only there, leaving high-quality pixels untouched. The approach is training-free and searches a much smaller space.
Test-time scaling for diffusion models usually perturbs the entire image, yet quality is often uneven across the canvas.
Defects are typically localized: additional compute is better spent on weak regions than on restarting the whole sample.
LoTTS is training-free: it derives defect masks from attention for localization, then performs masked resampling with consistency controls.
- Localization. Contrast cross-/self-attention under high- vs. low-quality prompts to form a coherent defect mask.
- Resampling. Inject noise and denoise only inside the mask, followed by brief global harmonization.
- Efficiency. Plug-and-play across backbones; matches Best-of-N quality with ~2–4× fewer samples.
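The localization step can be illustrated with a toy NumPy sketch. This is not the project's implementation: the attention maps are synthetic, and `defect_mask` is a hypothetical helper showing the idea of contrasting high- vs. low-quality prompt attention and thresholding the result into a mask.

```python
import numpy as np

def defect_mask(attn_hq, attn_lq, threshold=0.5):
    """Toy quality-aware mask: pixels attending more to the low-quality
    prompt than to the high-quality one are flagged as defective.
    attn_hq / attn_lq are (H, W) attention maps."""
    # Contrast the two maps, then normalize the contrast to [0, 1].
    contrast = attn_lq - attn_hq
    contrast = (contrast - contrast.min()) / (np.ptp(contrast) + 1e-8)
    return contrast > threshold

# Synthetic 8x8 example with one defective corner region.
hq = np.full((8, 8), 0.8)
lq = np.full((8, 8), 0.2)
hq[:3, :3] = 0.1   # high-quality prompt attends weakly to the corner
lq[:3, :3] = 0.9   # low-quality prompt fires on the corner
mask = defect_mask(hq, lq)
print(mask.sum())  # → 9 (only the 3x3 corner is flagged)
```

In the real pipeline the maps would come from the diffusion backbone's cross-/self-attention (e.g. via ConceptAttention), and the raw mask would additionally be cleaned up into a spatially coherent region.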
Overview of LoTTS. Given a text prompt, LoTTS first generates candidate images from different noise seeds. It then localizes defective regions using high-/low-quality prompt contrast and constructs a quality-aware mask. Noise is injected only inside the masked regions, followed by localized denoising with spatial and temporal consistency. A verifier finally selects the best refined sample.
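The overall loop can be sketched end to end with stand-ins for each component. Everything here is illustrative: `masked_resample` fakes the noise-inject-and-denoise step with plain Gaussian noise, and `verifier` is a toy score standing in for reward models like ImageReward or HPSv2.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_resample(image, mask, noise_scale=0.3):
    """Inject noise only inside the defect mask; pixels outside the
    mask are left exactly as they were (the key property of LoTTS)."""
    noisy = image.copy()
    noisy[mask] += noise_scale * rng.standard_normal(mask.sum())
    return np.clip(noisy, 0.0, 1.0)

def verifier(image, target=0.5):
    """Toy verifier: prefers images whose pixels sit near a target
    level. A real pipeline would use a learned reward model."""
    return -np.abs(image - target).mean()

image = np.full((8, 8), 0.5)
image[:2, :2] = 0.0            # a localized defect
mask = image < 0.25            # pretend the attention mask found it

# Localized test-time scaling: resample only the masked region
# several times and keep the candidate the verifier prefers.
candidates = [masked_resample(image, mask) for _ in range(8)]
best = max(candidates, key=verifier)
assert np.allclose(best[~mask], image[~mask])  # untouched outside the mask
```

Because the search happens only inside the mask, each candidate differs from the original in a small region, which is what shrinks the search space relative to whole-image Best-of-N.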
This project builds upon the following excellent open-source works:
- Diffusers — Hugging Face diffusion model library
- ImageReward — Human preference reward model
- HPSv2 — Human Preference Score v2
- ConceptAttention — Attention map extraction
- attention-map-diffusers — Attention map utilities
If you find this work useful, please consider citing:
@article{ren2025lotts,
  title   = {Scale Where It Matters: Training-Free Localized Scaling for Diffusion Models},
  author  = {Ren, Qin and Wang, Yufei and Guo, Lanqing and Zhang, Wen and Fan, Zhiwen and You, Chenyu},
  journal = {arXiv preprint arXiv:2511.19917},
  year    = {2025}
}

This project is released under the MIT License.