
Conversation


@TmacAaron commented on Nov 4, 2025

What does this PR do?

When memory is constrained, VAE tiling is a key optimization for reducing the memory footprint. At its core it works as follows (see the sketch after the list):

  1. Split the VAE’s original input (e.g., high-resolution images or latent tensors) into multiple smaller, independently processable tiles;
  2. Feed each tile into the VAE separately for encoding or decoding to generate tile-specific results;
  3. Stitch all processed tiles back into their original positional layout, blending the overlapping regions so the final output is seamless.
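
For intuition, here is a minimal sketch of that tiled-decode flow, written against a diffusers-style `vae.decode(...).sample` call. The tile size, overlap, upscale factor, and the uniform averaging of overlaps are illustrative assumptions, not the exact diffusers implementation (which feathers the blend across the overlap):

```python
import torch

@torch.no_grad()
def tiled_decode(vae, latents, tile_size=64, overlap=16, upscale=8):
    """Sketch: decode `latents` tile by tile and average the overlaps."""
    b, _, h, w = latents.shape
    stride = tile_size - overlap
    out = weight = None

    for top in range(0, max(h - overlap, 1), stride):
        for left in range(0, max(w - overlap, 1), stride):
            # 1. cut one tile out of the latent grid
            tile = latents[:, :, top:top + tile_size, left:left + tile_size]
            # 2. decode the tile independently of all other tiles
            decoded = vae.decode(tile).sample
            if out is None:
                out = latents.new_zeros(b, decoded.shape[1], h * upscale, w * upscale)
                weight = torch.zeros_like(out)
            # 3. paste the decoded tile back at its original position and
            #    count how many tiles contributed to each output pixel
            t, l = top * upscale, left * upscale
            out[:, :, t:t + decoded.shape[2], l:l + decoded.shape[3]] += decoded
            weight[:, :, t:t + decoded.shape[2], l:l + decoded.shape[3]] += 1.0

    # 4. average overlapping regions (a real implementation feathers them)
    return out / weight
```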

However, the default VAE tiling workflow processes tiles sequentially, even when multiple devices are available. This underutilizes hardware, since each tile's VAE computation is independent and could run in parallel.

To address this inefficiency, this PR introduces parallelized tiled VAE processing for AutoencoderKL and AutoencoderKLWan. The independent tile computations are distributed across the available devices, so multiple tiles are processed simultaneously. This improves hardware utilization and accelerates end-to-end VAE encoding/decoding throughput, while retaining the memory-saving benefits of tiling for memory-constrained environments.
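Conceptually, the parallelization amounts to dispatching independent tile decodes to different devices and gathering the results back for stitching. The sketch below shows one simple way that could look, with round-robin dispatch and a per-device VAE replica; it is an assumption for illustration, not necessarily the mechanism this PR uses (which may rely on torch.distributed or another scheme), and `decode_tiles_parallel` and the deep-copied replicas are hypothetical names:

```python
import copy
import torch

@torch.no_grad()
def decode_tiles_parallel(vae, tiles, devices):
    """Sketch: decode a list of latent tiles round-robin across `devices`.

    Results are returned in the original tile order, on `devices[0]`, so the
    stitching/blending step from the tiling loop above can run unchanged.
    """
    # one VAE replica per device (wasteful but simple; a real implementation
    # would shard the work without duplicating weights on every GPU)
    replicas = [copy.deepcopy(vae).to(device) for device in devices]

    # CUDA kernel launches are asynchronous, so decodes issued to different
    # GPUs in this loop can execute concurrently
    pending = []
    for i, tile in enumerate(tiles):
        replica = replicas[i % len(devices)]
        device = devices[i % len(devices)]
        pending.append(replica.decode(tile.to(device)).sample)

    # gather everything back to one device for stitching
    return [decoded.to(devices[0]) for decoded in pending]
```

For example, `devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())]` would spread the tiles across all visible GPUs while each individual tile still fits comfortably in memory.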

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@TmacAaron TmacAaron changed the title implement vae dp for AutoencoderKL and AutoencoderKLWan [WIP] Support parallel tiled vae for AutoencoderKL and AutoencoderKLWan Nov 4, 2025
