Tensor Parallel demo #4851

Hi, I have run SDXL with `Tensor Parallel` as well as `sequence parallel`. My PR is linked below in case it helps anyone who needs it.

Motivation:
I wanted to avoid `grad checkpointing`, which saves memory by recomputing activations during the backward pass, so as to get higher throughput on high-resolution inputs such as 720p.

However, tensor parallelism comes with its own overhead, and I did not get any throughput gain from TP (tested at 720x1080 on A100, batch size 16, with AMP).

In case someone has the same idea or wants to try tensor parallelism on more blocks, here are my code changes:

[PR: support tensor parallel for sdxl](https://github.com/KimmiShi/diffusers-tp/pull/1)
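To illustrate where the TP overhead comes from, here is a minimal single-process sketch of Megatron-style tensor parallelism for an MLP block. It is not the code from the PR above, and all helper names (`mlp_tensor_parallel`, `split_cols`, etc.) are hypothetical. The first linear layer is split by columns and the second by rows, so each "rank" can apply GELU locally; the partial outputs must then be summed across ranks. Here that sum is a plain loop; in a real multi-GPU run it would be a `torch.distributed.all_reduce`, i.e. one extra communication per MLP block in the forward pass, which is part of the cost that can cancel out the throughput gain.

```python
import math
import random


def matmul(a, b):
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gelu(m):
    """Elementwise GELU (tanh approximation)."""
    c = math.sqrt(2.0 / math.pi)
    return [[0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3))) for x in row] for row in m]


def split_cols(m, parts):
    """Shard a matrix by columns: one (k x m/parts) slice per rank."""
    assert len(m[0]) % parts == 0, "hidden size must be divisible by world size"
    w = len(m[0]) // parts
    return [[row[p * w:(p + 1) * w] for row in m] for p in range(parts)]


def split_rows(m, parts):
    """Shard a matrix by rows: one (k/parts x m) slice per rank."""
    assert len(m) % parts == 0, "hidden size must be divisible by world size"
    h = len(m) // parts
    return [m[p * h:(p + 1) * h] for p in range(parts)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mlp_reference(x, w1, w2):
    """Unsharded MLP: GELU(x @ w1) @ w2."""
    return matmul(gelu(matmul(x, w1)), w2)


def mlp_tensor_parallel(x, w1, w2, world_size):
    """Simulated TP MLP: column-parallel w1, row-parallel w2, then a sum (all-reduce)."""
    w1_shards = split_cols(w1, world_size)
    w2_shards = split_rows(w2, world_size)
    partials = []
    for rank in range(world_size):
        # GELU is elementwise, so no communication is needed between the two matmuls.
        h = gelu(matmul(x, w1_shards[rank]))
        partials.append(matmul(h, w2_shards[rank]))
    # Stand-in for all-reduce(SUM) across ranks: the communication step TP adds.
    out = partials[0]
    for p in partials[1:]:
        out = add(out, p)
    return out


if __name__ == "__main__":
    random.seed(0)
    x = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    w1 = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
    w2 = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(8)]
    ref = mlp_reference(x, w1, w2)
    tp = mlp_tensor_parallel(x, w1, w2, world_size=2)
    print(max(abs(a - b) for ra, rb in zip(ref, tp) for a, b in zip(ra, rb)))
```

The sharded result matches the unsharded one up to floating-point rounding. Each rank only holds `1/world_size` of the weights and of the hidden activations, which is what saves memory, while the final sum is where the communication cost appears.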
