Tensor Parallel demo #4851
Closed
Description
Hi, I have run SDXL with tensor parallelism as well as sequence parallelism. My PR is below; I hope it helps those who need it.
The Motivation:
I wanted to avoid gradient checkpointing so as to get higher throughput when the inputs have a higher resolution, such as 720p.
However, tensor parallelism comes at a cost, and I did not gain any throughput from TP (tested at 720*1080 on an A100, batch size 16, with AMP).
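For readers unfamiliar with where that cost comes from, here is a minimal sketch of the general idea. It is not the code from this PR, and all function names are hypothetical. It assumes the common scheme where the first linear layer of a block is split by columns and the second by rows, so each device computes a partial output that must be summed across devices (an all-reduce). That collective is the communication overhead that can cancel out the memory savings. Plain Python lists stand in for devices and tensors here.

```python
# Illustrative sketch only: pure-Python "devices", not the code from this PR.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise ReLU; it can be applied to each column shard independently."""
    return [[max(v, 0) for v in row] for row in m]

def split_columns(m, parts):
    """Split a matrix into `parts` equal column blocks (one per device)."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def split_rows(m, parts):
    """Split a matrix into `parts` equal row blocks (one per device)."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]

def all_reduce_sum(partials):
    """Element-wise sum of per-device partial outputs.

    Stands in for the all-reduce collective, which is the communication
    step that a real tensor-parallel block pays for.
    """
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

def tensor_parallel_mlp(x, w1, w2, world_size):
    """Column-parallel first layer, row-parallel second layer."""
    partials = []
    for w1_shard, w2_shard in zip(split_columns(w1, world_size),
                                  split_rows(w2, world_size)):
        hidden = relu(matmul(x, w1_shard))         # local, no communication
        partials.append(matmul(hidden, w2_shard))  # partial sum of the output
    return all_reduce_sum(partials)                # one collective per block

def reference_mlp(x, w1, w2):
    """The same two-layer block computed on a single device."""
    return matmul(relu(matmul(x, w1)), w2)
```

Each device holds only `1 / world_size` of the weights and the hidden activation, which is what saves memory, but every such block adds at least one collective in the forward pass and another in the backward pass.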
In case someone has the same idea or wants to try tensor parallelism on more blocks, my code changes are below: