Question about the choice of backbone

Hi,

This is a very interesting and impressive paper. After reading it multiple times, there is still one question stuck in my mind. U-Net has been adopted for many diffusion tasks. Is there a reason this work chose a transformer instead of a U-Net? If a U-Net is used as the backbone, does x-prediction plus v-loss also improve performance? Or does a U-Net need a different combination of prediction target and loss?
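To make the question concrete, here is a minimal sketch of what I mean by "x-prediction plus v-loss". It assumes a linear interpolation `z_t = t * x + (1 - t) * eps` with velocity `v = x - eps`; that convention, the function name, and the `predict_x` callable are my own assumptions for illustration and are not taken from the paper's code. The point is only that the backbone is a black box here, so a U-Net could in principle be dropped in where the transformer is.

```python
def v_loss_from_x_prediction(x, eps, t, predict_x):
    """Mean squared v-loss computed from an x-predicting backbone.

    x, eps    : lists of floats (clean data and noise), same length
    t         : float in [0, 1); t = 1 is excluded to avoid dividing by zero
    predict_x : hypothetical backbone (U-Net or transformer) mapping (z_t, t) -> x_hat
    """
    # Assumed noising convention: z_t = t * x + (1 - t) * eps
    z = [t * xi + (1.0 - t) * ei for xi, ei in zip(x, eps)]

    # The network outputs an estimate of the clean data (x-prediction).
    x_hat = predict_x(z, t)

    # Convert the x-prediction into a velocity: v_hat = (x_hat - z_t) / (1 - t)
    v_hat = [(xh - zi) / (1.0 - t) for xh, zi in zip(x_hat, z)]

    # Target velocity under the same convention: v = x - eps
    v = [xi - ei for xi, ei in zip(x, eps)]

    # The loss is measured in v-space even though the network predicts x.
    return sum((a - b) ** 2 for a, b in zip(v_hat, v)) / len(v)


# Toy check with stand-in "backbones" (not real networks).
x = [1.0, -1.0]
eps = [0.0, 0.0]
oracle_loss = v_loss_from_x_prediction(x, eps, 0.5, lambda z, t: x)    # perfect x-prediction
identity_loss = v_loss_from_x_prediction(x, eps, 0.5, lambda z, t: z)  # returns its input
```

With a perfect x-prediction the loss vanishes, and a backbone that simply returns its input yields a zero predicted velocity, so nothing in the loss itself seems tied to the architecture.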

Any insight into this would be really helpful.

Question about the choice of backbone #52