Not Learning Image-Text Alignment #47

@Vishu26

Thank you for the great work!

I ran a toy experiment on the Flickr-30k dataset (https://huggingface.co/datasets/nlphuji/flickr30k), using all the default parameters and training on 4 H100s. After 120k steps (991 epochs), the model still has not learned image-text alignment: the generated images do not align with the ground-truth images. An example is shown below. The top row shows ground-truth images and the bottom row shows generated images.

Image

I do not encounter this issue with other models such as DiT, SiT, and REPA. Any ideas on how to fix it?
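
As a rough sanity check on the training length reported above, here is a minimal sketch that converts the step and epoch counts into steps per epoch and an implied global batch size. The step count (120k) and epoch count (991) come from this report; the number of samples per epoch (`ASSUMED_NUM_SAMPLES`) is an assumption for illustration and should be replaced with the actual size of the training split.

```python
# Reported in this issue: 120k training steps, 991 epochs.
TOTAL_STEPS = 120_000
EPOCHS = 991

# ASSUMPTION (not stated in the issue): roughly 31,000 image-caption
# samples are seen per epoch. Replace with the real training-set size.
ASSUMED_NUM_SAMPLES = 31_000


def steps_per_epoch(total_steps: int, epochs: int) -> float:
    """Average number of optimizer steps taken in one epoch."""
    return total_steps / epochs


def implied_global_batch(num_samples: int, total_steps: int, epochs: int) -> float:
    """Global batch size implied by the dataset size and steps per epoch."""
    return num_samples / steps_per_epoch(total_steps, epochs)


spe = steps_per_epoch(TOTAL_STEPS, EPOCHS)
batch = implied_global_batch(ASSUMED_NUM_SAMPLES, TOTAL_STEPS, EPOCHS)

print(f"steps per epoch: {spe:.1f}")
print(f"implied global batch size (under the assumed dataset size): {batch:.0f}")
```

If the implied batch size differs a lot from the configured one, the epoch counter or the data loader may not be iterating over the dataset as expected, which is worth ruling out before looking at the conditioning path.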
