
About VLMs #5

@YUECHE77

Description


Hi!

We are really interested in your work! While trying to reproduce the VLM results, I noticed that LLaVA-v1.5 uses openai/clip-vit-large-patch14-336 as its vision encoder, which is different from the LAION version.

Does this mean that, in order to reproduce LLaVA's results in Table 3, we also need to fine-tune openai/clip-vit-large-patch14-336? If so, could you please share the fine-tuning script so that we can reproduce your results exactly?
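For anyone else checking whether the two encoders are interchangeable: one concrete difference is the input resolution, which changes the number of vision tokens the LLM receives. A minimal sketch of that arithmetic (assuming a standard ViT-L/14 patch embedding with a [CLS] token; the 336px model is what LLaVA-v1.5 uses, 224px is typical of other CLIP checkpoints):

```python
def num_vision_tokens(image_size: int, patch_size: int = 14) -> int:
    """Tokens a ViT vision encoder emits: (image_size/patch_size)^2 patches + 1 [CLS]."""
    return (image_size // patch_size) ** 2 + 1

# openai/clip-vit-large-patch14-336 (LLaVA-v1.5's vision tower):
print(num_vision_tokens(336))  # 577 (576 patch tokens + [CLS])

# a 224px CLIP ViT-L/14 checkpoint for comparison:
print(num_vision_tokens(224))  # 257 (256 patch tokens + [CLS])
```

So weights fine-tuned on one resolution can't simply be dropped into the other: the patch grid, and therefore the projected token sequence, differs.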

Meanwhile, could you please also upload the tar files for the other identities? For example, in Table 3: KANYE WEST, TOM CRUISE, BARACK OBAMA, LADY GAGA, and BRUCE LEE. Additionally, in Table 6 (Appendix): BASKETBALL, BEACH, CASTLE, REVOLVER, RIFLE, SCHOOLBUS, and SUNGLASSES. That would be very helpful!

Thank you so much!!
