
About VLMs #5

@YUECHE77

Description


Hi!

We are really interested in your work! While trying to reproduce the VLM results, I noticed that LLaVA-v1.5 uses openai/clip-vit-large-patch14-336 as its vision encoder, which is different from the LAION version.

Does this mean that, in order to reproduce LLaVA's results in Table 3, we also need to fine-tune openai/clip-vit-large-patch14-336? If so, could you please share the fine-tuning script so that we can reproduce your results exactly?
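For anyone else checking whether the two encoders are interchangeable: one concrete difference is the input resolution, which changes the number of vision tokens the LLM receives. A minimal sketch of that arithmetic (assuming a standard ViT-L/14 patch embedding with a [CLS] token; the 336px model is what LLaVA-v1.5 uses, 224px is typical of other CLIP checkpoints):

```python
def num_vision_tokens(image_size: int, patch_size: int = 14) -> int:
    """Tokens a ViT vision encoder emits: (image_size/patch_size)^2 patches + 1 [CLS]."""
    return (image_size // patch_size) ** 2 + 1

# openai/clip-vit-large-patch14-336 (LLaVA-v1.5's vision tower):
print(num_vision_tokens(336))  # 577 (576 patch tokens + [CLS])

# a 224px CLIP ViT-L/14 checkpoint for comparison:
print(num_vision_tokens(224))  # 257 (256 patch tokens + [CLS])
```

So weights fine-tuned on one resolution can't simply be dropped into the other: the patch grid, and therefore the projected token sequence, differs.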

Meanwhile, could you please also upload the tar files for the other identities? For example, in Table 3: KANYE WEST, TOM CRUISE, BARACK OBAMA, LADY GAGA, and BRUCE LEE. Additionally, in Table 6 (Appendix): BASKETBALL, BEACH, CASTLE, REVOLVER, RIFLE, SCHOOLBUS, and SUNGLASSES. That would be very helpful!

Thank you so much!!
