Hi!
We are really interested in your work! While trying to reproduce the VLM results, I noticed that LLaVA-v1.5 uses openai/clip-vit-large-patch14-336 as its vision encoder, which is different from the LAION version.
Does this mean that, to reproduce the LLaVA results in Table 3, we also need to fine-tune openai/clip-vit-large-patch14-336? If so, could you please share the fine-tuning script so that we can reproduce your results exactly?
Also, could you please upload the tar files for the remaining entries? In Table 3, these are the identities KANYE WEST, TOM CRUISE, BARACK OBAMA, LADY GAGA, and BRUCE LEE. In Table 6 (Appendix), they are BASKETBALL, BEACH, CASTLE, REVOLVER, RIFLE, SCHOOLBUS, and SUNGLASSES. That would be very helpful!
Thank you so much!!