I notice that you use OpenAI's GPT-4 directly to match captions with grounded entities. Why not train a custom model by leveraging existing datasets like the ones used in KOSMOS-2 or Shikra?