Hello.
I'm very interested in your excellent research.
However, the paper states that a Score-Aware Prediction Network, consisting of a two-layer MLP and a Sigmoid function, is used,
but the code seems to use only the sum of the Text-relevant cross-attention score and the Image-salient self-attention score for selection.
If you could point me to where this missing detail is implemented, I would greatly appreciate it.
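
To make the question concrete, here is a minimal sketch of the two variants as I understand them. The function names, the `[cross, self]` input layout, the ReLU hidden activation, and the weight arguments are all my own assumptions for illustration; none of them are taken from the paper or from the repository.

```python
import math
from typing import List, Sequence


def select_by_score_sum(cross_attn: Sequence[float],
                        self_attn: Sequence[float],
                        k: int) -> List[int]:
    """What the code appears to do: rank tokens by the plain sum of both scores.

    Returns the indices of the k highest-scoring tokens, in ascending index order.
    """
    combined = [c + s for c, s in zip(cross_attn, self_attn)]
    order = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return sorted(order[:k])


def mlp_sigmoid_scores(cross_attn: Sequence[float],
                       self_attn: Sequence[float],
                       w1: Sequence[Sequence[float]],
                       b1: Sequence[float],
                       w2: Sequence[float],
                       b2: float) -> List[float]:
    """What the paper seems to describe: a two-layer MLP followed by a Sigmoid.

    Hypothetical layout: each token's input is [cross, self]; the hidden layer
    uses ReLU (an assumption); the output is one score in (0, 1) per token.
    """
    scores = []
    for c, s in zip(cross_attn, self_attn):
        # First layer: one hidden unit per row of w1, with ReLU.
        hidden = [max(0.0, w[0] * c + w[1] * s + b) for w, b in zip(w1, b1)]
        # Second layer: a single logit, squashed by the Sigmoid.
        logit = sum(wi * h for wi, h in zip(w2, hidden)) + b2
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores
```

In the first variant nothing is learned at selection time, while the second would carry trainable weights (`w1`, `b1`, `w2`, `b2`), which is the part I could not find in the code.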