Thanks for your contributions.
When I train the model with the settings --max_seq_length 30 --max_seq_a_length 30 --max_img_seq_length 18, I get the following error:
attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (471) must match the size of tensor b (48) at non-singleton dimension 3
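One thing I noticed: 30 + 18 = 48, so the mask length seems to match my max_seq_length plus max_img_seq_length, while 471 does not correspond to anything I set. Below is a minimal sketch of the mismatch in plain Python, in case it helps with reproducing it. The 4-D shapes are my assumption about how the tensors are laid out, the batch size of 1 and the 12 heads are made-up values, and mismatched_dims is a hypothetical helper, not something from the repository.

```python
from itertools import zip_longest


def mismatched_dims(shape_a, shape_b):
    """Return the dimension indices at which two shapes cannot be broadcast.

    Follows the usual broadcasting rule: shapes are aligned from the
    trailing dimension, and each pair of sizes must be equal or contain a 1.
    """
    ndim = max(len(shape_a), len(shape_b))
    bad = []
    # Walk from the last dimension backwards, padding the shorter shape with 1s.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for offset, (a, b) in enumerate(pairs):
        if a != b and a != 1 and b != 1:
            bad.append(ndim - 1 - offset)
    return sorted(bad)


# Values taken from my command line.
max_seq_length = 30
max_img_seq_length = 18
mask_len = max_seq_length + max_img_seq_length  # 48, matches "tensor b"

# Assumed shapes: batch size 1 and 12 heads are hypothetical.
scores_shape = (1, 12, 471, 471)  # 471 is "tensor a" from the error
mask_shape = (1, 1, 1, mask_len)

print(mismatched_dims(scores_shape, mask_shape))  # [3]
```

If this reading is right, the mask is built from my settings, but the input that produces attention_scores has a length of 471, which I cannot explain.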
I'm confused and can't tell what is causing this.
Could you please help me solve this problem? Thank you.