tgt2, tgt2, tgt2, key_padding_mask=tgt_key_padding_mask.repeat(M, 1)
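Although dim 0 of tgt2 has the same size as dim 0 of the repeated tgt_key_padding_mask, the inner data layouts are not the same.
tgt2: bs * M, where the first M elements all belong to the first sample,
tgt_key_padding_mask.repeat(M, 1): bs * M, where the first M elements may come from different samples,
so the call should use repeat_interleave(M, dim=0) instead of repeat(M, 1). Note that dim=0 is required, because without dim, repeat_interleave flattens the tensor first.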
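
The layout mismatch can be reproduced with a small sketch. The sizes and mask values below are made up for illustration, and it assumes, as described above, that tgt2 holds M consecutive rows per sample along dim 0. It is not the project's actual code, only a demonstration of how repeat and repeat_interleave order their rows.

```python
import torch

# Hypothetical sizes: batch size, copies per sample, sequence length.
bs, M, L = 2, 3, 4

# One padding-mask row per sample; True marks a padded position.
tgt_key_padding_mask = torch.tensor([
    [False, False, False, True],   # sample 0
    [False, True,  True,  True],   # sample 1
])

# Tag each row with its sample index to see where rows end up.
sample_ids = torch.arange(bs).unsqueeze(1)                        # shape (bs, 1)
tiled = sample_ids.repeat(M, 1).flatten()                         # whole batch tiled M times
interleaved = sample_ids.repeat_interleave(M, dim=0).flatten()    # each row repeated M times

print(tiled.tolist())        # [0, 1, 0, 1, 0, 1]
print(interleaved.tolist())  # [0, 0, 0, 1, 1, 1]

# Same shape, different row order: only the interleaved mask matches
# a tgt2 layout in which the first M rows belong to sample 0.
wrong_mask = tgt_key_padding_mask.repeat(M, 1)
right_mask = tgt_key_padding_mask.repeat_interleave(M, dim=0)
```

Both masks have shape (bs * M, L), which is why the bug does not raise a shape error and only shows up as rows being masked with another sample's padding pattern.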