Great work, everyone! I'd like to ask how consistent self-attention fits into the semantic motion predictor. From the paper, I see that the semantic motion predictor takes only two images as input (one as the start frame and one as the end frame).