Dear author,
Thanks for sharing the code. I am very interested in your work and have a question about it.
In the second stage, you adopt an encoder-decoder Transformer to reconstruct tokens. Why not directly adopt the bidirectional Transformer used in MaskGIT? I would like to know what advantages the encoder-decoder Transformer offers.
Looking forward to your reply!