Dear Authors,
Thank you very much for the great work. I have a question and would appreciate your insights.
Since ProtT5 training predicts the full sequence rather than only the masked tokens, what loss function is used? Is it PyTorch's CrossEntropyLoss with reduction="sum", or CrossEntropyLoss with reduction="mean"?
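
To make the question concrete, here is a minimal sketch of the two options as I understand them. It is plain Python, not the actual ProtT5 training code; the function names, toy logits, and targets are all made up for illustration, and it ignores details such as padding or ignored positions.

```python
import math

def token_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def cross_entropy(all_logits, targets, reduction="mean"):
    """Cross-entropy over all output positions (hypothetical helper)."""
    losses = [token_nll(l, t) for l, t in zip(all_logits, targets)]
    total = sum(losses)
    if reduction == "sum":
        return total                # grows with the number of positions
    if reduction == "mean":
        return total / len(losses)  # averaged over the positions
    raise ValueError(f"unknown reduction: {reduction}")

# Toy example (assumed values): 3 output positions, vocabulary of 4 tokens.
toy_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 1.5, 0.3, 0.2],
    [0.1, 0.1, 3.0, 0.0],
]
toy_targets = [0, 1, 2]

loss_sum = cross_entropy(toy_logits, toy_targets, reduction="sum")
loss_mean = cross_entropy(toy_logits, toy_targets, reduction="mean")
```

With the sum, the loss (and the gradient scale) grows with the number of predicted positions, while the mean does not, which is why I would like to know which one was used.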