Hi ProteinNPT authors/maintainers,
Thank you very much for your work on ProteinNPT.
While reading the paper and looking through the code, I noticed that during training the model seems to use a fixed budget of 10,000 gradient steps for each assay and for each CV splitting scheme.
I am a bit puzzled by this choice. Under fold_random_5, the train and test position distributions are roughly uniform and similar, so overfitting to the training positions is less of a concern. Under fold_modulo_5, and especially fold_contiguous_5, however, the train and test position distributions are substantially shifted. In those cases, could training for 10k steps cause the model to overfit to the training-position distribution (even while the training loss keeps improving)? And might a much smaller budget (e.g., ~1k steps) be closer to optimal for these harder split schemes?
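For concreteness, here is a minimal sketch (not taken from the ProteinNPT code; the fold semantics are my assumptions, inferred from the scheme names) of how I understand the three schemes to partition mutated positions into 5 folds, which is what makes the positional shift much stronger for the contiguous scheme:

```python
# Illustrative sketch only: assumed semantics of the three CV schemes,
# inferred from their names, not the actual ProteinNPT implementation.
import random

def assign_folds(positions, scheme, n_folds=5, seed=0):
    """Assign each mutated position to a CV fold under one of three schemes."""
    positions = sorted(positions)
    if scheme == "fold_random_5":
        # Positions scattered uniformly across folds: train/test position
        # distributions look alike.
        rng = random.Random(seed)
        return {p: rng.randrange(n_folds) for p in positions}
    if scheme == "fold_modulo_5":
        # Every 5th position shares a fold: mild positional shift.
        return {p: p % n_folds for p in positions}
    if scheme == "fold_contiguous_5":
        # Contiguous blocks of the sequence per fold: strong positional shift,
        # so a held-out fold covers a region never seen during training.
        block = -(-len(positions) // n_folds)  # ceil division
        return {p: i // block for i, p in enumerate(positions)}
    raise ValueError(f"unknown scheme: {scheme}")

folds = assign_folds(range(100), "fold_contiguous_5")
# Under the contiguous scheme, fold 0 holds positions 0-19, fold 1 holds 20-39, etc.
```

If this picture is roughly right, then under fold_contiguous_5 the test fold probes an entire unseen region, which is why a long fixed training budget seems riskier there than under fold_random_5.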
Thanks again for your time and for releasing this work.