Currently, the following lines in lightning.py are always executed after training:
```python
# Load best checkpoint after training
checkpoint = torch.load(
    self._callbacks["ModelCheckpoint"].best_model_path, weights_only=False
)
```
Loading from the best model path is best practice. However, if for some reason the best checkpoint is never created, this call raises an exception and the job fails after training has completed. The proposed fix is to wrap the load in a try-except block, emit a warning when only the latest checkpoint is present, and fall back to loading that checkpoint instead.
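A minimal sketch of that fallback logic follows. The function name `load_best_or_last`, the `last_path` parameter, and the injected `loader` argument (which would be `torch.load` in the actual code) are illustrative assumptions, not part of lightning.py:

```python
import warnings


def load_best_or_last(best_path, last_path, loader):
    """Try to load the best checkpoint; fall back to the last one.

    `loader` stands in for torch.load here (hypothetical injection so the
    sketch stays framework-free). If the best checkpoint was never written,
    a warning is emitted and the latest checkpoint is loaded instead.
    """
    try:
        return loader(best_path)
    except (FileNotFoundError, IsADirectoryError):
        warnings.warn(
            f"Best checkpoint not found at {best_path!r}; "
            f"falling back to latest checkpoint {last_path!r}."
        )
        return loader(last_path)
```

With this in place, a missing best checkpoint no longer kills the job post-training; the run finishes with the latest weights and a visible warning explaining the substitution.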