Choosing the optimal hyperparameter

Hi! Thanks for sharing the code for this exciting project. A quick question: when selecting the optimal hyperparameter, did you choose the value with the lowest loss on the training set, the validation set, or the test set? This is unclear to me because, according to the code, all three losses are monitored during training.
Thanks in advance, and looking forward to your reply!