I found that training the TGAT model (with the default parameter settings in config/TGAT.yml) leads to steadily increasing GPU memory consumption. The other models do not show this issue.
I suspect it might be related to Python's garbage collection mechanism, so I added torch.cuda.empty_cache() at the end of every training batch. This resolves the issue, but the empty_cache call turns out to be fairly time-consuming.
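For reference, this is roughly what I am doing, as a minimal sketch with a toy model standing in for TGAT and its data loader (not the repo's actual training code); the only change to my real training loop is the torch.cuda.empty_cache() call at the end of each batch:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the actual TGAT model and data loader.
model = nn.Linear(16, 2).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(32, 16), torch.randint(0, 2, (32,))) for _ in range(10)]

for x, y in loader:
    x, y = x.cuda(), y.cuda()
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    # Workaround: return cached blocks to the driver after each batch.
    # This keeps memory flat, but the call adds noticeable per-batch overhead.
    torch.cuda.empty_cache()
```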
I wonder if you have noticed the same issue and whether it can be solved in a better way.