I just replaced the softmax function with the sparsemax function or the tsallis15 function in my transformer model. Training works well, but the following error occurs during the testing phase:
RuntimeError: CUDA error: device-side assert triggered
If I switch back to the softmax function, it works again.
What could be the cause?
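
One difference that might be relevant, offered as a possible lead and not a confirmed diagnosis: softmax always returns strictly positive probabilities, while sparsemax can return exact zeros. Any test-time step that takes a log of the probabilities or samples from them would then see zeros it never sees with softmax. The sketch below is plain Python, not the model code; the `softmax` and `sparsemax` helpers and the example `logits` are made up for illustration, and `sparsemax` is written as the usual sort-and-threshold projection onto the probability simplex.

```python
import math

def softmax(z):
    # Standard softmax, shifted by the max for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(z):
    # Illustrative sparsemax: find the threshold tau from the sorted
    # scores, then clip everything below it to exactly zero.
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, v in enumerate(zs, start=1):
        cumsum += v
        if 1 + k * v > cumsum:  # index k is still inside the support
            tau = (cumsum - 1) / k
    return [max(v - tau, 0.0) for v in z]

logits = [1.0, 0.5, -1.0]  # hypothetical attention scores
dense = softmax(logits)
sparse = sparsemax(logits)

print(dense)   # every entry is strictly positive
print(sparse)  # the last entry is exactly 0.0
```

If the testing code differs from the training code in a step such as decoding, sampling, or taking log-probabilities, that step would be a natural place to check for zeros, infinities, or NaNs.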