Hi, first of all, thank you very much for your work!
I'm currently trying to reproduce your results.
I have two questions, one about the gradient clipping value and one about the soft round function.
-
When I set the learning rate to 10e-4, the gradients diverge, so I use gradient clipping with the value set to 0.01.
Is that a reasonable value?
-
What value did you use for the tuning parameter alpha?
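
To make the first question more concrete: "clipping to 0.01" can mean either clamping each gradient component or rescaling the whole gradient by its norm. Below is a minimal plain-Python sketch of both, only for illustration. The function names and the sample gradient values are made up and are not taken from the repository.

```python
import math

def clip_by_value(grads, clip_value):
    """Clamp each gradient component to [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grads]

def clip_by_norm(grads, max_norm):
    """Rescale the whole gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

# Made-up gradient values, only to show the difference between the two modes.
grads = [0.5, -0.002, 3.0]
print(clip_by_value(grads, 0.01))  # large components are clamped, direction changes
print(clip_by_norm(grads, 0.01))   # direction is kept, magnitude is shrunk
```

Knowing which of the two variants (if any) was used would help me judge whether 0.01 is in a sensible range.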
Thank you in advance!