rmsprop final steps #26
Open
Description
I'm slightly confused about the final steps described in the doc versus the code below. Should the Nesterov momentum be applied before updating the parameters, i.e. self.wrt -= step1 + step2?
step1 = step_m1 * self.momentum
self.wrt -= step1
gradient = self.fprime(self.wrt, *args, **kwargs)
self.moving_mean_squared = (
    self.decay * self.moving_mean_squared
    + (1 - self.decay) * gradient ** 2)
step2 = self.step_rate * gradient
step2 /= sqrt(self.moving_mean_squared + 1e-8)
self.wrt -= step2
step = step1 + step2
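
To make the question concrete, here is a minimal scalar sketch of the snippet above. It is not the library's implementation: the quadratic objective, the function name rmsprop_nesterov_step, and the hyperparameter values are assumptions chosen only for illustration. It suggests that the two in-place subtractions change the parameters by step1 + step2 in total, while the gradient is evaluated at the look-ahead point reached after subtracting step1.

```python
from math import sqrt, isclose


def fprime(w):
    # Hypothetical objective f(w) = 0.5 * w**2, whose gradient is w.
    return w


def rmsprop_nesterov_step(wrt, step_m1, moving_mean_squared,
                          step_rate=0.1, decay=0.9, momentum=0.5):
    """One update following the order of operations in the quoted snippet."""
    step1 = step_m1 * momentum
    wrt -= step1                      # momentum part is applied first
    gradient = fprime(wrt)            # gradient at the look-ahead point
    moving_mean_squared = (
        decay * moving_mean_squared
        + (1 - decay) * gradient ** 2)
    step2 = step_rate * gradient / sqrt(moving_mean_squared + 1e-8)
    wrt -= step2                      # gradient part is applied second
    return wrt, step1, step2, moving_mean_squared


wrt_before = 1.0
wrt_after, step1, step2, mms = rmsprop_nesterov_step(
    wrt_before, step_m1=0.2, moving_mean_squared=0.0)

# The net parameter change equals step1 + step2.
net_change = wrt_before - wrt_after
print(isclose(net_change, step1 + step2))
```

Under these assumptions, writing self.wrt -= step1 + step2 as a single statement would yield the same final parameters only if the gradient were still computed at self.wrt - step1; that seems to be why the snippet splits the update in two.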