Due to the ordering of gradient descent / sums in the current implementation, we are evaluating losses for the entire dataset instead of minibatching. There might be other algorithms for variance that might be more ameanable to a minibatched estimation approach: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
Due to the ordering of gradient descent / sums in the current implementation, we are evaluating losses for the entire dataset instead of minibatching. There might be other algorithms for variance that might be more ameanable to a minibatched estimation approach: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance