
Fix some minor issues in notebook of lecture 4. #41

Open
anusser wants to merge 1 commit into ee227c:master from anusser:master

Conversation


anusser commented Apr 9, 2021

Thanks for creating such nice lecture material and making it publicly available.

While using the notebook of lecture 4, I found some (what I believe are) minor issues, which I fixed. Since diffs of notebooks are unfortunately quite messy, I cleaned the diff up and list the important changes below. I guess the only change that needs explanation is the last one: for tuning the gradient descent step size using gradient descent, I use the average iterate (i.e., sum(xs)/len(xs)) instead of the last one (i.e., xs[-1]), which I believe gives a better result.

If you have questions regarding other changes, feel free to ask. :)

## modified /cells/11/source:
-  x1.all() == 0
+  np.all(x1 == 0)
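For context, a quick sketch of why the original check is wrong: `x1.all()` asks whether all entries are nonzero, so comparing its result to 0 does not test for the zero vector.

```python
import numpy as np

x1 = np.array([1.0, 0.0, 2.0])

# x1.all() means "are all entries nonzero?", so comparing it to 0 actually
# tests whether some entry is zero -- not whether the whole vector is zero
print(x1.all() == 0)    # True, even though x1 is not the zero vector
print(np.all(x1 == 0))  # False, the intended check
```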


## modified /cells/16/source:
@@ -1,4 +1,4 @@
-x0 = np.random.normal(0, 1, (1000))
+x0 = proj(np.random.normal(0, 1, (1000)))
 xs = gradient_descent(x0, [0.1]*50, quadratic_gradient, proj)
 # the optimal solution is the projection of the origin
 x_opt = proj(0)
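The point of projecting `x0` is that projected gradient descent expects a feasible starting point. The notebook's `proj` is not shown in this excerpt; a minimal sketch, assuming it projects onto the unit Euclidean ball:

```python
import numpy as np

def proj(x, radius=1.0):
    """Hypothetical stand-in for the notebook's proj: project onto a Euclidean ball."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    norm = np.linalg.norm(x)
    return x if norm <= radius else radius * x / norm

# an unprojected standard normal vector in 1000 dimensions has norm around
# sqrt(1000), far outside the feasible set; projecting x0 repairs that
x0 = proj(np.random.normal(0, 1, 1000))
print(np.linalg.norm(x0))  # close to 1.0
```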


## modified /cells/19/source:
@@ -8,8 +8,8 @@ $$f(x) = \frac 1{2m}\sum_{i=1}^m (a_i^\top x - b_i)^2
 =\frac1{2m}\|Ax-b\|^2$$
 </p>
 
-We can verify that $\nabla f(x) = A^\top(Ax-b)$ and
-$\nabla^2 f(x) = A^\top A.$
+We can verify that $\nabla f(x) = \frac{1}{m}A^\top(Ax-b)$ and
+$\nabla^2 f(x) = \frac{1}{m} A^\top A.$
 
 Hence, the objective is $\beta$-smooth with 
-$\beta=\lambda_{\mathrm{max}}(A^\top A)$, and $\alpha$-strongly convex with $\alpha=\lambda_{\mathrm{min}}(A^\top A)$.
+$\beta=\lambda_{\mathrm{max}}(\frac{1}{m}A^\top A)$, and $\alpha$-strongly convex with $\alpha=\lambda_{\mathrm{min}}(\frac{1}{m}A^\top A)$.
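The corrected 1/m gradient can be sanity-checked numerically; a small sketch (toy dimensions, not the notebook's) comparing it against central finite differences:

```python
import numpy as np

np.random.seed(0)
m, n = 20, 5
A = np.random.normal(size=(m, n))
b = np.random.normal(size=m)

def f(x):
    return np.linalg.norm(A @ x - b) ** 2 / (2 * m)

def grad_f(x):
    # gradient with the 1/m factor, matching the objective above
    return A.T @ (A @ x - b) / m

# central finite differences agree with the corrected gradient
x = np.random.normal(size=n)
eps = 1e-6
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    fd = (f(x + e) - f(x - e)) / (2 * eps)
    assert abs(fd - grad_f(x)[i]) < 1e-5
```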


## modified /cells/27/source:
@@ -1,3 +1,3 @@
 ### Underdetermined case $m < n$
 
-In the underdetermined case, the least squares objective is inevitably not strongly convex, since $A^\top A$ is a rank deficient matrix and hence $\lambda_{\mathrm{min}}(A^\top A)=0.$
+In the underdetermined case, the least squares objective is inevitably not strongly convex, since $\frac{1}{m}A^\top A$ is a rank deficient matrix and hence $\lambda_{\mathrm{min}}(\frac{1}{m}A^\top A)=0.$
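The rank argument is easy to confirm numerically; a small sketch with toy dimensions:

```python
import numpy as np

np.random.seed(0)
m, n = 5, 10  # underdetermined: fewer equations than unknowns
A = np.random.normal(size=(m, n))

# A^T A / m is n x n but has rank at most m < n, so its smallest
# eigenvalue is zero and the objective cannot be strongly convex
eigs = np.linalg.eigvalsh(A.T @ A / m)
print(eigs.min())  # numerically zero
```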


## modified /cells/35/source:
-  Note that we can find the optimal solution to the optimization problem in closed form without even running gradient descent by computing $x_{\mathrm{opt}}=(A^\top A+\alpha I)^{-1}A^\top b.$ Please verify that this point is indeed optimal.
+  Note that we can find the optimal solution to the optimization problem in closed form without even running gradient descent by computing $x_{\mathrm{opt}}=(\frac{1}{m}A^\top A+\alpha I)^{-1}\frac{1}{m}A^\top b.$ Please verify that this point is indeed optimal.


## modified /cells/36/source:
-  x_opt = np.linalg.inv(A.T.dot(A) + 0.1*np.eye(1000)).dot(A.T).dot(b)
+  x_opt = np.linalg.inv(A.T.dot(A)/m + 0.1*np.eye(1000)).dot(A.T).dot(b)/m
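A quick way to check the corrected closed form (a sketch with toy sizes, and assuming the regularized objective is $\frac1{2m}\|Ax-b\|^2+\frac{\alpha}{2}\|x\|^2$) is to verify that the gradient vanishes at `x_opt`:

```python
import numpy as np

np.random.seed(0)
m, n = 50, 10  # small toy sizes; the notebook uses n = 1000
A = np.random.normal(size=(m, n))
b = np.random.normal(size=m)
alpha = 0.1

# closed-form minimizer of (1/2m)||Ax-b||^2 + (alpha/2)||x||^2
# (assuming that is the regularized objective in the notebook)
x_opt = np.linalg.inv(A.T.dot(A) / m + alpha * np.eye(n)).dot(A.T).dot(b) / m

# for a convex objective, a vanishing gradient certifies optimality
grad = A.T.dot(A.dot(x_opt) - b) / m + alpha * x_opt
print(np.linalg.norm(grad))  # numerically zero
```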


## modified /cells/44/source:
@@ -7,4 +7,4 @@ LASSO is the name for $\ell_1$-regularized least squares regression:
 $$\frac1{2m}\|Ax-b\|^2 + \alpha\|x\|_1$$
 </p>
 
-We will see that LASSO is able to fine *sparse* solutions if they exist. This is a common motivation for using an $\ell_1$-regularizer.
+We will see that LASSO is able to find *sparse* solutions if they exist. This is a common motivation for using an $\ell_1$-regularizer.


## modified /cells/72/source:
@@ -4,4 +4,4 @@ def f(x):
 def optimizer(steps):
     """Optimize a quadratic with the given steps."""
     xs = gradient_descent(x0, steps, grad(f))
-    return f(xs[-1])
+    return f(sum(xs)/len(xs))
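To see why averaging can help here: with a too-large step the last iterate oscillates around the optimum, while the average of the iterates cancels the oscillation. A self-contained sketch (the real `gradient_descent` and objective are not shown in this excerpt, so both are stand-ins):

```python
def gradient_descent(x0, steps, grad):
    """Minimal stand-in for the notebook's helper: keep every iterate."""
    xs = [x0]
    for step in steps:
        xs.append(xs[-1] - step * grad(xs[-1]))
    return xs

f = lambda x: 0.5 * x ** 2  # toy quadratic with optimum at 0
grad_f = lambda x: x

# with step size 2, the iterates bounce between +10 and -10 forever:
# the last iterate never improves, but the average is close to 0
xs = gradient_descent(10.0, [2.0] * 100, grad_f)
print(f(xs[-1]))             # stuck at 50.0
print(f(sum(xs) / len(xs)))  # much smaller
```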
