Gradient descent is an iterative optimization algorithm used to find the minimum of a function. In the context of linear regression, it minimizes the least squares cost function to find the best-fit line y = mx + b through a set of data points.
The algorithm works by:
- Starting with initial guesses for the slope (theta_1) and y-intercept (theta_0)
- Computing the gradient (partial derivatives) of the cost function with respect to each parameter
- Updating the parameters in the direction opposite to the gradient
- Repeating until convergence
The cost function (mean squared error) is:
J(theta_0, theta_1) = (1/2m) * sum((h(x_i) - y_i)^2)
where h(x) = theta_1 * x + theta_0 is the hypothesis (predicted value).
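The steps above can be sketched as a small standalone Python function. This is a minimal illustration, not necessarily how `gradient_descent.py` is organized; it uses the same cost function and the stopping parameters described later in this README:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, max_iter=1000, min_tol=1e-4, min_cost=1e-3):
    """Fit y = theta_1 * x + theta_0 by batch gradient descent."""
    m = len(x)
    theta_0, theta_1 = 0.0, 0.0              # initial guesses
    costs = []
    for _ in range(max_iter):
        error = (theta_1 * x + theta_0) - y  # h(x_i) - y_i for every point
        cost = (error @ error) / (2 * m)     # J(theta_0, theta_1)
        costs.append(cost)
        grad_0 = error.sum() / m             # dJ/d(theta_0)
        grad_1 = (error @ x) / m             # dJ/d(theta_1)
        # step opposite the gradient
        step_0, step_1 = alpha * grad_0, alpha * grad_1
        theta_0 -= step_0
        theta_1 -= step_1
        # stop when the parameters barely move or the fit is good enough
        if max(abs(step_0), abs(step_1)) < min_tol or cost < min_cost:
            break
    return theta_0, theta_1, costs
```

On noiseless data generated from a known line, the recovered `theta_0` and `theta_1` approach the true intercept and slope, and the recorded `costs` list decreases over iterations.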
Each visualization shows:
- Left panel: The data points (blue), regression line (red), and vertical error lines (gray) showing the residuals
- Right panel: The cost function convergence over iterations (log scale), showing how the error decreases with each step
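A two-panel figure of this kind can be produced with matplotlib roughly as follows. This is a hedged sketch using synthetic data; the output file name `gd_demo.png`, the dataset, and the styling are illustrative, not what `gradient_descent.py` actually uses:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.5 * x + 1.0 + rng.normal(0, 2, x.size)  # noisy synthetic line

# plain batch gradient descent, tracking the cost at each iteration
m = x.size
t0 = t1 = 0.0
costs = []
for _ in range(500):
    err = (t1 * x + t0) - y
    costs.append((err @ err) / (2 * m))
    t0 -= 0.01 * err.sum() / m
    t1 -= 0.01 * (err @ x) / m

fig, (ax_fit, ax_cost) = plt.subplots(1, 2, figsize=(10, 4))
ax_fit.scatter(x, y, color="blue", label="data")
ax_fit.plot(x, t1 * x + t0, color="red", label="regression line")
ax_fit.vlines(x, y, t1 * x + t0, color="gray", alpha=0.5)  # residuals
ax_fit.legend()
ax_cost.semilogy(costs)  # cost on a log scale
ax_cost.set_xlabel("iteration")
ax_cost.set_ylabel("cost J")
fig.savefig("gd_demo.png")
plt.close(fig)
```

The `vlines` call draws the gray residual segments between each data point and the fitted line, and `semilogy` gives the log-scale convergence plot.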
The repository includes the original Sage implementation, its Python translation, and the plots they produce:

| File | Description |
|---|---|
| `sage_code` | Original SageMath implementation |
| `gradient_descent.py` | Python 3 translation with matplotlib |
| `gradient_descent_small.png` | Visualization for the small dataset |
| `gradient_descent_large.png` | Visualization for the larger dataset |
| `sage0.png` | Original Sage output (small dataset) |
| `sage1.png` | Original Sage output (larger dataset) |
Run the Python translation with `python3 gradient_descent.py`. You can also run the original SageMath code on SageMathCell.
| Parameter | Default | Description |
|---|---|---|
| `alpha` | 0.01 | Learning rate |
| `max_iter` | 1000 | Maximum iterations |
| `min_tol` | 1e-4 | Minimum parameter change tolerance |
| `min_cost` | 1e-3 | Minimum acceptable cost |
- The learning rate (alpha) controls how large each step is: too large causes divergence, too small causes slow convergence
- The cost function decreases monotonically when the learning rate is properly chosen
- The vertical error lines visualize the residuals: the quantities that gradient descent is minimizing
- This is the foundation of machine learning: fitting a model to data by minimizing a cost function
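The learning-rate tradeoff in the first bullet can be checked numerically. This small experiment (the data and the two `alpha` values are chosen purely for illustration) runs the same batch update with a moderate and an oversized learning rate:

```python
import numpy as np

x = np.linspace(0, 1, 20)
y = 3 * x + 0.5  # noiseless line

def final_cost(alpha, iters=200):
    """Run batch gradient descent for a fixed number of steps; return the last cost."""
    m = x.size
    t0 = t1 = 0.0
    cost = float("inf")
    for _ in range(iters):
        err = (t1 * x + t0) - y
        cost = (err @ err) / (2 * m)
        t0 -= alpha * err.sum() / m
        t1 -= alpha * (err @ x) / m
    return cost

print(final_cost(0.5))  # moderate rate: cost shrinks toward zero
print(final_cost(3.0))  # oversized rate: cost grows without bound
```

With the moderate rate each step shrinks the error, while the oversized rate overshoots the minimum by more than it corrects, so the parameters oscillate with growing amplitude and the cost explodes.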



