# Gradient Descent for Linear Regression (Two Variables)

## Overview

Gradient descent is an iterative optimization algorithm for finding the minimum of a function. In the context of linear regression, it minimizes the least-squares cost function to find the best-fit line y = mx + b (with slope m = theta_1 and intercept b = theta_0) through a set of data points.

The algorithm works by:

  1. Starting with initial guesses for the slope (theta_1) and y-intercept (theta_0)
  2. Computing the gradient (partial derivatives) of the cost function with respect to each parameter
  3. Updating the parameters in the direction opposite to the gradient
  4. Repeating until convergence
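The four steps above can be sketched in plain Python. This is a minimal illustration, not the repository's gradient_descent.py; the function name, zero initialization, and fixed iteration budget are assumptions.

```python
def fit(xs, ys, alpha=0.01, iters=1000):
    """Fit y = theta_1 * x + theta_0 by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # step 1: initial guesses
    for _ in range(iters):
        # errors h(x_i) - y_i under the current parameters
        errs = [theta1 * x + theta0 - y for x, y in zip(xs, ys)]
        grad0 = sum(errs) / m                             # step 2: dJ/dtheta_0
        grad1 = sum(e * x for e, x in zip(errs, xs)) / m  # step 2: dJ/dtheta_1
        theta0 -= alpha * grad0  # step 3: move opposite the gradient
        theta1 -= alpha * grad1
    return theta0, theta1  # step 4 is a fixed budget here, not a tolerance check
```

Given enough iterations, on the small dataset [1, 2, 3] vs [3, 5, 5] this converges toward the exact least-squares line y = x + 7/3.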

The cost function (mean squared error, with a 1/2 factor that cancels when differentiating) is:

J(theta_0, theta_1) = (1/(2m)) * sum((h(x_i) - y_i)^2)

where h(x) = theta_1 * x + theta_0 is the hypothesis (predicted value) and m is the number of data points.
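The cost can be computed directly from this definition; a short sketch (the function name is illustrative, not from the repository):

```python
def cost(theta0, theta1, xs, ys):
    """Mean squared error J(theta_0, theta_1), including the 1/(2m) factor."""
    m = len(xs)
    return sum((theta1 * x + theta0 - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
```

For the small dataset [1, 2, 3] vs [3, 5, 5], the starting guess theta_0 = theta_1 = 0 gives J = 59/6, while the least-squares optimum theta_1 = 1, theta_0 = 7/3 gives J = 1/9, the floor that gradient descent approaches.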

## Visualizations

### Sample 1: Small Dataset [1, 2, 3] vs [3, 5, 5]

![Gradient Descent - Small Dataset](gradient_descent_small.png)

### Sample 2: Larger Dataset

![Gradient Descent - Larger Dataset](gradient_descent_large.png)

Each visualization shows:

- **Left panel:** the data points (blue), the regression line (red), and vertical error lines (gray) showing the residuals
- **Right panel:** the cost function's convergence over iterations (log scale), showing how the error decreases with each step

## Original SageMath Output

The original Sage implementation produced these plots:

![Original Sage output - small](sage0.png) ![Original Sage output - large](sage1.png)

## Files

| File | Description |
| --- | --- |
| `sage_code` | Original SageMath implementation |
| `gradient_descent.py` | Python 3 translation with matplotlib |
| `gradient_descent_small.png` | Visualization for the small dataset |
| `gradient_descent_large.png` | Visualization for the larger dataset |
| `sage0.png` | Original Sage output (small dataset) |
| `sage1.png` | Original Sage output (larger dataset) |

## How to Run

```sh
python3 gradient_descent.py
```

You can also run the original SageMath code on SageMathCell.

## Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `alpha` | 0.01 | Learning rate |
| `max_iter` | 1000 | Maximum number of iterations |
| `min_tol` | 1e-4 | Stop early once the per-step parameter change falls below this tolerance |
| `min_cost` | 1e-3 | Stop early once the cost falls below this value |
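How these stopping conditions interact can be sketched as follows. This is an assumed ordering for illustration, not necessarily how gradient_descent.py arranges its checks: the loop ends at `max_iter`, or earlier when the cost drops below `min_cost` or the per-step parameter change drops below `min_tol`.

```python
def run(xs, ys, alpha=0.01, max_iter=1000, min_tol=1e-4, min_cost=1e-3):
    """Gradient descent with the three stopping criteria from the table."""
    m = len(xs)
    theta0 = theta1 = 0.0
    iters = 0
    for iters in range(1, max_iter + 1):
        errs = [theta1 * x + theta0 - y for x, y in zip(xs, ys)]
        grad0 = sum(errs) / m
        grad1 = sum(e * x for e, x in zip(errs, xs)) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
        cost = sum(e * e for e in errs) / (2 * m)
        step = max(abs(alpha * grad0), abs(alpha * grad1))  # parameter change
        if cost < min_cost or step < min_tol:
            break  # converged early
    return theta0, theta1, iters
```

With a very loose `min_tol` the loop exits almost immediately; with the defaults it runs until the parameter updates become tiny or the iteration budget is exhausted.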

## Key Takeaways

- The learning rate (alpha) controls how large each step is: too large causes divergence, too small causes slow convergence
- The cost function decreases monotonically when the learning rate is properly chosen
- The vertical error lines visualize the residuals, the quantities that gradient descent is minimizing
- This is the foundation of machine learning: fitting a model to data by minimizing a cost function
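The first takeaway can be checked numerically on the small dataset: a modest learning rate lowers the cost, while an oversized one overshoots further on every step and the cost explodes. The specific step count and alpha values below are chosen for illustration.

```python
def cost_after(alpha, steps=20, xs=(1, 2, 3), ys=(3, 5, 5)):
    """Run `steps` gradient-descent updates from (0, 0), then return the cost."""
    m = len(xs)
    t0 = t1 = 0.0
    for _ in range(steps):
        errs = [t1 * x + t0 - y for x, y in zip(xs, ys)]
        t0 -= alpha * sum(errs) / m
        t1 -= alpha * sum(e * x for e, x in zip(errs, xs)) / m
    errs = [t1 * x + t0 - y for x, y in zip(xs, ys)]
    return sum(e * e for e in errs) / (2 * m)
```

Starting from cost 59/6, `cost_after(0.01)` comes back smaller, while `cost_after(1.0)` diverges to a huge value: the classic signature of a learning rate that is too large.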