# Bias-Variance Tradeoff

## Overview

This project explores the bias-variance tradeoff using regularized linear and polynomial regression. It demonstrates how to diagnose underfitting (high bias) and overfitting (high variance) using learning curves and validation curves.

## Algorithm

The regularized linear regression cost function is:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

where $m$ is the number of training examples and the bias term $\theta_0$ is excluded from the regularization sum.

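A minimal Octave/MATLAB sketch of how `linearRegCostFunction.m` could implement this cost and its gradient; the gradient expression below is derived from the cost above rather than copied from the repository:

```matlab
function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
  % X is m x (n+1) with a leading column of ones; theta is (n+1) x 1
  m = length(y);
  h = X * theta;                                  % predictions h_theta(x)

  % Regularize every parameter except the bias term theta(1)
  reg = (lambda / (2 * m)) * sum(theta(2:end) .^ 2);
  J = (1 / (2 * m)) * sum((h - y) .^ 2) + reg;

  % Gradient of the cost; the bias term gets no regularization
  grad = (1 / m) * (X' * (h - y));
  grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);
end
```
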
Key concepts:

- **Learning curves**: plot training error and cross-validation error against the number of training examples (see the sketch after this list)
- **Polynomial regression**: map features to higher-degree polynomials for non-linear fitting
- **Lambda selection**: use validation curves to pick the optimal regularization parameter

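As a concrete illustration of the first concept, one plausible shape for `learningCurve.m` (the repository's implementation may differ) trains on growing prefixes of the training set and evaluates both errors with regularization switched off:

```matlab
function [error_train, error_val] = learningCurve(X, y, Xval, yval, lambda)
  m = size(X, 1);
  error_train = zeros(m, 1);
  error_val   = zeros(m, 1);
  for i = 1:m
    % Fit on the first i training examples only
    theta = trainLinearReg(X(1:i, :), y(1:i), lambda);
    % Score with lambda = 0 so the errors measure pure fit quality
    error_train(i) = linearRegCostFunction(X(1:i, :), y(1:i), theta, 0);
    error_val(i)   = linearRegCostFunction(Xval, yval, theta, 0);
  end
end
```

When both curves plateau at a high error, the model is underfitting; a large gap between the two curves signals overfitting.
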
## Files

| File | Description |
| --- | --- |
| `sample5.m` | Main script: bias-variance analysis |
| `linearRegCostFunction.m` | Regularized linear regression cost and gradient |
| `trainLinearReg.m` | Trains regularized linear regression |
| `learningCurve.m` | Computes learning curves |
| `polyFeatures.m` | Maps features to polynomial features |
| `plotFit.m` | Plots polynomial fit |
| `validationCurve.m` | Computes validation curves for lambda selection |
| `ex5data1.mat` | Dataset: water flow vs. water level change |

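For the single-feature dataset in `ex5data1.mat`, `polyFeatures.m` can be as simple as raising the input to successive powers; this sketch assumes a one-dimensional input and leaves feature normalization to the caller:

```matlab
function X_poly = polyFeatures(X, p)
  % Map a column vector X (m x 1) to powers 1..p, giving an m x p matrix
  X_poly = zeros(numel(X), p);
  for j = 1:p
    X_poly(:, j) = X(:) .^ j;
  end
end
```
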
## Key Results

- Linear regression underfits the non-linear data (high bias)
- Polynomial regression (degree 8) with lambda = 0 overfits (high variance)
- Validation curves show lambda = 3 is the best regularization parameter (see the sketch after this list)
- Best validation error: ~3.82

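A hedged sketch of how `validationCurve.m` might sweep candidate values of lambda; the candidate list below is an assumption, chosen to bracket the reported optimum of lambda = 3:

```matlab
function [lambda_vec, error_train, error_val] = validationCurve(X, y, Xval, yval)
  lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';
  error_train = zeros(length(lambda_vec), 1);
  error_val   = zeros(length(lambda_vec), 1);
  for i = 1:length(lambda_vec)
    % Train with the candidate lambda, then score without regularization
    theta = trainLinearReg(X, y, lambda_vec(i));
    error_train(i) = linearRegCostFunction(X, y, theta, 0);
    error_val(i)   = linearRegCostFunction(Xval, yval, theta, 0);
  end
end
```

The lambda with the lowest cross-validation error is selected; here that is lambda = 3.
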
## Visualization

*(Figure: Bias-Variance Visualization)*

*Left: Underfitting with a linear model. Center: Good fit with polynomial degree 4. Right: Overfitting with polynomial degree 15.*

## Credit

Exercises from Andrew Ng's Machine Learning course on Coursera, completed by Keivan Hassani Monfared.