This project explores the bias-variance tradeoff using regularized linear and polynomial regression. It demonstrates how to diagnose underfitting (high bias) and overfitting (high variance) using learning curves and validation curves.
The regularized linear regression cost function is:
J(theta) = (1/(2m)) * sum_{i=1}^{m} (h_theta(x^(i)) - y^(i))^2 + (lambda/(2m)) * sum_{j=1}^{n} theta_j^2

Note that the bias term theta_0 is not regularized (the second sum starts at j = 1).
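A minimal NumPy sketch of this cost function and its gradient (the repo's actual implementation lives in `linearRegCostFunction.m`; this Python translation is for illustration only):

```python
import numpy as np

def linear_reg_cost(theta, X, y, lam):
    """Regularized linear regression cost and gradient.

    X is (m, n+1) with a leading column of ones; theta_0 (the bias term)
    is excluded from the regularization penalty.
    """
    m = len(y)
    err = X @ theta - y
    cost = (err @ err) / (2 * m) + (lam / (2 * m)) * (theta[1:] @ theta[1:])
    grad = (X.T @ err) / m
    grad[1:] += (lam / m) * theta[1:]   # no penalty on the gradient of theta_0
    return cost, grad
```

Passing this pair (cost, gradient) to any gradient-based optimizer reproduces what `trainLinearReg.m` does with `fmincg` in the Octave version.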
Key concepts:
- Learning curves: Plot training error and cross-validation error vs. number of training examples
- Polynomial regression: Map features to higher-degree polynomials for non-linear fitting
- Lambda selection: Use validation curves to pick the optimal regularization parameter
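The learning-curve idea above can be sketched as follows (a Python stand-in for `learningCurve.m`; the `fit` argument is a hypothetical placeholder for any routine that trains regularized linear regression, such as `trainLinearReg.m`):

```python
import numpy as np

def learning_curve(X, y, Xval, yval, lam, fit):
    """Training and cross-validation error vs. number of training examples.

    For each subset size i, train on the first i examples, then measure the
    unregularized squared error on that subset and on the full validation set.
    """
    m = len(y)
    err_train, err_val = np.zeros(m), np.zeros(m)
    for i in range(1, m + 1):
        theta = fit(X[:i], y[:i], lam)
        err_train[i - 1] = np.mean((X[:i] @ theta - y[:i]) ** 2) / 2
        err_val[i - 1] = np.mean((Xval @ theta - yval) ** 2) / 2
    return err_train, err_val
```

High-bias models show both curves converging to a high error; high-variance models show a persistent gap between low training error and high validation error.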
| File | Description |
|---|---|
| `sample5.m` | Main script: bias-variance analysis |
| `linearRegCostFunction.m` | Regularized linear regression cost and gradient |
| `trainLinearReg.m` | Trains regularized linear regression |
| `learningCurve.m` | Computes learning curves |
| `polyFeatures.m` | Maps features to polynomial features |
| `plotFit.m` | Plots polynomial fit |
| `validationCurve.m` | Computes validation curves for lambda selection |
| `ex5data1.mat` | Dataset: water flow vs. water level change |
- Linear regression underfits the non-linear data (high bias)
- Polynomial regression (degree 8) with lambda=0 overfits (high variance)
- Validation curves show lambda=3 is the best regularization parameter
- Best validation error: ~3.82
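The lambda selection described above can be sketched like this (a Python illustration of what `validationCurve.m` computes; it uses a closed-form ridge solve via the normal equations rather than the repo's iterative trainer, and the function name is an assumption):

```python
import numpy as np

def validation_curve(X, y, Xval, yval, lambdas):
    """Cross-validation error for each candidate lambda.

    Trains with the normal equations (X'X + lam*I) theta = X'y, leaving the
    intercept unregularized, and scores each theta on the validation set.
    """
    n = X.shape[1]
    err_val = []
    for lam in lambdas:
        reg = lam * np.eye(n)
        reg[0, 0] = 0.0          # do not penalize the intercept term
        theta = np.linalg.solve(X.T @ X + reg, X.T @ y)
        err_val.append(np.mean((Xval @ theta - yval) ** 2) / 2)
    best = lambdas[int(np.argmin(err_val))]
    return np.array(err_val), best
```

Plotting `err_val` against `lambdas` gives the validation curve; the minimizing lambda is the one reported above.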
Left: Underfitting with a linear model. Center: Good fit with polynomial degree 4. Right: Overfitting with polynomial degree 15.
Exercises from Andrew Ng's Machine Learning course on Coursera, completed by Keivan Hassani Monfared.
