# Regression Learning Curves

Learning-curve comparison of KNN, Decision Tree, and Linear Regression regressors on an OpenML dataset.
## Overview

This project compares regression models using learning curves computed on OpenML dataset 541. Categorical features are one-hot encoded before training. Task 1 plots learning curves for KNN regressors with k = 3, 5, and 7. Task 2 tunes KNN and Decision Tree regressors via GridSearchCV, then plots learning curves for the tuned KNN, the tuned Decision Tree, and a baseline Linear Regression model. For each task, RMSE values at the largest training size are printed in a table.
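The preprocessing step can be sketched as follows. This is a minimal illustration on a toy frame standing in for the OpenML data; the actual script fetches dataset 541 (e.g. via `sklearn.datasets.fetch_openml`), and the column names here are invented:

```python
import pandas as pd

# Toy frame standing in for the OpenML dataset; "color" plays the role
# of a categorical feature that must be one-hot encoded before training.
df = pd.DataFrame({
    "size": [1.0, 2.0, 3.0],
    "color": ["red", "blue", "red"],
})

# One-hot encode the categorical column; numeric columns pass through.
encoded = pd.get_dummies(df, columns=["color"])
print(list(encoded.columns))  # ['size', 'color_blue', 'color_red']
```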
## Requirements

- Python 3.8+
- scikit-learn >= 1.0
- matplotlib >= 3.5
- numpy >= 1.21
## Installation

```bash
pip install -r requirements.txt
```

## Project Structure

```
regression-learning-curves/
├── compare_regressors.py   # Learning curve comparison script
├── requirements.txt        # Python dependencies
├── .gitignore
└── README.md
```
## Usage

```bash
python compare_regressors.py
```

Steps performed:
- Downloads OpenML dataset 541 and one-hot encodes its categorical columns.
- Task 1: plots learning curves for KNN with k = 3, 5, 7; saves `task_1_knn_with_different_k_values.png`.
- Task 2: runs GridSearchCV for KNN and Decision Tree, then plots learning curves for the tuned models and Linear Regression; saves `task_2_tuned_models_and_linear_regression.png`.
- Prints an RMSE summary table for each task.
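Task 2's tune-then-plot flow can be sketched like this on synthetic data. The parameter grid, CV settings, and scoring string below are illustrative assumptions, not necessarily what the script uses:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, learning_curve
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data standing in for the OpenML dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Tune k with cross-validated grid search (grid is illustrative).
grid = GridSearchCV(
    KNeighborsRegressor(),
    {"n_neighbors": [3, 5, 7]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)

# Compute a learning curve for the tuned estimator; with cv=5 and 200
# samples, the largest training size is 160.
sizes, train_scores, val_scores = learning_curve(
    grid.best_estimator_, X, y,
    cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5),
    scoring="neg_root_mean_squared_error",
)
print(grid.best_params_["n_neighbors"], sizes[-1])
```

The same `learning_curve` call is repeated per model (tuned KNN, tuned Decision Tree, Linear Regression) to overlay their curves in one plot.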
## Output

Two PNG plots and two RMSE tables are produced:
| File | Description |
|---|---|
| `task_1_knn_with_different_k_values.png` | KNN learning curves for k = 3, 5, 7 |
| `task_2_tuned_models_and_linear_regression.png` | Tuned KNN, tuned DT, and LR learning curves |
RMSE values at the maximum training-set size are printed to stdout after each plot is generated.
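How a table entry can be derived is sketched below, assuming the learning curves are scored with `neg_root_mean_squared_error` (an assumption about the script): the validation scores are negated RMSE, so the reported value is the negated cross-fold mean at the last training size.

```python
import numpy as np

# Toy validation scores as returned by learning_curve with
# scoring="neg_root_mean_squared_error": shape (n_sizes, n_folds),
# values are negated RMSE. The last row is the largest training size.
val_scores = np.array([
    [-2.0, -1.8],   # smallest training size
    [-1.2, -1.0],   # largest training size
])

# Negate the mean across folds to recover RMSE for the table.
rmse_at_max = -val_scores[-1].mean()
print(round(rmse_at_max, 2))  # 1.1
```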
## Author

Biswajeet Sahoo
## License

MIT License