Student Performance Analytics

End-to-end exploration and modeling of the UCI student performance datasets (Mathematics and Portuguese). The project delivers reproducible preprocessing, side-by-side EDA on raw and engineered features, supervised models for pass/fail and grade prediction, unsupervised clustering, and auto-saved visuals and summaries under reports/.

Repository Overview

data/raw/: Original UCI CSVs (student-mat.csv, student-por.csv) plus metadata archives (student+performance.zip, student.txt).
data/processed/: Precomputed feature matrices (processed_mat.csv, processed_por.csv) generated by the preprocessing pipeline.
src/preprocess_data.py: End-to-end preprocessing script (one-hot for binary/categorical columns, standard scaling for numerics, aligned feature set across both subjects).
notebooks/: EDA notebooks:
- Data_Review.ipynb: IQR outlier review on raw math/Portuguese datasets with boxplots.
- Analyze_Raw.ipynb: Raw-data distributions, full correlation heatmaps, G3 correlation tables, and KDE comparisons of key fields.
- Analyze_Processed.ipynb: Same diagnostics on the processed feature matrices.
Models/: Modeling notebooks, intended to be run in Jupyter (Classification and Regression also include if __name__ == "__main__": guards in case they are converted to scripts):
- Classification.ipynb: Pass/fail classifiers (Logistic Regression, Random Forest, optional XGBoost); test-set and 5-fold CV metrics; confusion matrix, ROC, and PR plots.
- Regression.ipynb: G3 regression with and without G1/G2 using Linear Regression, Random Forest, and optional XGBoost; MAE, RMSE, R2, and CV RMSE.
- Kmeans.ipynb: K-means (K=2-8) with silhouette selection, PCA scatter, cluster summaries, and pass-rate per cluster.
scripts/plot_utils.py: Utility to save all open Matplotlib figures to reports/Datasets/<dir>/ (used by notebooks).
scripts/student_merge.R: R helper to merge math/Portuguese records on shared demographics.
reports/: Generated assets:
- Datasets/Raw and Datasets/Processed: EDA figures (distributions, correlation heatmaps, G3 correlation bars, KDE comparisons, outlier boxplots).
- Models/Classification, Models/Regression, Models/KMeans: Model plots plus CSV summaries.

Data & Preprocessing

Raw data lives in data/raw/ (student-mat.csv, student-por.csv, semicolon-separated). Keep filenames unchanged.
Generate the processed feature matrices (binary/categorical columns one-hot encoded, numeric columns standard-scaled, feature columns aligned across both subjects) by running this from the repository root:
```
python -m src.preprocess_data
```
Outputs: data/processed/processed_mat.csv, data/processed/processed_por.csv.
Processed files are already checked in for convenience; regenerate them if you update the raw data.
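The sketch below shows roughly what the preprocessing step involves, using pandas and scikit-learn. It is not the code in src/preprocess_data.py. The function names (load_raw, preprocess_pair) are hypothetical. The sketch also assumes three things:
- Every numeric column, grades included, is standard-scaled separately per subject.
- Every string column is one-hot encoded with pd.get_dummies.
- The two subjects are aligned by an outer join on columns, with missing dummy columns filled with 0.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def load_raw(source) -> pd.DataFrame:
    """Read a UCI student CSV (path or buffer); the raw files use ';' as separator."""
    return pd.read_csv(source, sep=";")


def preprocess_pair(mat: pd.DataFrame, por: pd.DataFrame):
    """Hypothetical sketch: one-hot encode string columns, standard-scale numeric
    columns, and give both subjects the same feature columns."""
    encoded = []
    for df in (mat, por):
        numeric_cols = df.select_dtypes(include="number").columns
        categorical_cols = df.columns.difference(numeric_cols)
        # One-hot encode binary (yes/no, F/M, ...) and multi-class string columns.
        dummies = pd.get_dummies(df[categorical_cols], dtype=float)
        # Assumption: scale each numeric column to zero mean / unit variance per subject.
        scaled = pd.DataFrame(
            StandardScaler().fit_transform(df[numeric_cols]),
            columns=numeric_cols,
            index=df.index,
        )
        encoded.append(pd.concat([scaled, dummies], axis=1))
    # Outer-join the column sets so both matrices share one feature layout;
    # a dummy column present in only one subject is filled with 0 in the other.
    mat_out, por_out = encoded[0].align(encoded[1], join="outer", axis=1, fill_value=0.0)
    return mat_out, por_out
```

The outer join keeps every category seen in either subject, so a model trained on one matrix can be applied to the other without column mismatches.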

Exploratory Notebooks

notebooks/Data_Review.ipynb: Basic schema checks and Tukey outlier inspection on raw datasets; saves boxplots to reports/Datasets/Raw/.
notebooks/Analyze_Raw.ipynb: Distribution grids, correlation heatmaps, G3 correlation table, and KDE comparisons for key features on raw data; saves to reports/Datasets/Raw/.
notebooks/Analyze_Processed.ipynb: Mirrors the above analyses on the engineered feature matrices; saves to reports/Datasets/Processed/.
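The sketch below covers two of these analyses: the Tukey (IQR) outlier rule used in Data_Review.ipynb and the G3 correlation table from Analyze_Raw.ipynb. It is an illustrative version, not the notebooks' code. The helper names are hypothetical, k = 1.5 is the conventional Tukey multiplier, and Pearson correlation over numeric columns is assumed.

```python
import pandas as pd


def tukey_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] per numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparing a DataFrame with a Series aligns the Series on the columns;
    # True marks an outlier.
    return (numeric < lower) | (numeric > upper)


def g3_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of every other numeric column with the final grade G3,
    ordered by absolute strength."""
    corr = df.select_dtypes(include="number").corr()["G3"]
    return corr.drop("G3").sort_values(key=abs, ascending=False)
```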

Plot Capture Utility

scripts.plot_utils.save_all_figs(title: str, dir: str) saves every open Matplotlib figure to reports/Datasets/<dir>/ with slugged filenames. Example inside notebooks:

```
from scripts.plot_utils import save_all_figs
save_all_figs("Correlation Heatmap - student-mat", "Raw")       # Raw analyses
save_all_figs("Correlation Heatmap - student-mat", "Processed") # Processed analyses
```

Ensure the target subdirectory (Raw or Processed) exists under reports/Datasets/.
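The sketch below shows one way a helper like save_all_figs could work, for readers adapting it elsewhere. The following are all assumptions, not the actual behaviour of scripts/plot_utils.py:
- the name save_open_figures and its root parameter;
- the slug rule;
- the -1, -2 numbering suffix.

Unlike the documented utility, this sketch creates the target folder if it is missing.

```python
import re
from pathlib import Path

import matplotlib.pyplot as plt


def slugify(text: str) -> str:
    """Lower-case the text, replace runs of non-alphanumerics with '-', trim dashes."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def save_open_figures(title: str, dir: str, root: str = "reports/Datasets") -> list:
    """Hypothetical re-implementation: save every open Matplotlib figure to root/dir.
    The parameter name `dir` mirrors the documented signature (it shadows a builtin)."""
    out_dir = Path(root) / dir
    # Assumption: create the folder instead of requiring it to exist already.
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, num in enumerate(plt.get_fignums(), start=1):
        path = out_dir / f"{slugify(title)}-{i}.png"
        plt.figure(num).savefig(path, bbox_inches="tight")  # re-activate and save figure `num`
        saved.append(path)
    return saved
```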

Modeling Notebooks

Models/Classification.ipynb
- Loads the processed features and raw labels, then builds a binary pass/fail target (pass if G3 >= 10).
- Trains Logistic Regression, Random Forest, and XGBoost (if installed); computes test Accuracy/Precision/Recall/F1/AUC and 5-fold CV accuracy.
- Saves per-model confusion, ROC, and PR curves plus reports/Models/Classification/classification_results_summary.csv.
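The pass/fail workflow above can be sketched with scikit-learn as follows. Here X is the processed feature matrix (without G3) and g3 is the raw final grade. The following are assumptions for illustration:
- the 80/20 stratified split;
- the model hyperparameters;
- the evaluate and build_models names.

XGBoost is added only when the xgboost package imports, mirroring the "if installed" behaviour.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split


def build_models(seed: int = 42) -> dict:
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    try:  # optional dependency: skip XGBoost when it is not installed
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(random_state=seed)
    except ImportError:
        pass
    return models


def evaluate(X: pd.DataFrame, g3: pd.Series, seed: int = 42) -> pd.DataFrame:
    y = (g3 >= 10).astype(int)  # 1 = pass (final grade G3 >= 10), 0 = fail
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    rows = []
    for name, model in build_models(seed).items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        proba = model.predict_proba(X_te)[:, 1]  # probability of the "pass" class
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_te, pred),
            "precision": precision_score(y_te, pred, zero_division=0),
            "recall": recall_score(y_te, pred, zero_division=0),
            "f1": f1_score(y_te, pred, zero_division=0),
            "auc": roc_auc_score(y_te, proba),
            # 5-fold stratified CV accuracy on the full data (the model is cloned per fold)
            "cv_accuracy": cross_val_score(model, X, y, cv=5, scoring="accuracy").mean(),
        })
    return pd.DataFrame(rows)
```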
Models/Regression.ipynb
- Predicts G3 with and without G1/G2 features using Linear Regression, RandomForestRegressor, and optional XGBRegressor.
- Reports MAE, RMSE, R2, and CV RMSE; writes residual and predicted-vs-true plots.
- Summary CSV: reports/Models/Regression/regression_results_summary.csv.
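The with/without G1/G2 comparison can be sketched as follows, assuming X holds the processed features (excluding G3) and g3 is the target. The name regress_g3, the 80/20 split, and the hyperparameters are hypothetical. When xgboost is installed, an XGBRegressor could be added to the models dict in the same way.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split


def regress_g3(X: pd.DataFrame, g3: pd.Series, use_prior_grades: bool, seed: int = 42) -> pd.DataFrame:
    # Dropping G1/G2 measures how well G3 can be predicted without earlier period grades.
    feats = X if use_prior_grades else X.drop(columns=["G1", "G2"], errors="ignore")
    X_tr, X_te, y_tr, y_te = train_test_split(feats, g3, test_size=0.2, random_state=seed)
    models = {
        "LinearRegression": LinearRegression(),
        "RandomForest": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    rows = []
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        # scikit-learn reports RMSE as a negative score, so flip the sign.
        cv_rmse = -cross_val_score(
            model, feats, g3, cv=5, scoring="neg_root_mean_squared_error"
        ).mean()
        rows.append({
            "model": name,
            "with_G1_G2": use_prior_grades,
            "MAE": mean_absolute_error(y_te, pred),
            "RMSE": float(np.sqrt(mean_squared_error(y_te, pred))),
            "R2": r2_score(y_te, pred),
            "CV_RMSE": cv_rmse,
        })
    return pd.DataFrame(rows)
```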
Models/Kmeans.ipynb
- Clusters the processed features (excluding G3) for the math and Portuguese datasets.
- Selects K via silhouette over K=2-8, saves PCA scatter, cluster heatmap, and CSVs for feature means and pass rates.
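Silhouette-based selection of K and the per-cluster pass rate can be sketched as follows. The helper names and n_init=10 are assumptions. X should be the processed features with G3 removed, and the pass threshold G3 >= 10 matches the classification target.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def select_k(X: pd.DataFrame, k_range=range(2, 9), seed: int = 42):
    """Fit K-means for each K in k_range and keep the K with the highest silhouette score."""
    scores, labels_by_k = {}, {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        labels_by_k[k] = labels
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, labels_by_k[best_k], scores


def pass_rate_by_cluster(labels, g3: pd.Series) -> pd.Series:
    """Share of students in each cluster whose final grade G3 is >= 10."""
    return (g3 >= 10).groupby(labels).mean()
```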

Environment & Setup

Python 3.10+ recommended.
Core Python deps: pandas, numpy, scikit-learn, matplotlib, seaborn; optional xgboost for boosted models.
Jupyter (or VS Code notebooks) to run .ipynb files; R (optional) for scripts/student_merge.R.

```
python3 -m venv .venv
source .venv/bin/activate
pip install pandas numpy scikit-learn matplotlib seaborn  # + xgboost if available
```

Quickstart

```
# 1) Preprocess data (creates data/processed/*)
python -m src.preprocess_data

# 2) Run EDA notebooks
jupyter notebook notebooks/  # open Data_Review, Analyze_Raw, Analyze_Processed and run all cells

# 3) Run modeling notebooks
jupyter notebook Models/     # open Classification, Regression, Kmeans and run all cells

# (Optional) Merge math/Portuguese records in R
Rscript scripts/student_merge.R
```
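For Python users, a similar join can be sketched with pandas. The key list is an assumption: it follows the merge example distributed with the UCI dataset description, and scripts/student_merge.R may use different keys. merge_subjects is a hypothetical name. The datasets carry no student identifier, so matching on demographics is approximate.

```python
import pandas as pd

# Assumed join keys: shared demographic attributes, as in the UCI dataset's merge example.
MERGE_KEYS = ["school", "sex", "age", "address", "famsize", "Pstatus",
              "Medu", "Fedu", "Mjob", "Fjob", "reason", "nursery", "internet"]


def merge_subjects(mat: pd.DataFrame, por: pd.DataFrame) -> pd.DataFrame:
    """Inner-join math and Portuguese records on shared demographics; columns present
    in both tables (e.g. G1, G2, G3) get _mat / _por suffixes."""
    return mat.merge(por, on=MERGE_KEYS, how="inner", suffixes=("_mat", "_por"))
```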

Existing Reports

Pre-generated plots and summaries are checked in under reports/ for quick reference:

EDA outputs: reports/Datasets/Raw/ and reports/Datasets/Processed/.
Classification metrics/plots: reports/Models/Classification/ (including classification_results_summary.csv).
Regression diagnostics: reports/Models/Regression/ (including regression_results_summary.csv).
K-means visuals and tables: reports/Models/KMeans/.
