This project analyzes and predicts student academic performance using a linear regression model on the Student Performance Dataset (student-mat.csv).
It includes data preprocessing, exploratory data analysis (EDA), model training, evaluation, and prediction for new student data.
- Data Preprocessing:
  - Encodes categorical variables into numeric form.
  - Checks and handles missing data.
- Exploratory Data Analysis:
  - Histograms, boxplots, and line plots for grade distribution and relationships.
  - Correlation heatmap of numerical features.
- Modeling:
  - Splits data into training/testing sets.
  - Trains a `LinearRegression` model using `scikit-learn`.
  - Evaluates performance with MSE and R² score.
- Prediction:
  - Predicts grades for hypothetical students.
- Model Saving:
  - Exports trained model with `joblib` for future use.
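The preprocessing step listed above could look roughly like the sketch below. It assumes pandas category codes for the encoding and row dropping for missing values; the project's notebook may use a different encoder or imputation strategy. The rows are made up for illustration, and only the column names follow the UCI dataset.

```python
import pandas as pd

# Made-up rows for illustration; column names follow the UCI dataset.
df = pd.DataFrame({
    "school": ["GP", "GP", "MS", "MS"],
    "sex": ["F", "M", "F", "M"],
    "studytime": [2, 3, 1, 2],
    "G3": [10, 15, 8, 12],
})

# Count missing values per column and drop incomplete rows if any exist.
missing = df.isna().sum()
if missing.any():
    df = df.dropna()

# Replace every non-numeric column with integer codes.
# Codes follow the sorted order of the labels (e.g. "GP" -> 0, "MS" -> 1).
encoded = df.copy()
for col in encoded.select_dtypes(exclude="number").columns:
    encoded[col] = encoded[col].astype("category").cat.codes

print(missing)
print(encoded)
```

Integer codes impose an arbitrary order on nominal categories, which a linear model will treat as a numeric scale. One-hot encoding with `pd.get_dummies` is a common alternative when that matters.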
Install dependencies:

    pip install pandas matplotlib seaborn scikit-learn joblib

The dataset contains demographic, social, and academic attributes for students.
Target variable: G3 — the final grade.
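As a rough illustration of loading the data and separating the target, the sketch below reads two made-up rows from an in-memory string so that it runs without the file. It assumes the semicolon-delimited layout of the UCI copy of `student-mat.csv`; if your copy is comma-separated, drop the `sep` argument. Only a handful of the dataset's columns are shown.

```python
import io

import pandas as pd

# Two made-up rows in the semicolon-delimited layout of the UCI file.
sample = io.StringIO(
    "school;sex;age;studytime;G1;G2;G3\n"
    "GP;F;18;2;5;6;6\n"
    "MS;M;17;3;14;15;15\n"
)

# With the real file this would be: pd.read_csv("student-mat.csv", sep=";")
df = pd.read_csv(sample, sep=";")

X = df.drop(columns=["G3"])  # features: everything except the final grade
y = df["G3"]                 # target: the final grade

print(df.shape)
print(y.tolist())
```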
- Place `student-mat.csv` in the project directory.
- Run the notebook or script to:
  - Explore data
  - Train the model
  - Make predictions
- Example prediction:

      prediction = model.predict(new_data)
      print(f'Predicted final grade: {prediction[0]:.1f}')

- The trained model is saved as `new_model.joblib`.
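To reuse the saved model in a later session, something along these lines should work. This is a self-contained sketch: it first fits a stand-in model on made-up rows and writes it to a temporary directory, so the feature names (`studytime`, `absences`, `G2`) and values are assumptions, not the project's actual feature set. In practice `new_data` must have the same columns, in the same order, as the data the real model was trained on.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in for the trained model: fit on made-up rows, then save it.
train = pd.DataFrame({
    "studytime": [1, 2, 3, 4],
    "absences": [10, 6, 4, 0],
    "G2": [8, 10, 13, 16],
})
model = LinearRegression().fit(train, [8, 11, 13, 17])

# Written to a temporary directory here to avoid touching the project folder.
path = os.path.join(tempfile.mkdtemp(), "new_model.joblib")
joblib.dump(model, path)

# Later, possibly in another script: load the model and predict.
loaded = joblib.load(path)
new_data = pd.DataFrame([{"studytime": 2, "absences": 4, "G2": 12}])
prediction = loaded.predict(new_data)
print(f"Predicted final grade: {prediction[0]:.1f}")
```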
- Visualizations:
  - Grade distribution histogram
  - Boxplots (by school, gender, alcohol consumption, study time, etc.)
  - Correlation heatmap
- Metrics:
  - Mean Squared Error (MSE)
  - R² score
- Saved Model:
  - File: `new_model.joblib`
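For readers unfamiliar with the two metrics, here is a small worked example on made-up grades (not results from this project). It computes both by hand with NumPy and calls scikit-learn's `mean_squared_error` and `r2_score` on the same arrays for comparison. With these four values the squared errors sum to 3, so MSE = 3/4 = 0.75, and the total sum of squares is 20, so R² = 1 - 3/20 = 0.85.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true and predicted grades.
y_true = np.array([10, 12, 14, 16])
y_pred = np.array([11, 12, 13, 17])

# MSE: mean of the squared residuals.
mse = np.mean((y_true - y_pred) ** 2)

# R²: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("by hand:     ", mse, r2)
print("scikit-learn:", mean_squared_error(y_true, y_pred), r2_score(y_true, y_pred))
```

MSE is in squared grade points, so lower is better; R² is the share of grade variance the model explains, with 1.0 being a perfect fit.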
- Data Loading: Load `student-mat.csv` into a pandas DataFrame and inspect its structure.
- Preprocessing: Encode categorical variables as integers and prepare `X` (features) and `y` (target).
- EDA: Use Seaborn/Matplotlib for distribution plots, boxplots, and correlation analysis.
- Model Training: Split the data (80% train, 20% test) and fit a `LinearRegression` model.
- Evaluation: Predict on the test set and compute MSE and R² score.
- Prediction: Create a new student profile and predict the final grade.
- Model Saving: Save the trained model with `joblib.dump()` for reuse.
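The training and evaluation steps above can be sketched as follows. To stay runnable without the CSV, the example generates synthetic data in place of the encoded dataset. The three feature columns, the coefficients used to generate `y`, and `random_state=42` are all assumptions for illustration, so the printed metrics say nothing about the real model's performance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded dataset (hypothetical values).
rng = np.random.default_rng(0)
n = 100
X = pd.DataFrame({
    "studytime": rng.integers(1, 5, n),
    "failures": rng.integers(0, 4, n),
    "G2": rng.integers(0, 21, n),
})
y = 0.9 * X["G2"] + 0.5 * X["studytime"] - 1.0 * X["failures"] + rng.normal(0, 1, n)

# 80% train / 20% test split; the fixed seed only makes the run repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ordinary least squares on the training portion.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out test portion.
y_pred = model.predict(X_test)
print(f"MSE: {mean_squared_error(y_test, y_pred):.2f}")
print(f"R²:  {r2_score(y_test, y_pred):.2f}")
```

Evaluating on rows the model never saw during fitting is what makes the reported MSE and R² an estimate of performance on new students rather than a measure of how well the model memorized the training set.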
Author: Nimona Engida
Dataset Source: UCI Machine Learning Repository — Student Performance