🌟 Level-1 → Task 1 + Bonus Completed ✅
https://elevvomlstudentperformancefactors-polymonialregression.streamlit.app/
- Build a model to predict students' exam scores based on their study hours
- Perform data cleaning and basic visualization to understand the dataset
- Split the dataset into training and testing sets
- Train a linear regression model to estimate final scores
- Visualize predictions and evaluate model performance

- Source: Student Performance Classification
- File Used: StudentPerformanceFactors.csv
- Target Variable: Score_Category (categorized from Exam_Score into performance levels)
- Features Include: Attendance, Hours_Studied, Previous_Scores, Sleep_Hours, Physical_Activity, Parental_Involvement, Motivation_Level, Family_Income, Teacher_Quality, Gender, etc.
- 🐍 Python
- 📊 Pandas & NumPy
- 📈 Matplotlib & Seaborn
- 📊 Scikit-learn
- 📊 SHAP (Model Explainability)
✅ 1. Data Overview & Cleaning
- Checked missing values
- Verified data types
- Converted categorical features
- Created Score_Category as classification target
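The Score_Category target described above can be derived with `pd.cut`; the actual bin edges and labels are not stated in this README, so the ones below are placeholders:

```python
import pandas as pd

# Hypothetical thresholds -- the real cut points used in the project are not
# given here; this only illustrates the Exam_Score -> Score_Category step.
df = pd.DataFrame({"Exam_Score": [55, 62, 68, 74, 81, 93]})
df["Score_Category"] = pd.cut(
    df["Exam_Score"],
    bins=[0, 60, 70, 80, 100],
    labels=["Low", "Average", "Good", "Excellent"],
)
print(df["Score_Category"].tolist())
```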
✅ 2. Exploratory Data Analysis (EDA)
📈 Visualizations
- Histogram
- Correlation matrix
- Boxplot
- Scatter plot + regression line
📊 Statistical Testing
- ANOVA → for numerical vs target
- Chi-Square → for categorical vs target
📌 Insight: Attendance and Previous Scores showed strong statistical significance.
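Both tests can be run with SciPy; the group values and contingency counts below are synthetic stand-ins for the real columns:

```python
import numpy as np
from scipy.stats import f_oneway, chi2_contingency

rng = np.random.default_rng(0)

# ANOVA: does a numerical feature (e.g. Attendance) differ across categories?
low = rng.normal(70, 5, 50)        # synthetic "Low" group
good = rng.normal(80, 5, 50)       # synthetic "Good" group
excellent = rng.normal(90, 5, 50)  # synthetic "Excellent" group
f_stat, p_anova = f_oneway(low, good, excellent)

# Chi-square: is a categorical feature independent of the score category?
table = np.array([[30, 10], [15, 25]])  # hypothetical contingency counts
chi2, p_chi, dof, _ = chi2_contingency(table)
print(p_anova < 0.05, p_chi < 0.05)
```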
✅ 3. Feature Engineering
- Academic Effort Score
- Learning Momentum
- Academic Risk Index
✔ Combines related indicators into interpretable composite scores
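The composite names above come from the project, but their formulas are not given in this README; a hypothetical construction might look like:

```python
import pandas as pd

df = pd.DataFrame({
    "Hours_Studied": [10, 25, 15],
    "Attendance": [70, 95, 85],
    "Previous_Scores": [60, 88, 75],
})
# Hypothetical definitions -- placeholders for the project's actual formulas.
df["Academic_Effort_Score"] = df["Hours_Studied"] * df["Attendance"] / 100
df["Learning_Momentum"] = df["Previous_Scores"].rank(pct=True)
df["Academic_Risk_Index"] = (
    (df["Attendance"] < 80) | (df["Previous_Scores"] < 65)
).astype(int)
print(df[["Academic_Effort_Score", "Academic_Risk_Index"]])
```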
✅ 4. Encoding & Scaling
- One-Hot Encoding for categorical features
- StandardScaler for numerical normalization
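A `ColumnTransformer` is one common way to combine both steps; the column names below are illustrative:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "Hours_Studied": [10, 25, 15, 30],
    "Gender": ["Male", "Female", "Female", "Male"],
})
pre = ColumnTransformer([
    ("num", StandardScaler(), ["Hours_Studied"]),       # scale numericals
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Gender"]),  # one-hot
])
X = pre.fit_transform(df)
print(X.shape)  # one scaled column + one one-hot column per category
```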
✅ 5. Model Testing (6 Algorithms Compared)
Models Evaluated:
| Model | Accuracy Mean (%) | ROC-AUC Mean (%) |
|---|---|---|
| Logistic Regression | 95.02 | 98.99 |
| SVM | 91.64 | 98.22 |
| Gradient Boosting | 85.18 | 95.20 |
| Random Forest | 80.57 | 93.13 |
| KNN | 73.55 | 88.70 |
| Decision Tree | 72.85 | 78.98 |
📌 Best Baseline Model: Logistic Regression 💡
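The comparison above can be reproduced with repeated `cross_val_score` calls; this sketch uses synthetic data and only two of the six models:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the engineered, encoded feature matrix.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=42)
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),  # probability=True enables ROC-AUC scoring
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc_ovr").mean()
    print(f"{name}: acc={acc:.3f}, auc={auc:.3f}")
```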
✅ 6. Feature Importance - Logistic Regression
Selected the top-k most significant features (ANOVA F-scores) before modeling:
- Attendance
- Motivation_Level_Low
- Hours_Studied
- Previous_Scores
📌 The primary determinants of academic performance.
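Top-k selection against the ANOVA F-statistic is typically done with `SelectKBest(f_classif)`; a sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)
# Keep the 4 features with the highest ANOVA F-scores against the target.
selector = SelectKBest(score_func=f_classif, k=4)
X_top = selector.fit_transform(X, y)
print(X_top.shape)
```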
✅ 7. Hyperparameter Tuning
Used GridSearchCV on Logistic Regression.
🔍 Best Parameters:
C = 100
Penalty = L1
Solver = saga
Class_weight = balanced
Best CV Accuracy: 95.34%
📌 L1 regularization helped in automatic feature selection.
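A minimal reproduction of the grid search, with synthetic data in place of the real feature matrix; the grid includes the winning values (C=100, penalty=l1, solver=saga, class_weight=balanced):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
grid = GridSearchCV(
    LogisticRegression(max_iter=5000),
    param_grid={
        "C": [0.1, 1, 10, 100],
        "penalty": ["l1", "l2"],   # l1 requires a solver such as saga
        "solver": ["saga"],
        "class_weight": ["balanced", None],
    },
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```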
✅ 8. Polynomial Features
- Added polynomial features (degree = 2)
- Training accuracy: 99.02%
- Test accuracy: 91.07% (indicating overfitting)
📌 The linear Logistic Regression model without the polynomial transformation provides a better bias-variance balance for this dataset.
- ROC-AUC (Multiclass OVR) evaluated
- Balanced class handling applied
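The train/test gap that flagged the overfitting can be measured like this (synthetic data, so the exact 99.02%/91.07% figures will not reproduce):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Degree-2 polynomial expansion before a logistic model.
poly_model = make_pipeline(PolynomialFeatures(degree=2), StandardScaler(),
                           LogisticRegression(max_iter=2000))
poly_model.fit(X_tr, y_tr)
train_acc = poly_model.score(X_tr, y_tr)
test_acc = poly_model.score(X_te, y_te)
print(train_acc, test_acc)  # a large train/test gap signals overfitting
```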
✅ 9. Model Interpretation (SHAP Analysis)
Applied SHAP to understand feature contribution.

🔎 Key Findings:
- Attendance has the highest positive impact
- Academic Effort and Hours Studied are strong drivers, supporting a study-habits coaching program
- Previous Scores contribute heavily, supporting remedial programs based on prior performance
📌 SHAP confirms statistical and modeling results.
💡 Recommendations:
- ✔ Improving attendance discipline
- ✔ Optimizing learning hours and quality
- ✔ Remedial programs based on previous grades
- ✔ Increasing access to learning resources
- ✔ L1 regularization enhances interpretability
- ✔ Strengthening parental support
In conclusion, interventions based on study habits and environmental support will have the greatest impact on improving student performance compared to demographic factors alone.
🛠 Skills Covered:
- 📊 Statistical Hypothesis Testing
- 🔺 Polynomial Feature Engineering
- 📊 Feature Selection (ANOVA)
- 🤖 Classification Modeling
- ⚙ Hyperparameter Tuning
- 📈 Model Comparison
- 📉 Multiclass ROC-AUC
- 🧠 Explainable AI (SHAP)