
This notebook analyzes and predicts student academic performance using a linear regression model on the "Student Performance" dataset (student-mat.csv). It includes data preprocessing, exploratory data analysis (EDA), model training, evaluation, and prediction for new student data.


GreatTitanDev/Student_performence_prediction


🎓 Student Performance Prediction with Linear Regression

📝 Short Description

This project analyzes and predicts student academic performance using a linear regression model on the Student Performance Dataset (student-mat.csv).
It includes data preprocessing, exploratory data analysis (EDA), model training, evaluation, and prediction for new student data.


🚀 Features

  • Data Preprocessing:
    • Encodes categorical variables into numeric form.
    • Checks and handles missing data.
  • Exploratory Data Analysis:
    • Histograms, boxplots, and line plots for grade distribution and relationships.
    • Correlation heatmap of numerical features.
  • Modeling:
    • Splits data into training/testing sets.
    • Trains a LinearRegression model using scikit-learn.
    • Evaluates performance with MSE and R² score.
  • Prediction:
    • Predicts grades for hypothetical students.
  • Model Saving:
    • Exports trained model with joblib for future use.
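
The preprocessing step above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the rows are made up, the helper name encode_categoricals is hypothetical, and pandas category codes are assumed as the encoding (the notebook may use a different encoder).

```python
import pandas as pd

# Made-up rows that mimic a few student-mat.csv columns; not real data.
df = pd.DataFrame({
    "school": ["GP", "GP", "MS", "MS"],
    "sex": ["F", "M", "F", "M"],
    "studytime": [2, 1, 3, 2],
    "G3": [11, 9, 15, 10],
})


def encode_categoricals(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with every non-numeric column replaced by integer codes."""
    encoded = frame.copy()
    for column in encoded.select_dtypes(exclude="number").columns:
        # Codes follow the sorted order of the distinct values in the column.
        encoded[column] = encoded[column].astype("category").cat.codes
    return encoded


# Count missing values per column before encoding.
missing_per_column = df.isna().sum()
encoded_df = encode_categoricals(df)

print(missing_per_column)
print(encoded_df)
```

Integer codes impose an arbitrary order on categories. That is harmless for two-valued columns such as school or sex, but one-hot encoding may suit multi-valued columns better in a linear model.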

📦 Requirements

Install dependencies:

pip install pandas matplotlib seaborn scikit-learn joblib

📂 Dataset

The dataset contains demographic, social, and academic attributes for students.
Target variable: G3 — the final grade.
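
A minimal loading sketch, assuming the file is semicolon-separated (the UCI copy usually is, but other mirrors may use commas). The sample rows are made up and the helper name load_students is hypothetical. In practice you would pass the path to student-mat.csv instead of the in-memory sample.

```python
import io

import pandas as pd

# Two made-up rows in a semicolon-separated layout; not real data.
SAMPLE_CSV = (
    "school;sex;age;studytime;G1;G2;G3\n"
    "GP;F;17;2;10;11;11\n"
    "MS;M;16;1;8;9;9\n"
)


def load_students(source, sep=";"):
    """Read the dataset and confirm that the target column G3 is present."""
    frame = pd.read_csv(source, sep=sep)
    if "G3" not in frame.columns:
        # A wrong separator collapses the header into a single column.
        raise ValueError("G3 column not found; check the separator")
    return frame


students = load_students(io.StringIO(SAMPLE_CSV))

# Split into features and target.
X = students.drop(columns="G3")
y = students["G3"]
print(X.shape, y.tolist())
```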


⚙️ Usage

  1. Place student-mat.csv in the project directory.
  2. Run the notebook or script to:
    • Explore data
    • Train the model
    • Make predictions
  3. Example prediction:
     prediction = model.predict(new_data)
     print(f'Predicted final grade: {prediction[0]:.1f}')
  4. The trained model is saved as new_model.joblib.
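
The prediction call needs a new_data object whose columns match the training features. The sketch below shows one way to build it and to reload the saved model. It uses made-up training rows and only two features (studytime and failures) as assumptions; the real notebook trains on the full dataset.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training rows; the real notebook trains on student-mat.csv.
train = pd.DataFrame({
    "studytime": [1, 2, 3, 4, 2, 3],
    "failures": [2, 1, 0, 0, 0, 1],
    "G3": [7, 10, 14, 17, 12, 12],
})
feature_columns = ["studytime", "failures"]
model = LinearRegression().fit(train[feature_columns], train["G3"])

# Save and reload the model; a temp directory keeps this sketch side-effect free.
model_path = os.path.join(tempfile.mkdtemp(), "new_model.joblib")
joblib.dump(model, model_path)
loaded_model = joblib.load(model_path)

# new_data must use the same columns, in the same order, as the training features.
new_data = pd.DataFrame([{"studytime": 3, "failures": 0}], columns=feature_columns)
prediction = loaded_model.predict(new_data)
print(f"Predicted final grade: {prediction[0]:.1f}")
```

If categorical columns were encoded before training, the same encoding has to be applied to new_data before calling predict.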

📊 Outputs

  • Visualizations:
    • Grade distribution histogram
    • Boxplots (by school, gender, alcohol consumption, study time, etc.)
    • Correlation heatmap
  • Metrics:
    • Mean Squared Error (MSE)
    • R² score
  • Saved Model:
    • File: new_model.joblib
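
To make the two metrics concrete, the sketch below computes MSE and R² by hand on a few made-up grades and compares the results with scikit-learn's mean_squared_error and r2_score. The numbers are illustrative only and are not results from this project.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up grades, chosen so the arithmetic is easy to follow.
y_true = np.array([10, 12, 8, 15])
y_pred = np.array([11, 11, 9, 14])

# MSE: mean of the squared residuals.
mse_manual = np.mean((y_true - y_pred) ** 2)

# R²: 1 minus (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_pred))
print(r2_manual, r2_score(y_true, y_pred))
```

MSE is in squared grade units, so lower is better. An R² close to 1 means the model explains most of the variance in G3.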

🔍 Workflow

  1. Data Loading: Load student-mat.csv into a pandas DataFrame and inspect its structure.
  2. Preprocessing: Encode categorical variables as integers and prepare X (features) and y (target).
  3. EDA: Use Seaborn/Matplotlib for distribution plots, boxplots, and correlation analysis.
  4. Model Training: Split the data (80% train, 20% test) and fit a LinearRegression model.
  5. Evaluation: Predict on the test set and compute MSE and R² score.
  6. Prediction: Create a new student profile and predict its final grade.
  7. Model Saving: Save the trained model with joblib.dump() for reuse.
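
The training and evaluation steps above can be sketched as follows. To stay self-contained, this uses synthetic data with two made-up features instead of student-mat.csv. The random seeds and the random_state value are assumptions for reproducibility, not settings taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset: 100 made-up students.
rng = np.random.default_rng(0)
n = 100
features = pd.DataFrame({
    "studytime": rng.integers(1, 5, size=n),
    "absences": rng.integers(0, 20, size=n),
})
# Made-up linear relationship plus noise, so the model has something to learn.
target = (
    8
    + 2 * features["studytime"]
    - 0.2 * features["absences"]
    + rng.normal(0, 1, size=n)
)

# 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Evaluate on the held-out test set only.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}, R²: {r2:.2f}")
```

Fixing random_state makes the split repeatable, so metric changes between runs reflect changes to the data or model and not a different split.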

Author: Nimona Engida
Dataset Source: UCI Machine Learning Repository — Student Performance
