This project builds a credit risk prediction model that identifies whether a loan applicant is likely to default.
The objective is to support data-driven lending decisions and reduce default-related financial risk.
The dataset contains loan application records with:
- Applicant demographic information
- Financial attributes
- Credit history indicators
Target variable:
- loan_status = 1 → Default
- loan_status = 0 → Non-default
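Before choosing a resampling strategy, it helps to check how skewed the target is. The snippet below is a minimal sketch on made-up toy records: only the `loan_status` column name comes from this project, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy records; only the loan_status column name comes from the text.
df = pd.DataFrame({"loan_status": [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]})

# Share of each class: 1 = default, 0 = non-default.
class_share = df["loan_status"].value_counts(normalize=True).sort_index()
print(class_share)
```

A small share for class 1 is the usual sign that accuracy alone will be misleading and that recall on defaulters needs separate attention.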
The following models were developed and compared:
- Baseline Logistic Regression
- Logistic Regression with Random Oversampling (ROS)
- Logistic Regression with SMOTE
Class imbalance was explicitly handled to improve default detection.
Models were evaluated using:
- Precision & Recall (Default class)
- Macro-averaged metrics
- ROC–AUC score
In credit risk problems, recall for defaulters is prioritised to minimise financial losses.
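The metrics above can be computed with scikit-learn as sketched below. The data and the plain logistic regression here are synthetic placeholders chosen for illustration, and the keys of the `metrics` dictionary are hypothetical names; the macro-averaged metric shown is F1, which is one possible choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the loan records (1 = default).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)               # hard 0/1 predictions
y_score = clf.predict_proba(X_test)[:, 1]  # predicted probability of default

metrics = {
    # Precision and recall are reported for the default class (label 1).
    "precision_default": precision_score(y_test, y_pred, pos_label=1, zero_division=0),
    "recall_default": recall_score(y_test, y_pred, pos_label=1),
    # Macro averaging weights both classes equally, regardless of their size.
    "macro_f1": f1_score(y_test, y_pred, average="macro"),
    # ROC-AUC is computed from probabilities, not from thresholded predictions.
    "roc_auc": roc_auc_score(y_test, y_score),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recall on the default class is the figure to watch when comparing the baseline with the resampled variants, since a missed defaulter is typically the costlier error.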
Key findings:
- The baseline model was conservative and missed many defaulters (low recall on the default class)
- Random oversampling improved recall but increased false positives
- SMOTE provided the most balanced performance
Tools and libraries:
- Python
- Pandas, NumPy
- Scikit-learn
- Imbalanced-learn
The SMOTE-based logistic regression model provided the best balance between risk detection and overall stability, making it suitable for practical credit risk assessment.