Credit Risk Modeling using Logistic Regression with imbalance handling

joydeepgoswami/Credit-Risk-Modeling

Credit Risk Modeling

Overview

This project builds a credit risk prediction model that estimates whether a loan applicant is likely to default.

The objective is to support data-driven lending decisions and reduce default-related financial risk.

Dataset

The dataset contains loan application records with:

  • Applicant demographic information
  • Financial attributes
  • Credit history indicators

Target variable:

  • loan_status = 1 → Default
  • loan_status = 0 → Non-default
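
Before modelling, it helps to check how skewed the target is. The snippet below is a minimal sketch on a hypothetical toy frame; only the `loan_status` column name and its 0/1 encoding come from this README, and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: 8 non-defaults (0) and 2 defaults (1).
# The real dataset is not reproduced here.
df = pd.DataFrame({"loan_status": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]})

# Share of each class, indexed by label (0 = non-default, 1 = default).
class_share = df["loan_status"].value_counts(normalize=True).sort_index()
default_rate = float(class_share.loc[1])

print(class_share)
print(f"Default rate: {default_rate:.0%}")
```

A low default rate is the usual sign that plain accuracy will be misleading and that resampling or class-aware metrics are worth considering.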

Approach

The following models were developed and compared:

  • Baseline Logistic Regression
  • Logistic Regression with Random Oversampling (ROS)
  • Logistic Regression with SMOTE

Class imbalance was handled explicitly through resampling (ROS and SMOTE) to improve detection of defaults.

Model Evaluation

Models were evaluated using:

  • Precision & Recall (Default class)
  • Macro-averaged metrics
  • ROC–AUC score

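In credit risk problems, recall for defaulters is prioritised to minimise financial losses, since a missed defaulter (a false negative) is typically costlier than a wrongly flagged good applicant.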
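
A minimal sketch of how these metrics might be computed with scikit-learn is shown below. The data is synthetic and the model is a plain logistic regression, so the numbers it prints say nothing about this project's results; the dictionary keys are names chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the loan records.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)               # hard labels at the 0.5 threshold
y_score = clf.predict_proba(X_test)[:, 1]  # probability of class 1 (default)

metrics = {
    # Precision and recall for the default class (loan_status = 1).
    "precision_default": precision_score(y_test, y_pred, pos_label=1, zero_division=0),
    "recall_default": recall_score(y_test, y_pred, pos_label=1),
    # Macro average weights both classes equally, regardless of size.
    "macro_f1": f1_score(y_test, y_pred, average="macro"),
    # ROC-AUC is computed from scores, not from thresholded labels.
    "roc_auc": roc_auc_score(y_test, y_score),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```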

Results

  • The baseline model was conservative and missed many defaulters
  • Random oversampling improved recall but increased false positives
  • SMOTE gave the most balanced trade-off between recall and false positives

Tools & Libraries

  • Python
  • Pandas, NumPy
  • Scikit-learn
  • Imbalanced-learn

Conclusion

The SMOTE-based logistic regression model provided the best balance between risk detection and overall stability, making it suitable for practical credit risk assessment.