Financial institutions incur significant losses when borrowers default on loans. This project analyzes historical loan data to identify patterns associated with high-risk borrowers. By building a predictive machine learning model, the project aims to automate the loan eligibility process and suggest specific risk mitigation strategies to reduce the bank's non-performing assets (NPA).
- Loan Default Prediction & Risk Strategy.ipynb: The technical Jupyter Notebook containing data cleaning, EDA, feature engineering, and model training.
- Loan Default Prediction & Risk Strategy.pdf: A comprehensive report summarizing the findings, visualizations, and business recommendations.
- Language: Python 3.x
- Data Processing: Pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Machine Learning: Scikit-Learn (Logistic Regression, Random Forest, Decision Trees)
- Metrics: Precision, Recall, F1-Score, ROC-AUC (Focus on minimizing False Negatives)
- Handled missing values in demographic and financial columns.
- Detected and treated outliers in Income and Loan Amount.
- Analyzed class imbalance (Defaults vs. Non-Defaults).
- Financial Health: Analyzed Debt-to-Income (DTI) ratios and their correlation with default rates.
- Demographics: Examined the impact of employment length, home ownership, and education level on repayment behavior.
- Loan Characteristics: Investigated interest rates and loan grades assigned by the bank.
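The cleaning and EDA steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`income`, `loan_amount`, `monthly_debt`, `default`), the sample values, median imputation, and IQR clipping are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; column names are hypothetical, not the notebook's schema.
loans = pd.DataFrame({
    "income": [42000, 58000, np.nan, 75000, 1_200_000, 39000, 61000, 50000],
    "loan_amount": [8000, 12000, 15000, np.nan, 20000, 30000, 9000, 25000],
    "monthly_debt": [900, 1500, 1200, 2000, 3000, 1800, 1000, 2200],
    "default": [0, 0, 1, 0, 0, 1, 0, 1],
})

# 1. Missing values: fill numeric gaps with the column median (assumed strategy).
for col in ["income", "loan_amount"]:
    loans[col] = loans[col].fillna(loans[col].median())


def clip_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to the [Q1 - k*IQR, Q3 + k*IQR] fences (k=1.5 is a convention)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# 2. Outliers: clip extreme Income and Loan Amount values instead of dropping rows.
for col in ["income", "loan_amount"]:
    loans[col] = clip_iqr(loans[col])

# 3. Class imbalance: share of non-defaults (0) vs. defaults (1).
class_share = loans["default"].value_counts(normalize=True)

# 4. Debt-to-Income: monthly debt payments divided by monthly income.
loans["dti"] = loans["monthly_debt"] / (loans["income"] / 12)
loans["dti_band"] = pd.cut(
    loans["dti"],
    bins=[0, 0.2, 0.4, np.inf],
    labels=["<=20%", "20-40%", ">40%"],
)

# Default rate per DTI band (NaN for bands with no rows).
default_rate_by_band = loans.groupby("dti_band", observed=False)["default"].mean()

print(class_share)
print(default_rate_by_band)
```

Clipping keeps every row, which matters when defaults are already the minority class; dropping outlier rows would be an equally valid choice on a larger dataset.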
Different classification models were tested to predict the binary outcome (Default / No Default):
- Logistic Regression: Used as a baseline for interpretability.
- Random Forest Classifier: Implemented to capture non-linear relationships and feature importance.
- Model Optimization: Tuned hyperparameters to improve Recall (Sensitivity), as missing a defaulter is more costly than flagging a good borrower.
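A rough sketch of this modeling workflow is shown below. It runs on synthetic data from `make_classification`, so the numbers it prints say nothing about the project's real results. The hyperparameter grid, `class_weight="balanced"`, and the 0.30 decision threshold are assumptions added for illustration; the source only states that hyperparameters were tuned to improve Recall.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the loan data (about 15% "defaults").
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.85, 0.15], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Baseline: logistic regression, kept simple for interpretability.
baseline = LogisticRegression(max_iter=1000, class_weight="balanced")
baseline.fit(X_train, y_train)

# Random forest tuned with Recall as the selection metric (hypothetical grid).
search = GridSearchCV(
    RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=42
    ),
    param_grid={"max_depth": [4, 8], "min_samples_leaf": [1, 5]},
    scoring="recall",
    cv=3,
)
search.fit(X_train, y_train)

models = {
    "logistic_regression": baseline,
    "random_forest": search.best_estimator_,
}

results = {}
for name, model in models.items():
    proba = model.predict_proba(X_test)[:, 1]  # estimated P(default)
    results[name] = {
        "roc_auc": roc_auc_score(y_test, proba),
        # The default 0.50 cut-off vs. a lower one that flags more borrowers.
        "recall_at_0.50": recall_score(y_test, (proba >= 0.50).astype(int)),
        "recall_at_0.30": recall_score(y_test, (proba >= 0.30).astype(int)),
    }

print("best RF params:", search.best_params_)
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Lowering the decision threshold can only keep Recall the same or raise it, at the cost of more false positives, which is the trade-off the project accepts when a missed defaulter is the more expensive error.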
Based on the analysis, the following strategies are recommended:
- Strict DTI Thresholds: Applicants with a Debt-to-Income ratio above a certain threshold (e.g., 40%) show a significantly higher probability of default, so a strict DTI cap should be applied at approval.
- Interest Rate Adjustment: High-risk profiles identified by the model should be offered loans at adjusted interest rates to cover potential losses (Risk-Based Pricing).
- Enhanced Verification: Employment verification should be mandatory for applicants requesting high loan amounts with short credit histories.
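To show how these three recommendations could be turned into screening rules, here is a minimal sketch. Only the 40% DTI figure comes from the text above; the base rate, maximum risk premium, loan-amount cut-off, credit-history cut-off, and the `Applicant` and `assess` names are hypothetical placeholders, not values or code from the analysis.

```python
from dataclasses import dataclass

DTI_LIMIT = 0.40  # example threshold quoted in the recommendations

# Hypothetical policy values, chosen only for illustration.
BASE_RATE = 0.10                 # annual rate offered to the lowest-risk profile
MAX_RISK_PREMIUM = 0.06          # extra rate added at a default probability of 1.0
HIGH_LOAN_AMOUNT = 25_000        # what counts as a "high" loan amount
SHORT_CREDIT_HISTORY_YEARS = 2   # what counts as a "short" credit history


@dataclass
class Applicant:
    dti: float                   # debt-to-income ratio, e.g. 0.45 for 45%
    default_probability: float   # model output in [0, 1]
    loan_amount: float
    credit_history_years: float


def assess(applicant: Applicant) -> dict:
    """Apply the three recommended strategies to a single applicant."""
    return {
        # Strict DTI threshold: flag applicants above the limit.
        "dti_flag": applicant.dti > DTI_LIMIT,
        # Risk-based pricing: the premium scales linearly with predicted risk.
        "offered_rate": round(
            BASE_RATE + MAX_RISK_PREMIUM * applicant.default_probability, 4
        ),
        # Enhanced verification: high amount combined with a short history.
        "verify_employment": (
            applicant.loan_amount >= HIGH_LOAN_AMOUNT
            and applicant.credit_history_years < SHORT_CREDIT_HISTORY_YEARS
        ),
    }


risky = Applicant(dti=0.45, default_probability=0.5,
                  loan_amount=30_000, credit_history_years=1)
safe = Applicant(dti=0.20, default_probability=0.0,
                 loan_amount=5_000, credit_history_years=10)

print(assess(risky))
print(assess(safe))
```

A linear premium is the simplest possible pricing rule; a real policy would more likely price in tiers and account for expected loss given default.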
- Clone the repository:
git clone https://github.com/amit95ranjan/Loan-Default-Prediction-Risk-Strategy
- Install dependencies:
pip install pandas numpy matplotlib seaborn scikit-learn
- Run the Notebook:
jupyter notebook "Loan Default Prediction & Risk Strategy.ipynb"
Created by Amit Ranjan - Data Analyst