This project aims to develop a predictive model for calculating the "Behavior Score" of credit card customers for Bank A. The score predicts the probability of a customer defaulting on their credit card, enabling robust risk management for the bank's existing portfolio.
The task is to build a behavior score using historical development data, predicting the probability of credit card defaults. The model will also be used to predict default probabilities on validation data for operational implementation.
- Development Data: 96,806 credit card records with
bad_flagindicating defaults. - Validation Data: 41,792 credit card records without
bad_flag, for probability prediction. - Features Include:
- On-us attributes (credit limit, etc.).
- Transaction-level attributes (transaction counts, merchant data).
- Bureau tradeline attributes (historical delinquencies, holdings).
- Bureau enquiry attributes (e.g., past enquiries).
- Data Preprocessing:
- Handled missing values and standardized features.
- Addressed data imbalance using SMOTE.
- Exploratory Data Analysis (EDA):
- Analyzed feature correlations and distributions.
- Investigated default patterns across customer segments.
- Model Development:
- Used Random Forest, LightGBM, and Gradient Boosting.
- Conducted hyperparameter tuning using GridSearchCV.
- Model Evaluation:
- Evaluated models on metrics like AUC-ROC, PR-AUC, F1-score, G-Mean, and Matthews Correlation Coefficient.
- Validated on unseen data, ensuring predicted probabilities were between 0 and 1.
- Developed a high-performing predictive model for credit card default risk.
- Provided actionable insights into customer behavior and portfolio risk.
- Code: Python scripts for data preprocessing, modeling, and evaluation.
- Insights: Key findings from EDA and model interpretations.
- Documentation: Detailed steps, metrics, and observations.
- Clone the repository.
- Install required libraries using
requirements.txt. - Run the scripts to preprocess data, train the model, and make predictions.
- Threshold-Independent: AUC-ROC, PR-AUC.
- Threshold-Dependent: F1-Score, Confusion Matrix.
- Business Metrics: Weighted Cost Analysis.
- Class Imbalance: G-Mean, Matthews Correlation Coefficient, Cohen's Kappa.
This project delivers a robust behavior score model, enabling Bank A to manage credit risk effectively. The insights and predictions support strategic decision-making and portfolio optimization.