pranotosh2/Credit-Risk-Classification-Using-Machine-Learning


This project builds a credit approval prediction system using machine learning models. It processes customer loan application data, performs feature selection, trains multiple ML models, and tunes hyperparameters using a custom Grid Search where the test set is used as a validation set.

The goal is to classify applicants into four approval categories (P1, P2, P3, P4) to support risk-based lending decisions. Two datasets are used: a CBIL dataset of shape (51336, 54) and an internal bank dataset of shape (51296, 26). Both share the same "PPROSPECTID" key.

EDA

Null values are removed from both datasets, and columns with more than 10k null values are dropped.
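The cleaning step above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the tiny frames, the column names other than "PPROSPECTID", the helper name, and the inner join on the shared key are all assumptions, and the null threshold is lowered from 10k to 2 so the toy data exercises it.

```python
import numpy as np
import pandas as pd

KEY = "PPROSPECTID"  # shared key, spelled as in this README

def drop_sparse_columns_and_null_rows(df, max_nulls=10_000):
    """Drop columns with more than `max_nulls` nulls, then drop rows
    that still contain any null value. (Hypothetical helper.)"""
    null_counts = df.isna().sum()
    keep = null_counts[null_counts <= max_nulls].index
    return df[keep].dropna()

# Toy stand-ins for the two datasets (made-up columns and values)
bureau = pd.DataFrame({
    KEY: [1, 2, 3, 4],
    "score": [700.0, np.nan, 650.0, 720.0],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
})
internal = pd.DataFrame({KEY: [1, 2, 3, 4], "total_loans": [2, 5, 1, 3]})

bureau_clean = drop_sparse_columns_and_null_rows(bureau, max_nulls=2)
internal_clean = drop_sparse_columns_and_null_rows(internal, max_nulls=2)

# Assumed: the two cleaned datasets are combined on the shared key
merged = bureau_clean.merge(internal_clean, on=KEY, how="inner")
```

Dropping sparse columns before dropping rows matters here: doing it the other way round would discard almost every row because of a single mostly-empty column.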

Feature Engineering

The dataset is divided into categorical and numerical columns. The association of each categorical column with the target column is checked with a chi2 test, using a p-value threshold of 0.05. For the numerical columns, a sequential VIF (Variance Inflation Factor) check with a threshold of 6 is used to detect multicollinearity. The numerical columns are then tested with ANOVA across the target classes, again with a p-value threshold of 0.05. Label encoding is applied to EDUCATION, and one-hot encoding to the other categorical columns (GENDER, MARITALSTATUS, etc.).
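As an illustration of the sequential VIF step, here is a minimal NumPy sketch. It assumes one plausible reading of "sequential": columns are visited in order, and a column is dropped when its VIF against the currently kept columns exceeds the threshold of 6. The function names and the synthetic data are hypothetical, and the project may compute VIF differently (for example with a statistics library).

```python
import numpy as np

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2), where R^2 comes from regressing
    column j on the other columns plus an intercept.
    Assumes column j is not constant."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)

def sequential_vif_filter(X, names, threshold=6.0):
    """Visit columns in order; drop a column as soon as its VIF against
    the columns still kept exceeds `threshold`."""
    kept = list(range(X.shape[1]))
    pos = 0
    while pos < len(kept):
        if vif(X[:, kept], pos) > threshold:
            kept.pop(pos)   # drop it and re-check whatever slides into `pos`
        else:
            pos += 1
    return [names[i] for i in kept]

# Synthetic data: c is almost exactly a + b, so the three are collinear
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + 0.01 * rng.normal(size=200)
X = np.column_stack([a, b, c])

selected = sequential_vif_filter(X, ["a", "b", "c"], threshold=6.0)
```

Because the filter runs in column order, which member of a collinear group gets dropped depends on that order. Here "a" is removed first, after which "b" and "c" are no longer strongly collinear.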

Machine Learning Model (XGBOOST, RANDOMFOREST, DECISIONTREE)

XGBOOST gave the highest accuracy of the three models, approximately 78%.

Hyperparameter Tuning on XGBOOST

param_grid = {
    'colsample_bytree': [0.1, 0.3, 0.5, 0.7, 0.9],
    'learning_rate': [0.001, 0.01, 0.1, 1],
    'max_depth': [3, 5, 8, 10],
    'alpha': [1, 10, 100],
    'n_estimators': [10, 50, 100]
}
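A custom grid search over this grid can be sketched with the standard library alone. This is a hedged illustration, not the project's implementation: `grid_search` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for the real step, which would train an XGBoost model with the given parameters and return its accuracy on the held-out set.

```python
from itertools import product

param_grid = {
    'colsample_bytree': [0.1, 0.3, 0.5, 0.7, 0.9],
    'learning_rate': [0.001, 0.01, 0.1, 1],
    'max_depth': [3, 5, 8, 10],
    'alpha': [1, 10, 100],
    'n_estimators': [10, 50, 100],
}

def grid_search(param_grid, fit_and_score):
    """Try every combination in `param_grid` and return the combination
    with the highest score, plus the number of combinations tried."""
    names = list(param_grid)
    best_params, best_score, n_tried = None, float("-inf"), 0
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = fit_and_score(params)  # in practice: train, then score on held-out data
        n_tried += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_tried

def toy_score(params):
    """Made-up scorer with a single known optimum, used only to show
    the search mechanics. It does NOT reflect real model behaviour."""
    return -(abs(params['colsample_bytree'] - 0.5)
             + abs(params['learning_rate'] - 0.1)
             + abs(params['max_depth'] - 5)
             + abs(params['alpha'] - 10)
             + abs(params['n_estimators'] - 50))

best_params, best_score, n_tried = grid_search(param_grid, toy_score)
```

The grid contains 5 x 4 x 4 x 3 x 3 = 720 combinations, so each call to the scorer corresponds to one full model fit. Since the test set doubles as the validation set in this project, the reported test accuracy is also the selection criterion and may be somewhat optimistic as an estimate of performance on unseen data.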

Best parameters are:

Train Accuracy: 0.8055927015541886
Test Accuracy: 0.7801022227505052
colsample_bytree: 0.3
learning_rate: 1
max_depth: 3
alpha: 10
n_estimators: 100
