This project builds a credit approval prediction system using machine learning models. It processes customer loan application data, performs feature selection, trains multiple ML models, and tunes hyperparameters using a custom Grid Search where the test set is used as a validation set.

The goal is to classify applicants into four approval categories (P1, P2, P3, P4) to support risk-based lending decisions. Two datasets are used: a CBIL dataset of shape (51336, 54) and an internal bank dataset of shape (51296, 26). Both contain the same "PPROSPECTID" column.
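Since both datasets carry the same "PPROSPECTID" column, they can presumably be combined with an inner join on it. The snippet below is a minimal sketch under that assumption; the function name, the other column names (`Total_TL`, `Credit_Score`) and the toy values are hypothetical, not taken from the project.

```python
import pandas as pd


def merge_on_prospect_id(bank_df, bureau_df, key="PPROSPECTID"):
    """Inner-join the two tables, keeping only IDs present in both."""
    return pd.merge(bank_df, bureau_df, how="inner", on=key)


# Toy frames standing in for the real datasets (hypothetical columns and values).
bank = pd.DataFrame({"PPROSPECTID": [1, 2, 3], "Total_TL": [5, 2, 7]})
bureau = pd.DataFrame({"PPROSPECTID": [2, 3, 4], "Credit_Score": [690, 710, 655]})

merged = merge_on_prospect_id(bank, bureau)
print(merged.shape)
```

An inner join drops applicants that appear in only one table, which is one way to end up with fewer rows than either input.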

EDA

Removed the null values from both datasets, and dropped the columns that have more than 10k null values.
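One way to sketch this cleaning step with pandas is shown below. It assumes that nulls are stored as NaN and that sparse columns are dropped before the remaining rows with nulls; the function name and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_nulls(df, max_col_nulls=10_000):
    """Drop columns with more than max_col_nulls nulls, then rows with any null."""
    null_counts = df.isna().sum()
    keep_cols = null_counts[null_counts <= max_col_nulls].index
    return df[keep_cols].dropna(axis=0).reset_index(drop=True)


# Toy frame; the threshold is lowered to 2 so the effect is visible on 4 rows.
demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],       # 1 null: column kept, row dropped
    "b": [np.nan, np.nan, np.nan, 1.0],  # 3 nulls: column dropped
    "c": [1, 2, 3, 4],
})
cleaned = drop_nulls(demo, max_col_nulls=2)
print(cleaned)
```

Dropping the sparse columns first matters: doing the row-wise `dropna` first would discard almost every row because of a few mostly-empty columns.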

Feature Engineering

Divided the columns into categorical and numerical groups. The association between each categorical column and the target column was checked with a chi-square (chi2) test, using p-value <= 0.05 as the cutoff. For the numerical columns, a sequential VIF (Variance Inflation Factor) check with a threshold of 6 was used to detect multicollinearity. The numerical columns were then tested against the target classes with ANOVA, again with a p-value cutoff of 0.05. Label encoding was used for EDUCATION and one-hot encoding for the other categorical columns (GENDER, MARITALSTATUS, etc.).
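The three selection tests can be sketched as follows. This is an illustration under assumptions, not the project's code: "sequential VIF" is read here as walking the columns in order and dropping a column whenever its VIF against the columns still retained exceeds the threshold; VIF is computed directly as 1 / (1 - R^2) with an intercept rather than through a library call. All function names and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, f_oneway


def chi2_keep(df, cat_cols, target, alpha=0.05):
    """Keep categorical columns whose chi-square p-value is <= alpha."""
    keep = []
    for col in cat_cols:
        table = pd.crosstab(df[col], df[target])
        _, p_value, _, _ = chi2_contingency(table)
        if p_value <= alpha:
            keep.append(col)
    return keep


def vif(X, j):
    """VIF of column j: regress it on the other columns (plus intercept).

    Assumes column j is not constant.
    """
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


def sequential_vif(df, num_cols, threshold=6.0):
    """Walk the columns in order, dropping any whose VIF exceeds threshold."""
    remaining = list(num_cols)
    idx = 0
    while idx < len(remaining):
        X = df[remaining].to_numpy(dtype=float)
        if vif(X, idx) <= threshold:
            idx += 1            # keep this column, move to the next one
        else:
            remaining.pop(idx)  # drop it and re-test the column now at idx
    return remaining


def anova_keep(df, num_cols, target, alpha=0.05):
    """Keep numerical columns whose one-way ANOVA p-value is <= alpha."""
    keep = []
    for col in num_cols:
        groups = [g[col].to_numpy() for _, g in df.groupby(target)]
        _, p_value = f_oneway(*groups)
        if p_value <= alpha:
            keep.append(col)
    return keep


# Synthetic data: x2 is almost a multiple of x1, x3 is pure noise.
rng = np.random.default_rng(0)
n = 400
target = rng.choice(["P1", "P2", "P3", "P4"], size=n)
level = pd.Series(target).map({"P1": 0, "P2": 1, "P3": 2, "P4": 3}).to_numpy(dtype=float)
x1 = level + rng.normal(size=n)
demo = pd.DataFrame({
    "target": target,
    "cat_related": np.where(level < 2, "A", "B"),
    "x1": x1,
    "x2": 2 * x1 + rng.normal(scale=0.01, size=n),
    "x3": rng.normal(size=n),
})

kept_cat = chi2_keep(demo, ["cat_related"], "target")
kept_vif = sequential_vif(demo, ["x1", "x2", "x3"])
kept_num = anova_keep(demo, kept_vif, "target")
print(kept_cat, kept_vif, kept_num)
```

Because the walk is order-dependent, which member of a collinear pair survives depends on column order; in the toy data `x1` is tested first and is the one removed.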

Machine Learning Model (XGBOOST, RANDOMFOREST, DECISIONTREE)

XGBOOST gave the highest accuracy of the three models, approximately 78%.

Hyperparameters Tuning On XGBOOST

param_grid = {
    'colsample_bytree': [0.1, 0.3, 0.5, 0.7, 0.9],
    'learning_rate': [0.001, 0.01, 0.1, 1],
    'max_depth': [3, 5, 8, 10],
    'alpha': [1, 10, 100],
    'n_estimators': [10, 50, 100]
}
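A custom grid search over such a grid can be sketched as a plain loop over every parameter combination, scoring each fitted model on a held-out set (in this project, the test set). The function below is a hypothetical illustration, not the project's implementation; it accepts any estimator class with scikit-learn style `fit` and `predict` methods.

```python
from itertools import product

import numpy as np


def custom_grid_search(model_cls, param_grid, X_train, y_train, X_val, y_val,
                       **fixed_params):
    """Fit one model per parameter combination; return the best by validation accuracy."""
    names = list(param_grid)
    y_train = np.asarray(y_train)
    y_val = np.asarray(y_val)
    best = {"params": None, "train_accuracy": None, "val_accuracy": -1.0}
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        model = model_cls(**fixed_params, **params)
        model.fit(X_train, y_train)
        train_acc = float(np.mean(model.predict(X_train) == y_train))
        val_acc = float(np.mean(model.predict(X_val) == y_val))
        if val_acc > best["val_accuracy"]:  # ties keep the earliest combination
            best = {"params": params,
                    "train_accuracy": train_acc,
                    "val_accuracy": val_acc}
    return best
```

With xgboost installed, a call might look like `custom_grid_search(XGBClassifier, param_grid, x_train, y_train, x_test, y_test, objective='multi:softmax', num_class=4)`, where the `objective` and `num_class` settings and integer-encoded labels are assumptions. The grid above contains 5 x 4 x 4 x 3 x 3 = 720 combinations, so that many models would be trained.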

Best parameters are:

colsample_bytree: 0.3
learning_rate: 1
max_depth: 3
alpha: 10
n_estimators: 100

Train Accuracy: 0.8055927015541886
Test Accuracy: 0.7801022227505052