Exploring Credit Scoring Patterns: Using Statistical Modeling with a Logistic Regression Approach
Credit scoring plays a central role in financial decision-making, determining who can borrow money and at what cost. For newcomers to the U.S. financial system, such as international students, the process by which credit scores are calculated can be confusing. This study investigates the statistical factors behind credit scores by analyzing a credit score dataset with logistic regression. The objective is to identify which personal and financial traits, such as income, age, debt ratio, and history of late or missed payments, most affect a person’s credit standing.
The workflow includes data cleaning, exploratory data analysis, correlation assessment, feature scaling, and model evaluation using the area under the ROC curve (AUC), the Kolmogorov-Smirnov (KS) statistic, and confusion matrices. Logistic regression was selected for its interpretability and regulatory transparency: it models the log-odds of the outcome as a linear, and therefore monotonic, function of each predictor, which allows clear identification of the factors that most influence default probability. Predicted default probabilities are then transformed into credit scores using a log-odds-based scoring formula.
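To make the last two steps concrete, the following is a minimal sketch of a log-odds-based score transformation and a KS statistic, written in plain Python. It is not the study's own code: the function names and the scaling constants (`base_score`, `base_odds`, and `pdo`, the points needed to double the odds) are illustrative assumptions following a common scorecard convention, and the study may have used different values or a different formula.

```python
import math


def probability_to_score(p_default, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a predicted default probability to a scorecard-style score.

    base_score, base_odds, and pdo are assumed, illustrative constants:
    a borrower with good:bad odds of base_odds receives base_score, and
    every doubling of those odds adds pdo points.
    """
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must be strictly between 0 and 1")
    factor = pdo / math.log(2)  # points per unit of log-odds
    offset = base_score - factor * math.log(base_odds)
    good_odds = (1.0 - p_default) / p_default  # odds of repaying vs. defaulting
    return offset + factor * math.log(good_odds)


def ks_statistic(y_true, p_default):
    """Largest gap between the cumulative distributions of predicted
    probabilities for defaulters (label 1) and non-defaulters (label 0)."""
    n_bad = sum(y_true)
    n_good = len(y_true) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(p_default, y_true))
    cum_bad = cum_good = 0
    ks = 0.0
    for i, (p, y) in enumerate(pairs):
        if y == 1:
            cum_bad += 1
        else:
            cum_good += 1
        # Compare the two distributions only at the end of a run of ties.
        if i + 1 == len(pairs) or pairs[i + 1][0] != p:
            ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks
```

Under these assumed constants, a lower predicted default probability always yields a higher score, and a KS value closer to 1 indicates that the model separates defaulters from non-defaulters more cleanly.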
The findings highlight that payment delinquency metrics (late or missed payments) and income are the strongest predictors of repayment risk and, in turn, of the resulting credit score. This work demonstrates how statistical modeling provides transparency into credit scoring systems and bridges theoretical understanding with practical credit risk evaluation.