Regulatory-compliant credit scoring system with reject inference using semi-supervised learning, calibrated probabilities, and fairness testing across protected classes

Sakeeb91/credit-scoring-reject-inference

Credit Scoring Model with Reject Inference

A credit scoring system designed to correct for the selection bias inherent in historical lending data while meeting regulatory requirements. It produces calibrated probability estimates, keeps the model explainable for compliance review, and tests for fairness across protected demographic classes.

Problem Statement

Traditional credit scoring models suffer from a fundamental selection bias: only approved applicants have observable outcomes (default/non-default). Rejected applicants represent missing data that, if ignored, leads to biased model training and miscalibrated risk estimates. This system implements reject inference techniques using semi-supervised learning to recover information from the rejected population.
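
To make the idea concrete, here is a minimal sketch of one common EM-style reject inference scheme, sometimes called fuzzy augmentation. It is an illustration under stated assumptions, not the repository's implementation: the function name `em_reject_inference`, the synthetic data, and the toy approval policy are all hypothetical. Each rejected applicant enters training twice, once as a default and once as a non-default, weighted by the current model's posterior.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def em_reject_inference(X_acc, y_acc, X_rej, n_iter=10):
    """Fuzzy-augmentation EM (hypothetical helper): rejects get soft labels."""
    # Initial model uses accepted applicants only, the biased baseline.
    model = LogisticRegression(max_iter=1000).fit(X_acc, y_acc)

    # Each reject appears twice: once labeled default (1), once non-default (0).
    n_rej = len(X_rej)
    X_aug = np.vstack([X_acc, X_rej, X_rej])
    y_aug = np.concatenate([y_acc, np.ones(n_rej), np.zeros(n_rej)])

    for _ in range(n_iter):
        # E-step: posterior default probability for every reject.
        p_rej = model.predict_proba(X_rej)[:, 1]
        # M-step: refit, weighting the two reject copies by p and 1 - p.
        weights = np.concatenate([np.ones(len(X_acc)), p_rej, 1.0 - p_rej])
        model = LogisticRegression(max_iter=1000).fit(
            X_aug, y_aug, sample_weight=weights
        )

    return model, model.predict_proba(X_rej)[:, 1]


# Synthetic applicants (assumption: two numeric features, logistic default risk).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
true_p = 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * X[:, 0] + 0.5 * X[:, 1])))
y = rng.binomial(1, true_p)

# Toy historical policy: applicants with a high first feature were rejected,
# so their outcomes are never observed by the model.
approved = X[:, 0] < 0.5
model, p_rej = em_reject_inference(X[approved], y[approved], X[~approved])
p_acc = model.predict_proba(X[approved])[:, 1]
```

One caveat worth stating: with a purely discriminative model such as logistic regression, this scheme tends to stay close to the accepted-only fit unless extra assumptions about the rejected population are added, so it is best read as a scaffold rather than a complete correction.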

Core Features

  • Reject Inference: Semi-supervised learning using the EM algorithm and pseudo-labeling to infer outcomes for rejected applicants
  • Weight of Evidence Encoding: Regularized WoE transformation for categorical variables with smoothing to handle sparse categories
  • Scorecard Binning: Optimal binning with monotonicity constraints ensuring interpretable credit score relationships
  • Probability Calibration: Platt scaling and beta calibration for reliable probability estimates
  • Fairness Testing: Disparate impact analysis with bias mitigation through reweighting and threshold adjustment
  • Regulatory Documentation: Automated generation of model documentation artifacts for compliance review
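
As an illustration of the Weight of Evidence feature above, the following sketch computes WoE per category with additive smoothing. The helper name `smoothed_woe`, the pseudo-count `alpha`, and the sign convention (non-default share over default share, with `1` meaning default) are assumptions made for the example; the repository may regularize differently.

```python
import math
from collections import Counter


def smoothed_woe(categories, defaults, alpha=0.5):
    """Return {category: WoE} using additive smoothing (hypothetical helper).

    WoE = ln(share of non-defaults in category / share of defaults in category).
    The alpha pseudo-count keeps the ratio finite for sparse categories.
    """
    good = Counter(c for c, d in zip(categories, defaults) if d == 0)
    bad = Counter(c for c, d in zip(categories, defaults) if d == 1)
    levels = sorted(set(categories))

    # Add alpha to every cell, so the totals grow by alpha per category level.
    total_good = sum(good.values()) + alpha * len(levels)
    total_bad = sum(bad.values()) + alpha * len(levels)

    return {
        c: math.log(
            ((good[c] + alpha) / total_good) / ((bad[c] + alpha) / total_bad)
        )
        for c in levels
    }


# Toy data: "other" has a single applicant and no observed defaults.
categories = ["rent", "rent", "rent", "own", "own", "own", "own", "other"]
defaults = [1, 1, 0, 0, 0, 0, 1, 0]
woe = smoothed_woe(categories, defaults)
```

Without smoothing, the `other` category would have zero defaults and an infinite WoE. With the pseudo-count it stays finite, while `rent` (mostly defaults) comes out negative and `own` (mostly non-defaults) positive.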

Technical Approach

The system follows a principled approach to credit risk modeling:

  1. Data Preprocessing: Handle missing values, encode categoricals with WoE, and create optimal bins
  2. Reject Inference: Apply the EM algorithm to incorporate rejected applicant information
  3. Model Training: Logistic regression with monotonicity constraints for interpretability
  4. Calibration: Post-hoc calibration using held-out validation data
  5. Fairness Audit: Test for disparate impact across protected classes with mitigation options
  6. Documentation: Generate regulatory artifacts including model cards and variable importance reports
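
The fairness audit in step 5 can be sketched with a disparate impact ratio: the approval rate of a protected group divided by that of a reference group. The function name, the group labels, and the 0.8 cutoff (the commonly cited four-fifths rule of thumb) are assumptions for illustration, not values taken from the repository.

```python
def disparate_impact_ratio(approved, group, protected, reference):
    """Approval rate of `protected` divided by that of `reference`."""

    def approval_rate(label):
        decisions = [a for a, g in zip(approved, group) if g == label]
        if not decisions:
            raise ValueError(f"no applicants in group {label!r}")
        return sum(decisions) / len(decisions)

    reference_rate = approval_rate(reference)
    if reference_rate == 0:
        raise ValueError("reference group has no approvals; ratio undefined")
    return approval_rate(protected) / reference_rate


# Toy decisions: 1 = approved, 0 = declined, for two hypothetical groups.
approved = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
group = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

ratio = disparate_impact_ratio(approved, group, protected="a", reference="b")
flagged = ratio < 0.8  # assumed threshold: four-fifths rule of thumb
```

Here group `a` is approved 3 times out of 5 and group `b` 4 times out of 5, so the ratio is about 0.75 and the check flags the decisions for review. A flag like this would typically trigger the mitigation options mentioned above, such as reweighting or threshold adjustment.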

Project Structure

credit-scoring-reject-inference/
├── src/
│   ├── preprocessing/       # WoE encoding, binning, missing value handling
│   ├── reject_inference/    # EM algorithm, pseudo-labeling
│   ├── modeling/            # Constrained logistic regression, calibration
│   ├── fairness/            # Disparate impact testing, bias mitigation
│   └── documentation/       # Regulatory artifact generation
├── tests/                   # Unit and integration tests
├── notebooks/               # Exploratory analysis and demos
├── docs/                    # Implementation plan and design documents
└── data/                    # Sample datasets (synthetic for privacy)

Requirements

  • Python 3.9+
  • NumPy, pandas, scikit-learn
  • SciPy (for optimization)
  • Matplotlib, Seaborn (for visualization)

Installation

git clone https://github.com/Sakeeb91/credit-scoring-reject-inference.git
cd credit-scoring-reject-inference
pip install -r requirements.txt

License

MIT License

Author

Sakeeb Rahman
