A regulatory-compliant credit scoring system designed to address selection bias inherent in historical lending data. The system produces calibrated probability estimates, satisfies explainability requirements for regulatory compliance, and demonstrates fairness across protected demographic classes.
Traditional credit scoring models suffer from a fundamental selection bias: only approved applicants have observable outcomes (default/non-default). Rejected applicants represent missing data that, if ignored, leads to biased model training and miscalibrated risk estimates. This system implements reject inference techniques using semi-supervised learning to recover information from the rejected population.
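A minimal sketch of the pseudo-labeling side of reject inference is shown below; `X_approved`, `y_approved`, and `X_rejected` are placeholder arrays for data prepared upstream, and an EM-style variant would iterate a similar augmentation step with soft weights rather than hard labels.

```python
# Minimal sketch of confidence-thresholded pseudo-labeling for reject inference.
# X_approved / y_approved / X_rejected are placeholder NumPy arrays: features and
# observed outcomes for approved applicants, and features for rejected applicants.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_rejects(X_approved, y_approved, X_rejected, threshold=0.9):
    # Fit an initial model on the approved (labelled) population only.
    base = LogisticRegression(max_iter=1000).fit(X_approved, y_approved)

    # Score the rejected population and keep only confident predictions.
    p_default = base.predict_proba(X_rejected)[:, 1]   # assumes class 1 = default
    confident = (p_default >= threshold) | (p_default <= 1 - threshold)
    pseudo_labels = (p_default >= 0.5).astype(int)

    # Refit on the augmented sample: approved applicants plus confidently
    # pseudo-labelled rejects. An EM-style variant would iterate this step
    # with soft membership weights until convergence.
    X_aug = np.vstack([X_approved, X_rejected[confident]])
    y_aug = np.concatenate([y_approved, pseudo_labels[confident]])
    return LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
```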
- Reject Inference: Semi-supervised learning using EM algorithm and pseudo-labeling to infer outcomes for rejected applicants
- Weight of Evidence Encoding: Regularized WoE transformation for categorical variables, with smoothing to handle sparse categories (see the WoE sketch after this list)
- Scorecard Binning: Optimal binning with monotonicity constraints, ensuring interpretable credit score relationships (see the binning sketch after this list)
- Probability Calibration: Platt scaling and beta calibration for reliable probability estimates
- Fairness Testing: Disparate impact analysis with bias mitigation through reweighting and threshold adjustment
- Regulatory Documentation: Automated generation of model documentation artifacts for compliance review
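As a rough illustration of the smoothed WoE transform: `df`, `feature`, and `target` are placeholder names (with `target` assumed to be 1 for default), and the smoothing constant simply keeps sparse categories away from division by zero and log of zero.

```python
# Minimal sketch of smoothed Weight of Evidence encoding for one categorical column.
import numpy as np
import pandas as pd

def woe_encode(df: pd.DataFrame, feature: str, target: str, smoothing: float = 0.5) -> pd.Series:
    # Event (default) and non-event counts per category.
    grouped = df.groupby(feature)[target].agg(events="sum", total="count")
    grouped["non_events"] = grouped["total"] - grouped["events"]

    total_events = grouped["events"].sum()
    total_non_events = grouped["non_events"].sum()

    # Smoothed distribution shares; the exact smoothing scheme is illustrative.
    event_share = (grouped["events"] + smoothing) / (total_events + smoothing)
    non_event_share = (grouped["non_events"] + smoothing) / (total_non_events + smoothing)

    # Log ratio of event to non-event shares (sign convention varies across scorecard tools).
    woe = np.log(event_share / non_event_share)
    return df[feature].map(woe)
```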
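And a greedy sketch of monotonic coarse-classing: start from quantile bins and merge the most similar adjacent bins until the observed default rate moves in one direction. This illustrates the constraint rather than the repository's optimal-binning routine; `x` and `y` are placeholder Series with aligned indices.

```python
# Minimal sketch of monotonic binning via greedy merging of quantile bins.
import numpy as np
import pandas as pd

def monotone_bin_edges(x: pd.Series, y: pd.Series, n_bins: int = 10) -> list:
    # Initial candidate edges from quantiles (duplicate edges dropped).
    edges = list(np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1))))
    while len(edges) > 2:
        bins = pd.cut(x, bins=edges, include_lowest=True)
        rates = y.groupby(bins).mean().to_numpy()   # default rate per bin
        diffs = np.diff(rates)
        # Stop once default rates are monotone (all non-decreasing or all non-increasing).
        if np.all(diffs >= 0) or np.all(diffs <= 0):
            break
        # Merge the two most similar adjacent bins by dropping the edge between them.
        merge_at = int(np.argmin(np.abs(diffs)))
        del edges[merge_at + 1]
    return edges
```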
The system follows a principled approach to credit risk modeling:
- Data Preprocessing: Handle missing values, encode categoricals with WoE, and create optimal bins
- Reject Inference: Apply EM algorithm to incorporate rejected applicant information
- Model Training: Logistic regression with monotonicity constraints for interpretability
- Calibration: Post-hoc calibration using held-out validation data (see the calibration sketch below)
- Fairness Audit: Test for disparate impact across protected classes, with mitigation options (see the disparate impact sketch below)
- Documentation: Generate regulatory artifacts including model cards and variable importance reports
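A minimal sketch of the calibration step using scikit-learn's `CalibratedClassifierCV` with Platt scaling (`method="sigmoid"`); the synthetic data stands in for the prepared scorecard features.

```python
# Minimal sketch of post-hoc Platt scaling on a held-out validation split.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared (WoE-encoded, binned) features.
X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the scoring model on the training split only.
base_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# method="sigmoid" is Platt scaling; cv="prefit" fits only the calibration map on the
# held-out data (deprecated in recent scikit-learn in favor of FrozenEstimator,
# but the idea is the same).
calibrated = CalibratedClassifierCV(base_model, method="sigmoid", cv="prefit")
calibrated.fit(X_valid, y_valid)

p_default = calibrated.predict_proba(X_valid)[:, 1]  # calibrated probability estimates
```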
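A minimal sketch of the disparate impact check (the "four-fifths rule"); `approved` and `group` are placeholder inputs, and ratios below 0.8 would typically trigger mitigation such as reweighting or per-group threshold adjustment.

```python
# Minimal sketch of a disparate impact ratio per protected group.
import pandas as pd

def disparate_impact_ratio(approved, group, reference_group):
    # Approval rate per group divided by the reference group's approval rate.
    rates = pd.Series(approved).groupby(pd.Series(group)).mean()
    return rates / rates[reference_group]

# Toy example: group B is approved at half the rate of group A (ratio 0.5 < 0.8).
approved = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
group = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(disparate_impact_ratio(approved, group, reference_group="A"))
```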
credit-scoring-reject-inference/
├── src/
│ ├── preprocessing/ # WoE encoding, binning, missing value handling
│ ├── reject_inference/ # EM algorithm, pseudo-labeling
│ ├── modeling/ # Constrained logistic regression, calibration
│ ├── fairness/ # Disparate impact testing, bias mitigation
│ └── documentation/ # Regulatory artifact generation
├── tests/ # Unit and integration tests
├── notebooks/ # Exploratory analysis and demos
├── docs/ # Implementation plan and design documents
└── data/ # Sample datasets (synthetic for privacy)
- Python 3.9+
- NumPy, Pandas, Scikit-learn
- SciPy (for optimization)
- Matplotlib, Seaborn (for visualization)
git clone https://github.com/Sakeeb91/credit-scoring-reject-inference.git
cd credit-scoring-reject-inference
pip install -r requirements.txt

Licensed under the MIT License.
Author: Sakeeb Rahman