This repository contains the code, experiments, and infrastructure for the Kaggle Playground Series – Season 5, Episode 11 competition. The task is a binary-classification problem on a realistic, synthetic dataset: predict whether a borrower will successfully repay a loan.
- Synthetic dataset generated from a deep-learning model trained on real loan-prediction data.
- Goal: predict loan repayment based on borrower and loan attributes (e.g., income, debt ratio, credit history, loan purpose).
- This repo implements an end-to-end ML workflow: data ingestion → EDA → feature engineering → modeling → evaluation → submission.
| Category | Tools / Packages |
|---|---|
| Language | Python |
| Experiments | Marimo notebooks |
| Models | CatBoost, XGBoost, LightGBM, Random Forest |
| Hyperparameter Tuning | Optuna |
| Tracking | MLflow |
| Reproducibility | Docker, shell scripts |
The target metric was the area under the ROC curve (ROC AUC). The histograms below show the leaderboard scores: the left plot covers all scores, and the right plot zooms in on scores above 0.9.
| All Leaderboard Scores | Scores above 0.9 |
|---|---|
| ![All leaderboard scores]() | ![Scores above 0.9]() |
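
For readers unfamiliar with the metric, here is a minimal, dependency-free sketch of how the area under the ROC curve can be computed from labels and predicted scores, using the rank-sum (Mann-Whitney U) identity. The function name `roc_auc` and the toy inputs are illustrative assumptions, not code from this repository; in practice a library routine such as `sklearn.metrics.roc_auc_score` would normally be used instead.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via the rank-sum identity (hypothetical helper, not from this repo).

    y_true  : sequence of 0/1 labels (1 = positive class)
    y_score : sequence of predicted scores, higher means "more likely positive"
    """
    pairs = sorted(zip(y_score, y_true))  # ascending by score
    n = len(pairs)
    rank_sum_pos = 0.0
    n_pos = 0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied items share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
                n_pos += 1
        i = j
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    # Mann-Whitney U statistic for the positive class, normalised to [0, 1].
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The value can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.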

