Predicting Loan Payback (Kaggle Playground Series S5E11)

📌 Project Overview

This project focuses on predicting whether a loan will be paid back using structured tabular data. The solution was developed entirely with Python scripts (.py), without notebooks, following a clean and reproducible end‑to‑end machine learning pipeline.

The project is based on the Kaggle competition Playground Series – Season 5, Episode 11 and is designed as a portfolio‑ready Data Science / Machine Learning project.

🧠 Problem Statement

Given customer financial and demographic data, predict the probability that a loan will be paid back (loan_paid_back).

This is a binary classification problem, evaluated using ROC AUC.
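To make the metric concrete, here is a minimal sketch of ROC AUC computed from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are illustrative only and are not taken from the project code; in practice `sklearn.metrics.roc_auc_score` does the same job far more efficiently.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data (made up for illustration): 1 = loan paid back, 0 = not paid back.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC AUC: {auc:.3f}")
```

Because the metric depends only on how predictions are ranked, the model should output probabilities rather than hard 0/1 labels.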

🗂️ Project Structure

playground-series-s5e11/
│
├── data/                 # Raw competition data (train.csv, test.csv)
├── outputs/              # EDA reports, metrics, trained models, submission
│   ├── eda_report.txt
│   ├── metrics.txt
│   ├── catboost_model.cbm
│   └── submission.csv
│
├── src/                  # Source code (pure Python, no notebooks)
│   ├── config.py
│   ├── load_data.py
│   ├── eda.py
│   ├── train_catboost.py
│   └── inference.py
│
├── requirements.txt
└── README.md
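As an illustration of how the layout above could be centralized, the sketch below shows one possible shape for a module like src/config.py. The actual contents of that file are not shown in this README, so the class name `Paths`, its attributes, and the constant `TARGET` are assumptions; only the file and folder names come from the tree above.

```python
from dataclasses import dataclass
from pathlib import Path

TARGET = "loan_paid_back"  # target column named in the problem statement


@dataclass(frozen=True)
class Paths:
    """Hypothetical container for the project's input and output locations."""
    root: Path

    @property
    def train_csv(self) -> Path:
        return self.root / "data" / "train.csv"

    @property
    def test_csv(self) -> Path:
        return self.root / "data" / "test.csv"

    @property
    def outputs(self) -> Path:
        return self.root / "outputs"

    @property
    def model_file(self) -> Path:
        return self.outputs / "catboost_model.cbm"

    @property
    def submission(self) -> Path:
        return self.outputs / "submission.csv"


paths = Paths(root=Path("playground-series-s5e11"))
print(paths.train_csv)
print(paths.submission)
```

Keeping every path in one place means the EDA, training, and inference scripts can share the same locations without hard-coding strings.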

⚙️ Tech Stack

Python 3.11+
Pandas, NumPy
Scikit‑learn
CatBoost (native categorical feature handling)

🔍 Exploratory Data Analysis (EDA)

EDA runs as a standalone Python script (src/eda.py) that writes a plain-text report covering:

dataset shapes
column overview
target distribution
missing values check
data types

Output:

outputs/eda_report.txt
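The checks listed above could be assembled into a text report roughly as follows. This is a sketch, not the project's eda.py: the function name `build_eda_report`, the section titles, and the toy data (including the `annual_income` column) are hypothetical.

```python
import pandas as pd


def build_eda_report(train: pd.DataFrame, test: pd.DataFrame, target: str) -> str:
    """Collect basic EDA facts about the training data into one plain-text report."""
    sections = [
        ("Dataset shapes", f"train: {train.shape}, test: {test.shape}"),
        ("Columns", ", ".join(map(str, train.columns))),
        ("Target distribution", train[target].value_counts(normalize=True).to_string()),
        ("Missing values", train.isna().sum().to_string()),
        ("Data types", train.dtypes.to_string()),
    ]
    return "\n\n".join(f"== {title} ==\n{body}" for title, body in sections)


# Toy frames standing in for train.csv and test.csv (values are made up).
train = pd.DataFrame({
    "annual_income": [52000.0, None, 31000.0, 78000.0],
    "gender": ["Female", "Male", "Male", "Female"],
    "loan_paid_back": [1, 0, 1, 1],
})
test = train.drop(columns=["loan_paid_back"]).head(2)

report = build_eda_report(train, test, target="loan_paid_back")
print(report)
```

Writing the returned string with `Path("outputs/eda_report.txt").write_text(report)` would produce a file like the one listed above.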

🤖 Model

CatBoostClassifier was selected due to:

native handling of categorical features
strong performance on tabular data
minimal preprocessing requirements
robustness and stability

Categorical Features

['gender', 'marital_status', 'education_level',
 'employment_status', 'loan_purpose', 'grade_subgrade']
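A minimal sketch of how these columns might be passed to CatBoost is shown below. CatBoost expects categorical columns to hold strings or integers with no NaN, so the helper casts them first. The helper names `prepare_features` and `fit_catboost`, the hyperparameters, and the toy category values are assumptions; the project's train_catboost.py may differ.

```python
import pandas as pd

CAT_FEATURES = ["gender", "marital_status", "education_level",
                "employment_status", "loan_purpose", "grade_subgrade"]


def prepare_features(df: pd.DataFrame, cat_features: list[str]) -> pd.DataFrame:
    """Return a copy whose categorical columns are NaN-free strings."""
    out = df.copy()
    for col in cat_features:
        out[col] = out[col].fillna("missing").astype(str)
    return out


def fit_catboost(X: pd.DataFrame, y, cat_features: list[str]):
    """Fit a CatBoostClassifier (hypothetical settings, not the project's)."""
    from catboost import CatBoostClassifier  # imported lazily; requires catboost

    model = CatBoostClassifier(eval_metric="AUC", random_seed=42, verbose=0)
    model.fit(X, y, cat_features=cat_features)
    return model


# Toy rows (category values are made up) with one missing value per column.
raw = pd.DataFrame({col: ["A", None] for col in CAT_FEATURES})
raw["gender"] = ["Male", None]
prepared = prepare_features(raw, CAT_FEATURES)
print(prepared["gender"].tolist())
```

Passing column names through `cat_features` lets CatBoost apply its own encoding, which is why no one-hot or label encoding step appears here.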

📊 Results

Metric	Score
Public Leaderboard AUC	0.92293
Private Leaderboard AUC	0.92385
OOF AUC	0.92338 ± 0.00069

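The OOF, public, and private scores agree to within about 0.001 AUC, which suggests that the cross‑validation estimate is reliable and that the model generalizes well to unseen data.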
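For readers unfamiliar with out‑of‑fold (OOF) scoring, the sketch below shows one common way to obtain a "mean ± std" AUC from stratified K‑fold cross‑validation. The README does not state the number of folds or how the spread was computed, so the 5‑fold setup and the use of the standard deviation across folds are assumptions. To keep the example light, scikit‑learn's LogisticRegression on synthetic data stands in for CatBoost on the competition data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def oof_auc(model_factory, X, y, n_splits=5, seed=42):
    """Return out-of-fold probabilities and the AUC of each validation fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof = np.zeros(len(y), dtype=float)
    fold_scores = []
    for train_idx, valid_idx in skf.split(X, y):
        model = model_factory()  # fresh model per fold, so no leakage between folds
        model.fit(X[train_idx], y[train_idx])
        oof[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]
        fold_scores.append(roc_auc_score(y[valid_idx], oof[valid_idx]))
    return oof, fold_scores


# Synthetic stand-in data; not the competition dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
oof, fold_scores = oof_auc(lambda: LogisticRegression(max_iter=1000), X, y)
print(f"OOF AUC: {np.mean(fold_scores):.5f} ± {np.std(fold_scores):.5f}")
```

A small standard deviation across folds, as reported in the table above, indicates that the score does not depend much on which rows land in the validation split.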

▶️ How to Run Locally

1️⃣ Create virtual environment

python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # macOS / Linux

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Run EDA

python -m src.eda

4️⃣ Train model

python -m src.train_catboost

5️⃣ Generate submission

python -m src.inference
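The final step produces outputs/submission.csv. A sketch of how such a file might be assembled from test IDs and predicted probabilities is given below; the helper name `build_submission`, the `id` column name, and the sample values are assumptions, while `loan_paid_back` is the target named in the problem statement.

```python
import pandas as pd


def build_submission(ids, probabilities, id_col="id", target_col="loan_paid_back"):
    """Pair each test ID with its predicted probability of repayment."""
    ids = list(ids)
    probabilities = list(probabilities)
    if len(ids) != len(probabilities):
        raise ValueError("ids and probabilities must have the same length")
    return pd.DataFrame({id_col: ids, target_col: probabilities})


# Made-up IDs and probabilities; a real run would use the trained model's
# predict_proba(X_test)[:, 1] output instead.
submission = build_submission([1001, 1002, 1003], [0.91, 0.12, 0.67])
print(submission.to_csv(index=False))
```

Since the competition is scored with ROC AUC, the file holds probabilities rather than thresholded 0/1 labels.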

🧪 Key Design Decisions

❌ No Jupyter notebooks
✅ Script‑based, reproducible pipeline
✅ Clear separation of concerns (EDA / training / inference)
✅ Local development (VS Code‑friendly)
✅ Ready for extension (SHAP, feature engineering, hyperparameter tuning)

🚀 Future Improvements

Feature engineering (ratio & interaction features)
SHAP‑based model interpretability
Hyperparameter optimization
Model ensembling

📎 Kaggle

Competition: Playground Series S5E11

Submitted as a Late Submission, for learning and portfolio purposes.

👤 Author

Grzegorz

Focused on Data Science and Machine Learning with emphasis on clean pipelines, reproducibility, and production‑ready code.
