🏦 Loan Approval Predictor

A machine learning system that predicts bank loan approval decisions, with an interactive web interface built with Streamlit.

🎯 Project Objective

The goal of this project is to build an intelligent system capable of automatically predicting whether a loan application will be approved or rejected, based on:

Demographic information (age)
Financial data (annual income, requested loan amount, monthly deductions)
Product type and decision process
Historical banking decisions

This project simulates a real-world retail banking use case with production-oriented practices.

✨ Key Features

🤖 Machine Learning

6 machine learning algorithms implemented and compared
- Decision Tree
- Random Forest
- Extra Trees
- XGBoost
- LightGBM
- CatBoost
Automated hyperparameter optimization
Stratified cross-validation
Comprehensive evaluation metrics:
- Accuracy
- Precision
- Recall
- F1-score
- AUC-ROC
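
Stratified cross-validation keeps the approved/rejected ratio roughly constant in every fold, which matters when one class is rarer than the other. The sketch below illustrates the idea with scikit-learn's `StratifiedKFold`. The synthetic data, the `DecisionTreeClassifier` settings, and the choice of 5 folds are assumptions for illustration, not the project's actual training code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (assumption: not the project's dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = np.array([1] * 150 + [0] * 50)  # 75% positive, 25% negative

# Each test fold preserves the class ratio of y.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_ratios = [float(y[test_idx].mean()) for _, test_idx in cv.split(X, y)]

# Score a simple model on the same folds.
model = DecisionTreeClassifier(max_depth=4, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="f1")

print("Positive share per fold:", fold_ratios)
print("Mean F1 across folds:", scores.mean())
```

The same `cv` object can be passed to any scikit-learn compatible estimator, so all compared models are scored on identical folds.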

📊 Data Analysis & Feature Engineering

Automated Exploratory Data Analysis (EDA)
Advanced visualizations (20+ charts)
Feature engineering:
- Financial ratios
- Feature interactions
Outlier detection and data cleaning
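
As an illustration of the feature engineering steps listed above, the following sketch derives two financial ratios and one interaction term with pandas, then flags outliers with the interquartile-range rule. It assumes the dataset columns are named as in the feature importance list (`Revenus Annuel`, `Montant Sollicité`, `Retenus Mensuel`, `Age`). The new feature names, the helper functions, and the IQR rule itself are hypothetical choices, not taken from the project's code.

```python
import numpy as np
import pandas as pd


def add_financial_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with illustrative ratio and interaction features."""
    out = df.copy()
    # Treat zero income as missing so the ratios become NaN instead of inf.
    income = out["Revenus Annuel"].replace(0, np.nan)
    # Financial ratios (monthly deductions are annualized first).
    out["loan_to_income"] = out["Montant Sollicité"] / income
    out["deduction_to_income"] = (out["Retenus Mensuel"] * 12) / income
    # Feature interaction between age and relative loan size.
    out["age_x_loan_to_income"] = out["Age"] * out["loan_to_income"]
    return out


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Made-up sample rows, for demonstration only.
sample = pd.DataFrame({
    "Age": [30, 45, 52, 38],
    "Revenus Annuel": [24000.0, 60000.0, 0.0, 36000.0],
    "Montant Sollicité": [12000.0, 30000.0, 5000.0, 900000.0],
    "Retenus Mensuel": [200.0, 500.0, 0.0, 300.0],
})
features = add_financial_features(sample)
outliers = iqr_outlier_mask(sample["Montant Sollicité"])
print(features[["loan_to_income", "deduction_to_income"]])
print(outliers.tolist())
```

Flagged rows can then be inspected, capped, or dropped, depending on the cleaning policy.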

🌐 Web Application (Streamlit)

Interactive dashboard for data exploration
Real-time loan approval prediction
Dynamic visualizations using Plotly
One-click model comparison

🛠️ Software Engineering

Modular and maintainable architecture
Unit testing with pytest
Fully documented codebase
Optimized VS Code configuration

📦 Installation

Prerequisites

Python 3.12+
pip
Git (optional)

Quick Setup

# 1. Clone the repository
git clone https://github.com/malek-harbaoui/Loan-Approval-Predictor.git
cd Loan-Approval-Predictor

# 2. Create a virtual environment
python -m venv venv

# 3. Activate the environment
# Windows
venv\Scripts\activate
# Linux / macOS
source venv/bin/activate

# 4. Install dependencies
pip install -r requirements.txt

# 5. Verify installation
python -c "import pandas, sklearn, xgboost; print('Installation OK')"

🚀 Usage

📊 Full ML Pipeline

# 1. Exploratory Data Analysis
python scripts/run_eda.py

# 2. Data preprocessing & feature engineering
python scripts/run_preprocessing.py

# 3. Train machine learning models
python scripts/train_models.py

# 4. Generate performance reports
python scripts/generate_report.py

🌐 Launch the Web Application

streamlit run app.py

Then open your browser at: 👉 http://localhost:8501

🎯 Programmatic Usage

from src.data.data_loader import DataLoader
from src.models.boosting_models import XGBoostModel
from sklearn.model_selection import train_test_split

# Load data
loader = DataLoader()
df = loader.load_retail_data()

# Prepare features and target
X = df.drop("Décision Finale Binaire", axis=1)
y = df["Décision Finale Binaire"]

# Stratify so both splits keep the same approved/rejected ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train model
model = XGBoostModel()
model.build_model(n_estimators=200, learning_rate=0.3)
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)

📁 Project Structure

Loan_Approval_Predictor/
│
├── app.py                    # 🌐 Streamlit application
├── requirements.txt          # 📦 Dependencies
├── config.yaml               # ⚙️ Global configuration
├── README.md                 # 📖 Documentation
│
├── data/
│   ├── raw/                  # Raw datasets
│   └── processed/            # Cleaned datasets
│
├── src/
│   ├── data/                 # Data processing
│   ├── models/               # ML models
│   ├── visualization/        # Charts & plots
│   └── evaluation/           # Metrics & evaluation
│
├── scripts/                  # Execution scripts
├── notebooks/                # Jupyter notebooks
├── tests/                    # Unit tests
├── models/                   # Saved models
└── reports/                  # Results & figures

📊 Expected Results

Model Performance

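| Model         | Accuracy | Precision | Recall | F1-Score | AUC-ROC |
|---------------|----------|-----------|--------|----------|---------|
| XGBoost       | 0.924    | 0.918     | 0.931  | 0.924    | 0.957   |
| LightGBM      | 0.921    | 0.915     | 0.928  | 0.921    | 0.954   |
| CatBoost      | 0.918    | 0.912     | 0.925  | 0.918    | 0.951   |
| Random Forest | 0.915    | 0.909     | 0.922  | 0.915    | 0.948   |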

🔍 Most Important Features

Retenus Mensuel (monthly deductions) - 35.2%
Revenus Annuel (annual income) - 28.7%
Montant Sollicité (requested loan amount) - 18.4%
Age - 9.3%
Type CANEVAS - 4.8%

🎨 Streamlit Interface

Available Pages

🏠 Home
- Dataset overview
- Key statistics
- Model performance indicators
📊 Data Exploration
- Feature distributions
- Bivariate analysis
- Correlation matrix
- Interactive visualizations
🎯 Prediction
- User input form
- Real-time loan approval prediction
- Decision analysis
- Explanation of influencing factors
📈 Model Results
- Model performance comparison
- Performance charts
- Detailed evaluation metrics
- Result visualizations

🛠️ Technologies Used

Core ML

scikit-learn
XGBoost
LightGBM
CatBoost

Data Processing

pandas
numpy

Visualization

matplotlib
seaborn
plotly

Web Application

Streamlit
Streamlit-Plotly

Development Tools

pytest
black
flake8

🔧 Global Configuration (`config.yaml`)

project:
  name: "Loan Approval Predictor"
  version: "1.0.0"
  random_seed: 42

data:
  test_size: 0.2
  target_column: "Décision Finale Binaire"

models:
  xgboost:
    n_estimators: 200
    learning_rate: 0.3
    max_depth: 4
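
A configuration like this is typically read once at startup and passed to the training code. The sketch below parses the same YAML with `yaml.safe_load` from the PyYAML package. Whether the project uses PyYAML is an assumption, and `load_config` is a hypothetical helper name.

```python
import yaml  # provided by the PyYAML package (assumed dependency)

# Same content as config.yaml, inlined so the example is self-contained.
CONFIG_TEXT = """
project:
  name: "Loan Approval Predictor"
  version: "1.0.0"
  random_seed: 42

data:
  test_size: 0.2
  target_column: "Décision Finale Binaire"

models:
  xgboost:
    n_estimators: 200
    learning_rate: 0.3
    max_depth: 4
"""


def load_config(text: str) -> dict:
    """Parse YAML configuration text into nested dictionaries."""
    return yaml.safe_load(text)


config = load_config(CONFIG_TEXT)
xgb_params = config["models"]["xgboost"]

print("Target column:", config["data"]["target_column"])
print("XGBoost parameters:", xgb_params)
```

To read the real file instead, open `config.yaml` with `encoding="utf-8"` and pass the file object to `yaml.safe_load`.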

🔐 Environment Variables

Create a .env file at the project root:

DATA_PATH=data/raw
MODEL_PATH=models
RANDOM_STATE=42
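
Python does not read a `.env` file on its own. The values have to be exported to the process environment, for example by the shell or by a loader such as `load_dotenv()` from the python-dotenv package (whether the project uses it is not stated here). Once they are set, a minimal sketch for reading them with fallbacks could look like this. `load_settings` is a hypothetical helper, and the defaults simply mirror the example values above.

```python
import os
from pathlib import Path


def load_settings(env=os.environ) -> dict:
    """Read the three settings, falling back to the example values."""
    return {
        "data_path": Path(env.get("DATA_PATH", "data/raw")),
        "model_path": Path(env.get("MODEL_PATH", "models")),
        # Environment values are strings, so convert explicitly.
        "random_state": int(env.get("RANDOM_STATE", "42")),
    }


# Passing a plain dict makes the behaviour easy to check without
# touching the real environment.
settings = load_settings({"DATA_PATH": "data/raw", "RANDOM_STATE": "7"})
print(settings)
```

Calling `load_settings()` with no argument reads from the real process environment.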

📊 Evaluation Metrics

The project uses several metrics to evaluate the models:

Accuracy - Proportion of all predictions that are correct
Precision - Proportion of positive predictions that are correct
Recall - Proportion of actual positive cases that are detected
F1-Score - Harmonic mean of Precision and Recall
AUC-ROC - Area under the ROC curve
Specificity - Proportion of actual negative cases that are correctly identified
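
All of these metrics except AUC-ROC can be computed directly from the four confusion-matrix counts. The sketch below shows the arithmetic in plain Python, assuming labels are encoded as 1 for the positive class and 0 for the negative class. `classification_metrics` is a hypothetical helper, and the labels are made up for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Compute threshold-based metrics for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 when a metric is undefined (zero denominator).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up labels: 5 positive and 5 negative cases.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC-ROC is left out because it needs predicted probabilities rather than hard labels. With scikit-learn it can be obtained from `sklearn.metrics.roc_auc_score`.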

🤝 Contributing

Contributions are welcome! To contribute to this project:

Fork the repository
Create a new branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

👥 Author

Malek Harbaoui - Main Developer

⭐ If this project helped you, please consider giving it a star!
