A professional Machine Learning system designed to predict bank loan approval decisions, featuring an interactive web interface built with Streamlit.
The goal of this project is to build an intelligent system capable of automatically predicting whether a loan application will be approved or rejected, based on:
- Demographic information (age)
- Financial data (annual income, requested loan amount, monthly deductions)
- Product type and decision process
- Historical banking decisions
This project simulates a real-world retail banking use case with production-oriented practices.
- Six machine learning algorithms implemented and compared:
  - Decision Tree
  - Random Forest
  - Extra Trees
  - XGBoost
  - LightGBM
  - CatBoost
- Automated hyperparameter optimization
- Stratified cross-validation
- Comprehensive evaluation metrics:
  - Accuracy
  - Precision
  - Recall
  - F1-score
  - AUC-ROC
- Automated Exploratory Data Analysis (EDA)
- Advanced visualizations (20+ charts)
- Feature engineering:
  - Financial ratios
  - Feature interactions
- Outlier detection and data cleaning
- Interactive dashboard for data exploration
- Real-time loan approval prediction
- Dynamic visualizations using Plotly
- One-click model comparison
- Modular and maintainable architecture
- Unit testing with pytest
- Fully documented codebase
- Optimized VS Code configuration
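The stratified cross-validation mentioned in the list above can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: the function name `stratified_k_fold` and the toy labels are hypothetical, and in practice scikit-learn's `StratifiedKFold` would normally be used instead. The idea is to split each class separately so that every fold keeps roughly the same approved/rejected ratio as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, n_splits=5, seed=42):
    """Yield (train_indices, test_indices) pairs that preserve class ratios."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Shuffle within each class, then deal the indices round-robin into folds
    # so that each fold receives a near-equal share of every class.
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)

    # Each fold serves once as the test set; the others form the training set.
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


# Hypothetical toy target: 1 = approved, 0 = rejected (70% / 30%).
labels = [1] * 70 + [0] * 30
splits = list(stratified_k_fold(labels, n_splits=5))
for train, test in splits:
    approved_share = sum(labels[i] for i in test) / len(test)
    print(len(train), len(test), approved_share)
```

Because both classes are dealt evenly, every test fold in this toy example contains the same 70/30 mix as the full label list, which keeps per-fold metrics comparable on imbalanced loan data.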
- Python 3.12+
- pip
- Git (optional)
# 1. Clone the repository
git clone https://github.com/malek-harbaoui/Loan-Approval-Predictor.git
cd Loan-Approval-Predictor
# 2. Create a virtual environment
python -m venv venv
# 3. Activate the environment
# Windows
venv\Scripts\activate
# Linux / macOS
source venv/bin/activate
# 4. Install dependencies
pip install -r requirements.txt
# 5. Verify installation
python -c "import pandas, sklearn, xgboost; print('Installation OK')"
# 1. Exploratory Data Analysis
python scripts/run_eda.py
# 2. Data preprocessing & feature engineering
python scripts/run_preprocessing.py
# 3. Train machine learning models
python scripts/train_models.py
# 4. Generate performance reports
python scripts/generate_report.py
streamlit run app.py
Then open your browser at: http://localhost:8501
from src.data.data_loader import DataLoader
from src.models.boosting_models import XGBoostModel
from sklearn.model_selection import train_test_split
# Load data
loader = DataLoader()
df = loader.load_retail_data()
# Prepare features and target
X = df.drop("Décision Finale Binaire", axis=1)
y = df["Décision Finale Binaire"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y,  # preserve the class ratio of the target in both splits
)
# Train model
model = XGBoostModel()
model.build_model(n_estimators=200, learning_rate=0.3)
model.fit(X_train, y_train)
# Predict
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)
Loan_Approval_Predictor/
│
├── app.py                 # Streamlit application
├── requirements.txt       # Dependencies
├── config.yaml            # Global configuration
├── README.md              # Documentation
│
├── data/
│   ├── raw/               # Raw datasets
│   └── processed/         # Cleaned datasets
│
├── src/
│   ├── data/              # Data processing
│   ├── models/            # ML models
│   ├── visualization/     # Charts & plots
│   └── evaluation/        # Metrics & evaluation
│
├── scripts/               # Execution scripts
├── notebooks/             # Jupyter notebooks
├── tests/                 # Unit tests
├── models/                # Saved models
└── reports/               # Results & figures
| Model | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|
| XGBoost | 0.924 | 0.918 | 0.931 | 0.924 | 0.957 |
| LightGBM | 0.921 | 0.915 | 0.928 | 0.921 | 0.954 |
| CatBoost | 0.918 | 0.912 | 0.925 | 0.918 | 0.951 |
| Random Forest | 0.915 | 0.909 | 0.922 | 0.915 | 0.948 |
- Retenus Mensuel - 35.2%
- Revenus Annuel - 28.7%
- Montant Sollicité - 18.4%
- Age - 9.3%
- Type CANEVAS - 4.8%
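As a small worked example, the importance shares listed above can be ranked and accumulated to see how much of the model's total importance the top features cover. This is only a sketch: it assumes the shares are expressed as percentages of a total of 100% across all features, which the list does not state explicitly, and the variable names are illustrative.

```python
# Importance shares (in percent), copied from the list above.
importance_pct = {
    "Retenus Mensuel": 35.2,
    "Revenus Annuel": 28.7,
    "Montant Sollicité": 18.4,
    "Age": 9.3,
    "Type CANEVAS": 4.8,
}

# Rank features from most to least important.
ranked = sorted(importance_pct.items(), key=lambda item: item[1], reverse=True)

# Accumulate the shares to see how quickly the top features add up.
running_total = 0.0
cumulative = []
for name, share in ranked:
    running_total += share
    cumulative.append((name, round(running_total, 1)))
    print(f"{name:<20} {share:5.1f}%  cumulative: {running_total:5.1f}%")

listed_total = round(running_total, 1)
# Assumption: shares over all features sum to 100%, so the remainder
# belongs to features that are not listed.
remaining = round(100.0 - listed_total, 1)
print(f"Listed features: {listed_total}%  unlisted features: {remaining}%")
```

Under that assumption, the two income-related features alone account for well over half of the total, and the five listed features leave only a small remainder for everything else.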
- Home
  - Dataset overview
  - Key statistics
  - Model performance indicators
- Data Exploration
  - Feature distributions
  - Bivariate analysis
  - Correlation matrix
  - Interactive visualizations
- Prediction
  - User input form
  - Real-time loan approval prediction
  - Decision analysis
  - Explanation of influencing factors
- Model Results
  - Model performance comparison
  - Performance charts
  - Detailed evaluation metrics
  - Result visualizations
- scikit-learn
- XGBoost
- LightGBM
- CatBoost
- pandas
- numpy
- matplotlib
- seaborn
- plotly
- Streamlit
- Streamlit-Plotly
- pytest
- black
- flake8
project:
  name: "Loan Approval Predictor"
  version: "1.0.0"
  random_seed: 42
data:
  test_size: 0.2
  target_column: "Décision Finale Binaire"
models:
  xgboost:
    n_estimators: 200
    learning_rate: 0.3
    max_depth: 4
Create a .env file at the project root:
DATA_PATH=data/raw
MODEL_PATH=models
RANDOM_STATE=42
The project uses several metrics to evaluate the models:
- Accuracy - Overall proportion of correct predictions
- Precision - Proportion of positive predictions that are correct
- Recall - Proportion of actual positive cases that are detected
- F1-Score - Harmonic mean of Precision and Recall
- AUC-ROC - Area under the ROC curve
- Specificity - Proportion of actual negative cases that are correctly identified
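To make these definitions concrete, here is a minimal sketch that derives the threshold-based metrics from the four confusion-matrix counts. The function name `classification_metrics` and the toy labels are hypothetical, and the positive class is assumed to be encoded as 1; this is not the project's evaluation module.

```python
def classification_metrics(y_true, y_pred):
    """Compute binary classification metrics (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def safe_div(numerator, denominator):
        # Return 0.0 instead of failing when a class is never seen or predicted.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name:<12} {value:.3f}")
```

AUC-ROC is left out on purpose: it is computed from predicted probabilities rather than hard labels, for example with scikit-learn's `roc_auc_score`.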
Contributions are welcome! To contribute to this project:
- Fork the repository
- Create a new branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Malek Harbaoui - Main Developer
If this project helped you, please consider giving it a star!