The Credit Risk Prediction System is a machine learning application that evaluates a person's creditworthiness based on financial and personal details. It utilizes Apache Spark for scalable data processing and Streamlit for an interactive user interface.
This project consists of two main components:
- Model Training (credit_risk_model_trainer.py)
  - Trains a Random Forest classifier on the German Credit Risk Dataset.
  - Performs categorical encoding, feature engineering, and data balancing to improve performance.
  - Saves the trained model and feature indexers for future predictions.
- Prediction Application (credit_risk_detector.py)
  - A Streamlit-powered UI where users input financial details.
  - Loads the trained model and predicts Good or Bad credit risk.
  - Displays confidence levels for each prediction.
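The data balancing step is not described in detail here. One common approach, shown below as a minimal sketch and not necessarily what the training script does, is to give each class a weight inversely proportional to its frequency. The function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by total / (n_classes * class_count).

    Rare classes get weights above 1 and frequent classes get weights below 1,
    so every class contributes equally in aggregate.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


if __name__ == "__main__":
    # Illustrative 3:1 split between "good" and "bad" labels.
    print(balanced_class_weights(["good", "good", "good", "bad"]))
```

With a 700 good / 300 bad split, this scheme yields weights of roughly 0.71 and 1.67. Random oversampling of the minority class is an alternative that reaches a similar effect by duplicating rows instead.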
✔ End-to-End Machine Learning Pipeline
✔ Scalable Processing with Apache Spark
✔ Interactive Credit Risk Prediction via Streamlit UI
✔ Feature Engineering & Class Imbalance Handling
✔ Model Performance Evaluation (ROC-AUC Score)
Ensure you have the following installed:
- Python (>= 3.8)
- Apache Spark (>= 3.0)
- Streamlit (pip install streamlit)
- PySpark (pip install pyspark)
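To confirm the prerequisites before running anything, a small check like the following may help. It is a sketch using only the standard library; the helper names are hypothetical, and it only checks that packages are importable, not their versions.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple


def python_is_supported(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.8 or newer is required")
    for name in missing_packages(["streamlit", "pyspark"]):
        print(f"Missing package: {name} (try: pip install {name})")
```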
Clone the repository and install the dependencies:

    git clone https://github.com/yourusername/credit-risk-prediction.git
    cd credit-risk-prediction
    pip install -r requirements.txt

The model training script automatically downloads the dataset if it is missing. You can also download it manually from:
German Credit Dataset - UCI
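The download-if-missing behaviour can be sketched as follows. This is an illustration under assumptions, not the training script's actual code: the function name `ensure_dataset` is hypothetical, and the dataset URL and local path are left as parameters because they are not given here.

```python
import urllib.request
from pathlib import Path


def ensure_dataset(url: str, dest: Path) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened, False if the local copy was reused.
    """
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted transfer never
    # leaves a partial file that looks like a complete dataset.
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, tmp)
    tmp.replace(dest)
    return True
```

Calling `ensure_dataset(url, Path("data/german.data"))` twice downloads once and reuses the file the second time; the path shown is only an example.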
Run the following command to train the credit risk model:

    python credit_risk_model_trainer.py

This will:
- Download the dataset (if not present)
- Train a Random Forest Classifier
- Save the trained model in ./credit_risk_model/
- Store feature indexers in ./indexers/
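For orientation, the encoding and training steps above could look roughly like this in PySpark. This is a sketch under assumptions: the column names are hypothetical, and it bundles the indexers and the classifier into one `Pipeline`, whereas the actual script stores the model and the indexers in separate directories.

```python
from typing import List

# Hypothetical column names; the real script's schema is not shown here.
CATEGORICAL_COLS: List[str] = ["checking_status", "purpose"]
NUMERIC_COLS: List[str] = ["duration", "credit_amount", "age"]
LABEL_COL = "label"  # assumed numeric label column


def indexed_name(col: str) -> str:
    """Name of the column holding the numeric index of a categorical column."""
    return f"{col}_idx"


def build_pipeline(num_trees: int = 100):
    """Build an unfitted pipeline: string indexing, vector assembly, random forest.

    PySpark is imported lazily so this module can be imported without it.
    An active SparkSession is needed when this function is called.
    """
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # One indexer per categorical column; "keep" maps unseen categories
    # to an extra index instead of failing at prediction time.
    indexers = [
        StringIndexer(inputCol=c, outputCol=indexed_name(c), handleInvalid="keep")
        for c in CATEGORICAL_COLS
    ]
    assembler = VectorAssembler(
        inputCols=[indexed_name(c) for c in CATEGORICAL_COLS] + NUMERIC_COLS,
        outputCol="features",
    )
    forest = RandomForestClassifier(
        labelCol=LABEL_COL, featuresCol="features", numTrees=num_trees, seed=42
    )
    return Pipeline(stages=indexers + [assembler, forest])
```

Given a training DataFrame `train_df`, `build_pipeline().fit(train_df)` returns a fitted model that can be persisted with `model.write().overwrite().save("./credit_risk_model")`.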
Once the model is trained, launch the Streamlit app:

    streamlit run credit_risk_detector.py

- Open the Streamlit UI in your browser.
- Enter the applicant’s financial and demographic details.
- Click Predict Credit Risk to classify the applicant as Good ✅ or Bad ❌.
- View the prediction confidence score.
- Check Feature Descriptions for input details.
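The Good/Bad label and its confidence score can be derived from the classifier's predicted probability. The sketch below assumes the model outputs the probability of the Bad class and that a 0.5 threshold is used; neither detail is confirmed here, and the function name is hypothetical.

```python
from typing import Tuple


def describe_prediction(prob_bad: float, threshold: float = 0.5) -> Tuple[str, float]:
    """Map the probability of the Bad class to a label and a confidence.

    The confidence is the probability of whichever class was chosen.
    """
    if not 0.0 <= prob_bad <= 1.0:
        raise ValueError("prob_bad must be between 0 and 1")
    if prob_bad >= threshold:
        return "Bad", prob_bad
    return "Good", 1.0 - prob_bad


if __name__ == "__main__":
    label, confidence = describe_prediction(0.25)
    print(f"{label} credit risk (confidence {confidence:.0%})")
```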
After training, the script evaluates the model on held-out data using the ROC-AUC score:
Random Forest ROC-AUC Score: 0.85 (Example)
Additionally, feature importance is displayed to highlight key predictive variables.
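To make the ROC-AUC metric concrete: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The pure-Python sketch below computes it directly from that definition; it is meant for illustration on small inputs and is not the evaluator the training script uses.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for the positive class and 0 for the negative class.
    Runs in O(n_pos * n_neg), which is fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four positive/negative pairs are ranked correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation.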
📂 credit-risk-prediction
├── 📜 credit_risk_model_trainer.py # Model training script
├── 📜 credit_risk_detector.py # Streamlit UI for credit risk prediction
├── 📂 credit_risk_model/ # Trained Random Forest Model
├── 📂 indexers/ # Categorical feature encoders
├── 📄 requirements.txt # Python dependencies
└── 📄 README.md # Project documentation
We welcome contributions!
Feel free to fork this repo, create a pull request, or report issues.
This project is licensed under the MIT License.