This project involves building a Credit Scoring Model that predicts the creditworthiness of individuals based on historical financial data. The goal is to create a model that can predict whether a person is likely to default on a loan or not. It is built using machine learning techniques, specifically classification algorithms, to make accurate predictions based on various input features such as age, income, loan amount, and more.
The Credit Scoring Model predicts whether a loan will be defaulted (1) or fully paid (0) based on an individual's profile. The model was trained using a dataset containing information about loan applicants and their financial status, including:
- Age, income, home ownership
- Loan amount, interest rate, grade
- Loan repayment status (target variable)
Features:
- Predicts the probability of loan default from applicant details.
- Provides a Streamlit UI where users can enter new applicant data and get a creditworthiness prediction.
- Handles missing values and categorical features during preprocessing.
The dataset used in this project contains 32,581 rows and 12 columns. Here's an overview of the columns:
- person_age: Age of the person (integer).
- person_income: Annual income of the person (integer).
- person_home_ownership: Home ownership status (categorical: RENT, OWN, MORTGAGE, etc.).
- person_emp_length: Length of employment (float, with some missing values).
- loan_intent: Purpose of the loan (categorical: PERSONAL, EDUCATION, MEDICAL, etc.).
- loan_grade: Grade assigned to the loan (categorical).
- loan_amnt: Loan amount (integer).
- loan_int_rate: Interest rate of the loan (float, with some missing values).
- loan_status: Target variable representing loan repayment status (1 = Defaulted, 0 = Fully paid).
- loan_percent_income: Loan amount as a percentage of income (float).
- cb_person_default_on_file: Whether the person has defaulted before (categorical: Y, N).
- cb_person_cred_hist_length: Length of credit history (integer).
- Missing values are present in person_emp_length and loan_int_rate.
- The target variable is loan_status, indicating whether a loan was defaulted (1) or fully paid (0).
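As a quick check of the two points above, the following minimal sketch counts missing values per column and computes the share of defaulted loans. The rows are made up for illustration; only the column names come from the dataset description, and loading the real data with `pd.read_csv("credit_risk_dataset.csv")` is an assumption about how the file would be read.

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the dataset.
# With the real file, one would presumably start from:
#   sample = pd.read_csv("credit_risk_dataset.csv")
sample = pd.DataFrame(
    {
        "person_age": [22, 35, 41, 29],
        "person_emp_length": [1.0, None, 12.0, 4.0],
        "loan_int_rate": [11.5, 7.9, None, None],
        "loan_status": [1, 0, 0, 1],
    }
)

# Count missing entries per column and show only the affected columns.
missing_counts = sample.isna().sum()
print(missing_counts[missing_counts > 0])

# Share of defaulted loans (loan_status == 1) in the sample.
default_rate = sample["loan_status"].mean()
print(f"Default rate: {default_rate:.2f}")
```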
- Data Preprocessing:
  - Handle missing values and scale numerical features.
  - Encode categorical features using label encoding.
- Model Training:
  - Train a classification model (e.g., Logistic Regression, Random Forest, etc.) on the preprocessed dataset.
  - Evaluate the model using accuracy, confusion matrix, and other performance metrics.
- Saving the Model:
  - The trained model is saved using joblib for future predictions.
- Streamlit UI:
  - A Streamlit app was developed to allow users to input applicant details and predict creditworthiness in real time.
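The preprocessing, training, and saving steps listed above can be sketched roughly as follows. This is not the project's actual training script: the rows are made up, median imputation and Logistic Regression are assumptions chosen for illustration, the train/test split is omitted for brevity, and the model is written to a temporary directory so that running the sketch does not overwrite a real credit_scoring_model.pkl.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up rows for illustration; the real project trains on credit_risk_dataset.csv.
df = pd.DataFrame(
    {
        "person_age": [22, 35, 41, 29, 52, 24, 33, 45],
        "person_income": [28000, 64000, 90000, 41000, 120000, 30000, 55000, 75000],
        "person_home_ownership": ["RENT", "MORTGAGE", "OWN", "RENT",
                                  "MORTGAGE", "RENT", "OWN", "MORTGAGE"],
        "person_emp_length": [1.0, None, 12.0, 4.0, 20.0, 0.0, 6.0, None],
        "loan_amnt": [9000, 12000, 5000, 15000, 8000, 10000, 6000, 7000],
        "loan_int_rate": [15.2, 7.9, None, 13.1, 6.5, 16.0, 9.4, 8.8],
        "loan_status": [1, 0, 0, 1, 0, 1, 0, 0],
    }
)
numeric_cols = ["person_age", "person_income", "person_emp_length",
                "loan_amnt", "loan_int_rate"]
categorical_cols = ["person_home_ownership"]

# 1. Handle missing values (assumption: fill with the column median).
numeric = df[numeric_cols].astype(float)
numeric = numeric.fillna(numeric.median())

# 2. Scale numerical features to zero mean and unit variance.
scaler = StandardScaler()
features = pd.DataFrame(
    scaler.fit_transform(numeric), columns=numeric_cols, index=df.index
)

# 3. Label-encode categorical features, keeping each encoder so the same
#    mapping can be reused at prediction time.
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    features[col] = encoders[col].fit_transform(df[col])

# 4. Train a classifier (assumption: Logistic Regression).
model = LogisticRegression(max_iter=1000)
model.fit(features, df["loan_status"])

# 5. Save the model with joblib and load it back.
model_path = os.path.join(tempfile.mkdtemp(), "credit_scoring_model.pkl")
joblib.dump(model, model_path)
restored = joblib.load(model_path)

# Column 1 of predict_proba is the probability of class 1 (defaulted).
default_probability = restored.predict_proba(features)[:, 1]
print(default_probability.round(2))
```

In a real workflow the scaler and encoders would need to be saved alongside the model, since new applicant data has to pass through the same transformations before prediction.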
    git clone https://github.com/yourusername/credit-scoring-model.git
    cd credit-scoring-model

Ensure that you have Python 3.x installed. You can install the required dependencies using pip:

    pip install -r requirements.txt

To test the credit scoring model with new data, run the following command:

    streamlit run app.py

This will launch a Streamlit web application where you can input details for an applicant and predict whether they will default on the loan.
The Streamlit app will provide:
- Creditworthiness Prediction: Whether the applicant is considered Low Risk or High Risk.
- Probability of Default: The model's estimated likelihood that the applicant will default on the loan.
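One way the two outputs could relate is a simple cutoff on the predicted probability. The sketch below assumes a threshold of 0.5 and a hypothetical helper named `risk_label`; neither is confirmed by the project, and app.py may use a different rule.

```python
def risk_label(default_probability: float, threshold: float = 0.5) -> str:
    """Map a probability of default to a risk label.

    The 0.5 threshold is an assumption for illustration; a lender may
    prefer a lower cutoff to catch more potential defaults.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    return "High Risk" if default_probability >= threshold else "Low Risk"


print(risk_label(0.12))  # Low Risk
print(risk_label(0.73))  # High Risk
```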
The model uses classification techniques to predict the likelihood of defaulting on a loan. The following algorithms were considered:
- Logistic Regression
- Random Forest
- Gradient Boosting
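The project does not state how these candidates were compared or which one was selected. As one possible approach, the sketch below scores all three with 3-fold cross-validation on synthetic data from scikit-learn's `make_classification`; the data, the hyperparameters, and the choice of F1 as the scoring metric are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data standing in for the credit dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Mean F1 score across 3 folds for each candidate.
mean_f1 = {
    name: cross_val_score(model, X, y, cv=3, scoring="f1").mean()
    for name, model in candidates.items()
}

for name, score in mean_f1.items():
    print(f"{name}: {score:.3f}")
```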
The model was evaluated on various performance metrics such as:
- Accuracy
- Precision
- Recall
- F1-Score
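For reference, all four metrics can be derived from the counts in a confusion matrix, treating a defaulted loan (1) as the positive class. The counts below are made up for illustration and are not results from this project; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of the loans predicted to default, how many actually did.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the loans that defaulted, how many the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts: 40 true positives, 10 false positives,
# 20 false negatives, 130 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because defaults are usually the minority class, accuracy alone can look good even when recall is poor, which is why precision, recall, and F1 are reported alongside it.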
- credit_scoring_model.pkl: The saved model after training.
- app.py: The Streamlit UI script.
- credit_risk_dataset.csv: The dataset used for model training.
- requirements.txt: A file containing a list of dependencies for the project.
This project is licensed under the MIT License - see the LICENSE file for details.