Diabetes Multiclassification Documentation

Diabetes multiclassification using machine learning algorithms including Logistic Regression, Decision Trees, Random Forest, and LightGBM.

Last Updated: January 4th, 2025

Installation

Make sure Python is installed. Then follow these steps to set up the environment and run the application:

Clone the Repository:

```
git clone https://github.com/Sambonic/diabetes-multiclassification
cd diabetes-multiclassification
```

Create a Python Virtual Environment:

```
python -m venv env
```

Activate the Virtual Environment:

On Windows:
```
env\Scripts\activate
```
On macOS and Linux:
```
source env/bin/activate
```

Ensure Pip is Up-to-Date:

```
python -m pip install --upgrade pip
```

Install Dependencies:
```
pip install -r requirements.txt
```
With the dependencies installed, the project is ready to run as described under Usage.

Usage

To use this diabetes classification project:

Run the notebook: Execute the Jupyter Notebook (diabetes_classification_ml.ipynb). The notebook will perform the following actions automatically:
- Load the diabetes dataset.
- Perform exploratory data analysis (EDA), including data type checking, statistical analysis, visualization of missing values and class distributions, and correlation analysis.
- Handle missing values using mode imputation for categorical features and median imputation for numerical features, and compare alternative imputation methods.
- Handle outliers in the 'BMI' feature using the interquartile range (IQR) method.
- Discretize the 'BMI' feature into meaningful categories.
- Balance the dataset by undersampling the majority class and oversampling the minority classes with SMOTENC.
- Perform feature selection using Chi-squared test and Random Forest feature importance.
- Train several classification models (Random Forest, Decision Tree, LightGBM, Logistic Regression) with and without feature selection and hyperparameter tuning.
- Evaluate model performance using various metrics (accuracy, precision, recall, F1-score) and visualize results using learning curves, confusion matrices, and ROC curves.
- Compare the performance of the models.

Interpret results: The notebook generates visualizations and metrics showing how each model performs with and without feature selection and with and without hyperparameter tuning. Use these results to identify the best-performing model for diabetes classification.
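The preprocessing steps above (imputation, IQR outlier handling, BMI discretization) can be sketched with pandas as follows. This is a minimal illustration, not the notebook's code: the columns `Age` and `Smoker`, the toy data, capping BMI outliers at 1.5 × IQR (rather than dropping rows), and the WHO adult BMI cut-offs (18.5, 25, 30) are all assumptions, and the helpers `impute`, `cap_outliers_iqr`, and `discretize_bmi` are hypothetical names.

```python
import numpy as np
import pandas as pd


def impute(df: pd.DataFrame, categorical: list[str]) -> pd.DataFrame:
    """Fill categorical columns with the mode and numerical columns with the median."""
    out = df.copy()
    for col in out.columns:
        if col in categorical:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
        else:
            out[col] = out[col].fillna(out[col].median())
    return out


def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest bound (assumed strategy)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def discretize_bmi(s: pd.Series) -> pd.Series:
    """Map BMI to WHO adult categories; these bins are an assumption, not the project's."""
    bins = [-np.inf, 18.5, 25, 30, np.inf]
    labels = ["Underweight", "Normal", "Overweight", "Obese"]
    return pd.cut(s, bins=bins, labels=labels, right=False)


# Hypothetical toy data with missing values and one extreme BMI.
df = pd.DataFrame({
    "BMI": [22.0, np.nan, 31.5, 27.0, 95.0],
    "Age": [45, 50, np.nan, 38, 61],
    "Smoker": ["yes", "no", np.nan, "no", "no"],
})
df = impute(df, categorical=["Smoker"])
df["BMI"] = cap_outliers_iqr(df["BMI"])
df["BMI_category"] = discretize_bmi(df["BMI"])
print(df)
```

Imputation runs before the IQR step here so that the quartiles are computed on a complete column; the order used in the notebook may differ.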

Features

Diabetes Multi-classification: Predicts diabetes severity (no diabetes, pre-diabetes, diabetes) using machine learning.
Data Preprocessing: Imputes missing values (mean/mode), adjusts outliers, and compares imputation strategies (KNN, mean/median/mode).
Data Balancing: Addresses class imbalance using undersampling of the majority class and oversampling of minority classes with SMOTENC (handling categorical features).
Feature Selection: Employs Chi-squared test and Random Forest feature importance to select relevant features.
Model Training: Trains and evaluates multiple classification models: Random Forest, Decision Tree, LightGBM, and Logistic Regression.
Model Evaluation: Uses various metrics (accuracy, precision, recall, F1-score) and visualizes results with confusion matrices and ROC curves.
Hyperparameter Tuning: Optimizes model hyperparameters using RandomizedSearchCV.
Learning Curve Analysis: Plots learning curves to assess model bias and variance.
Comparative Analysis: Compares model performance with and without feature selection and hyperparameter tuning across multiple algorithms.
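A scikit-learn sketch of the feature selection, hyperparameter tuning, and evaluation features listed above. It assumes a synthetic three-class dataset from `make_classification` in place of the real data and uses Random Forest as the example model; the pipeline layout, parameter ranges, and `k` values are hypothetical, not the project's settings. `MinMaxScaler` is added because `chi2` requires non-negative features. SMOTENC balancing (from the imbalanced-learn package) is omitted to keep the example to scikit-learn only; if added, resampling should be applied to training data only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the diabetes data (assumption): 3 classes,
# e.g. 0 = no diabetes, 1 = pre-diabetes, 2 = diabetes; 12 features.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

pipeline = Pipeline([
    ("scale", MinMaxScaler()),                      # chi2 needs non-negative inputs
    ("select", SelectKBest(score_func=chi2, k=6)),  # Chi-squared feature selection
    ("model", RandomForestClassifier(random_state=42)),
])

# Hypothetical search space.
param_distributions = {
    "select__k": [4, 6, 8],
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 5, 10],
    "model__min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    pipeline, param_distributions, n_iter=5, cv=3,
    scoring="f1_macro", random_state=42,
)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
macro_f1 = f1_score(y_test, y_pred, average="macro")
print("Best params:", search.best_params_)
print(classification_report(y_test, y_pred))

# Random Forest importances for the features that survived chi2 selection.
best = search.best_estimator_
kept = best.named_steps["select"].get_support(indices=True)
importances = best.named_steps["model"].feature_importances_
for idx, imp in sorted(zip(kept, importances), key=lambda t: -t[1]):
    print(f"feature_{idx}: {imp:.3f}")
```

Keeping selection inside the `Pipeline` means `chi2` is re-fit within each cross-validation fold, so the tuned `k` is not chosen with information from the validation folds.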
