This project focuses on predicting obesity levels based on lifestyle, eating habits, and physical condition using machine learning.
The goal of this project is to analyze factors influencing obesity and build a predictive model for multi-class classification.
The workflow includes:
- Data preprocessing and feature engineering
- Exploratory Data Analysis (EDA)
- Model training and evaluation
Project highlights:
- Compared multiple machine learning models for multi-class classification
- Implemented both classical ML models and neural networks
- Applied feature engineering and encoding techniques
- Identified the best performing model (XGBoost)
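
As an illustration of the encoding step, here is a minimal sketch of one-hot encoding categorical columns with base R's `model.matrix()`. The data frame and its column names (`age`, `gender`, `transport`) are hypothetical stand-ins, not the project's actual dataset, and the notebook may use a different encoding approach.

```r
# Toy data frame; columns and values are hypothetical, not the project's data.
toy <- data.frame(
  age = c(21, 35, 28),
  gender = factor(c("Female", "Male", "Female")),
  transport = factor(c("Walking", "Automobile", "Public_Transportation"))
)

# model.matrix() expands factors into 0/1 indicator columns.
# "- 1" drops the intercept, so the first factor (gender) keeps all its levels;
# later factors still drop one reference level to avoid redundancy.
encoded <- model.matrix(~ . - 1, data = toy)

print(colnames(encoded))
print(encoded)
```

The resulting numeric matrix can be passed to learners that do not accept factor columns directly.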
The dataset includes demographic data, eating habits, and lifestyle features used to predict obesity levels.
The following models were trained and compared:
- Logistic Regression
- Decision Trees
- Random Forest
- Support Vector Machines (SVM)
- XGBoost
- Neural Networks (RSNNS, nnet)
XGBoost achieved the best overall performance:
- Accuracy (test): 96.9%
- Cross-validation accuracy: 97.0%
- AUC: ~0.998
Results for the other models:
- Random Forest: 95.95% accuracy, AUC: 0.999
- Decision Tree: 92.38% accuracy
- SVM: 81.67% accuracy (struggled with complex class boundaries)
- nnet: 95% accuracy, Kappa: 0.9416
- RSNNS: 94.29% accuracy, Kappa: 0.9333
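
The accuracy and Kappa values above can both be derived from a confusion matrix. The sketch below shows the arithmetic in base R on a made-up 3-class matrix; the counts and class names are illustrative only and are not results from this project.

```r
# Hypothetical confusion matrix (rows = predicted, columns = actual).
cm <- matrix(
  c(50,  3,  0,
     2, 45,  4,
     0,  2, 44),
  nrow = 3, byrow = TRUE,
  dimnames = list(predicted = c("A", "B", "C"), actual = c("A", "B", "C"))
)

n <- sum(cm)

# Accuracy: share of observations on the diagonal.
accuracy <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column marginals.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's Kappa: agreement beyond chance, rescaled so that 1 means perfect.
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(c(accuracy = accuracy, kappa = cohen_kappa))
```

Kappa is lower than accuracy whenever some agreement would be expected by chance, which is why it is a useful companion metric for multi-class problems.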
Models struggled most with adjacent classes (Overweight Level I vs II), while obesity classes were classified with high accuracy.
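
One way to see which classes get confused is to tabulate predictions against the true labels and compute per-class recall. The following is a minimal sketch in base R; the labels and predictions are invented for illustration and do not reproduce the project's actual outputs.

```r
# Hypothetical class labels and predictions (10 observations per class).
classes <- c("Overweight_I", "Overweight_II", "Obesity_I")

actual <- factor(
  c(rep("Overweight_I", 10), rep("Overweight_II", 10), rep("Obesity_I", 10)),
  levels = classes
)
predicted <- factor(
  c(rep("Overweight_I", 7), rep("Overweight_II", 3),   # true Overweight_I
    rep("Overweight_I", 2), rep("Overweight_II", 8),   # true Overweight_II
    rep("Obesity_I", 10)),                             # true Obesity_I
  levels = classes
)

# Confusion table: rows = predicted, columns = actual.
cm <- table(predicted = predicted, actual = actual)

# Per-class recall: correct predictions divided by the true count of each class.
recall <- diag(cm) / colSums(cm)

print(cm)
print(recall)
```

In this toy example the off-diagonal counts sit between the two overweight classes, which is the kind of pattern described above.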
git clone https://github.com/alaszmigiel/Obesity-Levels-Prediction.git
cd Obesity-Levels-Prediction
jupyter notebook Obesity_Levels_Prediction.ipynb
The notebook includes all required steps, including package installation if needed.
- Language: R
- Data Processing: dplyr, tidyr
- Visualization: ggplot2, corrplot
- ML: caret, xgboost, kernlab
- Neural Networks: RSNNS, nnet
- Evaluation: pROC, MLmetrics