Obesity Level Estimation Using Machine Learning

Statistical Learning - Master's in Computer Engineering (Data Science & Data Engineering Pathway)

Overview

This project aims to estimate obesity levels in individuals from Mexico, Peru, and Colombia based on their eating habits and physical condition. Several classification models are trained on the dataset to predict each individual's obesity level; the best model reaches 96.74% test accuracy.

Dataset Information

Source: UCI Machine Learning Repository
Instances: 2111
Features: 16
Target Variable: NObesity (Obesity Level)
Classes:
- Insufficient Weight
- Normal Weight
- Overweight Level I
- Overweight Level II
- Obesity Type I
- Obesity Type II
- Obesity Type III
Data Generation:
- 77% Synthetic (via Weka & SMOTE)
- 23% Collected directly from users via a web platform
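
As a quick sanity check of the figures above, the dataset can be loaded and summarized with pandas. This is a minimal sketch, not the notebook's code: it assumes the CSV file name listed in the repository, and it assumes the target column in that file is named `NObeyesdad` (the description above calls it NObesity). The helper `summarize` is hypothetical.

```python
from pathlib import Path

import pandas as pd

# File name as listed in the repository; adjust the path if it lives elsewhere.
CSV_PATH = Path("ObesityDataSet_raw_and_data_sinthetic.csv")
# Assumption: the target column in the CSV is spelled "NObeyesdad".
TARGET = "NObeyesdad"


def summarize(df, target=TARGET):
    """Return instance count, feature count, and per-class counts (hypothetical helper)."""
    return {
        "instances": len(df),
        "features": df.shape[1] - 1,  # every column except the target
        "class_counts": df[target].value_counts().to_dict(),
    }


# Only runs when the CSV is present in the working directory.
if CSV_PATH.exists():
    print(summarize(pd.read_csv(CSV_PATH)))
```

If the file matches the description above, the summary should report 2111 instances, 16 features, and 7 classes.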

Models Applied & Results

Five classification models were implemented and compared. Hyperparameters were tuned with GridSearchCV, using 5-fold cross-validation to estimate how well each configuration generalizes.
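
The tuning procedure described above can be sketched as follows. This is an illustrative example rather than the notebook's actual code: it uses a synthetic 7-class, 16-feature dataset from `make_classification` as a stand-in for the real data, and the parameter grid and the 25% test split are assumptions chosen only to show the mechanics.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the obesity data: 16 features, 7 classes.
X, y = make_classification(
    n_samples=700, n_features=16, n_informative=10, n_redundant=0,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipe = make_pipeline(StandardScaler(), SVC())

# Assumed grid for illustration; the notebook's grid may differ.
param_grid = {"svc__kernel": ["linear", "rbf"], "svc__C": [0.5, 1, 5]}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Mean CV accuracy:", search.best_score_)
print("Held-out test accuracy:", search.score(X_test, y_test))
```

Keeping the test split out of the search means the CV accuracy and the test accuracy are measured on different data, which is what makes the two columns in the results table comparable.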

Performance Summary

Model	Best Hyperparameters	CV Accuracy (%)	Test Accuracy (%)	Zero-One Loss (count)	F1 Score (%)
Logistic Regression	ElasticNet (L1/L2)	87	87.36	66	87.36
Decision Tree	Depth = 10, Entropy	91	93.68	33	93.68
Random Forest	400 Estimators, Entropy	91	91.95	42	91.95
SVM	Linear Kernel, C=5	95	96.74	17	96.74
AdaBoost	Learning Rate = 0.6, 300 Estimators	92	88.12	62	88.12

Key Observations

- SVM achieved the highest test accuracy (96.74%) and the highest cross-validation accuracy (95%) of the five models.
- Decision Tree performed well (93.68%), but showed minor overfitting in training.
- Random Forest provided stable predictions (91.95%), with lower variance compared to Decision Tree.
- Logistic Regression, serving as the baseline, performed moderately (87.36%).
- AdaBoost, despite strong cross-validation accuracy (92%), reached a lower test accuracy (88.12%), suggesting that it generalized less well to the held-out data.
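
The gap between cross-validation and test accuracy makes the AdaBoost observation concrete. The short calculation below uses only the numbers from the Performance Summary table; since the CV accuracies are reported as whole percentages, each gap is only accurate to about half a percentage point.

```python
# (CV accuracy in %, test accuracy in %) from the Performance Summary table.
scores = {
    "Logistic Regression": (87, 87.36),
    "Decision Tree": (91, 93.68),
    "Random Forest": (91, 91.95),
    "SVM": (95, 96.74),
    "AdaBoost": (92, 88.12),
}

# Positive gap: the model did at least as well on the test set as in CV.
gaps = {name: round(test - cv, 2) for name, (cv, test) in scores.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} percentage points")

# Models whose test accuracy fell below their CV accuracy.
below_cv = [name for name, gap in gaps.items() if gap < 0]
print("Test accuracy below CV accuracy:", below_cv)
```

AdaBoost is the only model whose test accuracy falls below its cross-validation accuracy, by roughly 3.9 percentage points.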

Methodology

1- Data Preprocessing → Standardization, PCA for dimensionality reduction, SMOTE for class balancing
2- Model Training → Decision Tree, Random Forest, AdaBoost, SVM, Logistic Regression
3- Hyperparameter Optimization → GridSearchCV for best parameter selection
4- Cross-Validation → 5-Fold CV to ensure robustness
5- Model Evaluation → Accuracy, F1 Score, Confusion Matrix, Precision, Recall

Installation & Dependencies

To replicate the project, install the required dependencies:

pip install -r requirements.txt

Dependencies:

numpy  
pandas  
matplotlib  
plotly  
seaborn  
scikit-learn  
imbalanced-learn  
xgboost

Running the Project

1- Clone the repository

2- Install dependencies

pip install -r requirements.txt

3- Run the Jupyter Notebook

jupyter notebook

4- Execute the analysis by opening Obesity_MachineLearning.ipynb and running all cells

Conclusion

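This project demonstrates the effectiveness of machine learning in predicting obesity levels based on dietary and physical behavior data. The linear-kernel SVM achieved the highest test accuracy (96.74%), followed by the Decision Tree (93.68%) and the Random Forest (91.95%). AdaBoost (88.12%) performed only slightly better than the Logistic Regression baseline (87.36%), so boosting did not provide an advantage on this dataset.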

Repository: arashabe/ML_Obesity_Estimation
