In this final project, fellow INFO 1998 student Ethan Huang and I aim to build several machine learning models to identify the most predictive features in a diabetes risk-factors dataset from the CDC's Behavioral Risk Factor Surveillance System (BRFSS) survey (through Kaggle).

rishis123/INFO-1998-Final-Project

INFO-1998-Final-Project

In this final project, fellow INFO 1998 student Ethan Huang and I aim to build a few machine learning models to identify the most predictive features in a diabetes risk-factors dataset from the CDC's Behavioral Risk Factor Surveillance System (BRFSS) survey (through Kaggle).

Link to dataset: https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset/

Models used -- Decision Tree, Balanced Decision Tree (file entitled "Selection Bias"), Perceptron, SVM, Logistic Regression.
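As a rough illustration of how the models listed above can be compared on accuracy, here is a minimal scikit-learn sketch. It is not the notebook's code: the data is a synthetic stand-in generated with `make_classification`, the class imbalance and all hyperparameters are arbitrary assumptions, and using `class_weight="balanced"` for the balanced tree is only one possible way to balance classes (the notebook may do it differently).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the survey data (assumption, not the real dataset).
X, y = make_classification(
    n_samples=2000, n_features=21, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters are illustrative guesses, not the project's settings.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Balanced Decision Tree": DecisionTreeClassifier(
        max_depth=5, class_weight="balanced", random_state=0
    ),
    # Linear models and SVMs are sensitive to feature scale, so scale first.
    "Perceptron": make_pipeline(StandardScaler(), Perceptron(random_state=0)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

On imbalanced data, plain accuracy can favor models that mostly predict the majority class, which is one reason to look at a balanced variant alongside the others.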

Our most accurate models were Perceptron and Logistic Regression. Across models, the most significant risk factors appear to be HighBP and GenHlth. In other words, blood pressure and general health appear to be the strongest indicators of diabetes in this dataset -- a useful finding for future ML models, preventive measures, and surveys/research.
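One common way to compare risk factors across models is to rank features by absolute logistic-regression coefficient (on standardized inputs) and by decision-tree importance. The sketch below shows the mechanics only. The data is synthetic and the label is generated from the first two columns by construction, so the resulting ranking is not evidence about the real dataset; the column names are borrowed from the dataset purely as labels, and the helper `rank_features` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
feature_names = ["HighBP", "GenHlth", "BMI", "Smoker"]
n = 3000

# Synthetic columns (assumption): value ranges only loosely mimic survey coding.
X = np.column_stack([
    rng.integers(0, 2, n),   # HighBP: 0 or 1
    rng.integers(1, 6, n),   # GenHlth: 1 to 5
    rng.normal(28, 6, n),    # BMI: continuous
    rng.integers(0, 2, n),   # Smoker: 0 or 1
]).astype(float)

# The synthetic label depends only on the first two columns, by construction.
logit = 1.5 * X[:, 0] + 0.8 * (X[:, 1] - 3) - 1.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
logreg = LogisticRegression(max_iter=1000).fit(X_scaled, y)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)


def rank_features(scores):
    """Return feature names sorted from highest to lowest score."""
    order = np.argsort(scores)[::-1]
    return [feature_names[i] for i in order]


logreg_ranking = rank_features(np.abs(logreg.coef_[0]))
tree_ranking = rank_features(tree.feature_importances_)

print("Logistic regression:", logreg_ranking)
print("Decision tree:", tree_ranking)
```

Features that rank near the top under both methods are better candidates for "significant across models" than features that stand out in only one.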

Visualizations used -- Bar graphs, Pie Charts.

Please see the attached Jupyter Notebook, entitled "INFO 1998 Final Project.ipynb", for pre-run results, visualizations, and some notes/comments.
