
MaSOouD69/Capstone-Repo-Bootcamp-Project


🧠 Stroke Prediction & Comprehensive Analysis

🎯 Project Synopsis

Strokes cast a long shadow over global health, yet an estimated 80% of them are deemed preventable. Our analytical journey navigates through critical stroke indicators, spanning various age groups, genders, lifestyles, and health conditions. We aim to create vivid, intuitive visualizations to convey essential data, elucidating the intricate link between lifestyle choices and stroke incidence. Our ultimate objective is to leverage finely calibrated predictive models to estimate the likelihood of strokes, thereby championing preventive healthcare initiatives.

📊 Investigative Focus

Our exploration orbits around pivotal queries and suppositions:

  - Age Dynamics: Investigating the role of age in stroke prevalence across different demographic segments.
  - BMI & Glucose Interplay: Examining the association between higher BMI, elevated glucose levels, and increased stroke risk.
  - Smoking & Stroke Correlation: Probing the potential relationship between smoking habits and a heightened risk of stroke.
  - Cardiac Conditions & Stroke Incidence: Evaluating the link between pre-existing heart diseases and the occurrence of strokes.
  - Occupational Stress & Hypertension: Analyzing whether intense work pressure and subsequent high blood pressure contribute to the risk of stroke.
  - Gender-Specific Factors: Assessing if males face a greater stroke risk due to occupational stress, or if the risks are gender-neutral.

🚀 Project Trajectory

🧹 Data Sanitization

We commenced by cleansing the dataset: excising irrelevant features, addressing missing values, and discarding any predictors not present initially.
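One common way to address missing values is median imputation. The sketch below is illustrative only: the README does not state which strategy the project used, and the `bmi` field name, the `impute_median` helper, and the sample records are all hypothetical.

```python
from statistics import median


def impute_median(rows, field):
    """Return copies of `rows` where missing (None) values of `field`
    are replaced by the median of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


# Hypothetical records; the field names are assumptions, not project data.
records = [
    {"age": 67, "bmi": 36.6},
    {"age": 61, "bmi": None},
    {"age": 80, "bmi": 32.5},
    {"age": 49, "bmi": 34.4},
]

cleaned = impute_median(records, "bmi")
```

The helper returns new dictionaries rather than mutating the input, so the raw records remain available for comparison during cleaning.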

🕵️‍♂️ Exploratory Data Analysis (EDA)

Through EDA, we aimed to understand the distribution of features and their relationships with the stroke rate, which informed our choice of model and directed our feature engineering efforts.
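A typical EDA step of this kind is computing the stroke rate within each group of a feature. The following is a minimal sketch under stated assumptions: the `age` and `stroke` field names, the 20-year bands, and the sample rows are hypothetical and not taken from the project's dataset.

```python
from collections import defaultdict


def stroke_rate_by_group(rows, key_fn):
    """Return {group: fraction of rows in that group with stroke == 1}."""
    totals = defaultdict(int)
    strokes = defaultdict(int)
    for r in rows:
        group = key_fn(r)
        totals[group] += 1
        strokes[group] += r["stroke"]
    return {g: strokes[g] / totals[g] for g in totals}


def age_band(row):
    """Bucket an integer age into 20-year bands, e.g. 67 -> '60-79'."""
    low = (row["age"] // 20) * 20
    return f"{low}-{low + 19}"


# Hypothetical rows for illustration only.
sample = [
    {"age": 25, "stroke": 0},
    {"age": 34, "stroke": 0},
    {"age": 67, "stroke": 1},
    {"age": 72, "stroke": 0},
    {"age": 81, "stroke": 1},
]

rates = stroke_rate_by_group(sample, age_band)
```

Passing a different `key_fn` (for example, one that returns a smoking category or a gender field) would give the same per-group rate for the other questions listed under Investigative Focus.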

🧠 Feature Engineering

We meticulously transformed the data, drawing on medical insights, converting raw information into a machine-learning digestible format, and enhancing categorical variables.
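Converting categorical variables into a model-ready format is often done with one-hot encoding. The sketch below shows the idea in plain Python; it is an assumption that the project used this encoding, and the `smoking_status` field, its category labels, and the `one_hot` helper are hypothetical.

```python
def one_hot(rows, field):
    """Replace a categorical `field` with one 0/1 column per observed category."""
    categories = sorted({r[field] for r in rows})
    encoded = []
    for r in rows:
        # Copy every column except the categorical one being encoded.
        out = {k: v for k, v in r.items() if k != field}
        for c in categories:
            out[f"{field}_{c}"] = int(r[field] == c)
        encoded.append(out)
    return encoded


# Hypothetical records; field name and labels are illustrative assumptions.
patients = [
    {"age": 67, "smoking_status": "formerly"},
    {"age": 49, "smoking_status": "smokes"},
    {"age": 30, "smoking_status": "never"},
]

features = one_hot(patients, "smoking_status")
```

When the encoded columns feed a logistic regression with an intercept, one category is usually dropped as the reference level so that the columns are not perfectly collinear.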

📊 Modeling Strategy

Our modeling odyssey embraced a diverse array of techniques:

  - Logistic Regression (Statsmodels & sklearn)
  - Support Vector Machine
  - Decision Tree
  - Random Forest
  - PCA with Logistic Regression
  - KNN
  - XGBoost
  - Neural Network

Logistic Regression (Statsmodels) took the lead, outshining its counterparts with its robust performance.

📝 Insights & Future Endeavors

Our leading model boasts a commendable accuracy of approximately 80% and correctly identifies about 60% of actual stroke cases (a recall, or sensitivity, of roughly 0.60), markedly improving risk prediction.
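Accuracy and recall measure different things, which is why both figures are worth reporting on an imbalanced dataset. The sketch below computes them from confusion-matrix counts. The counts are hypothetical, chosen only so that the results match the roughly 80% accuracy and 60% recall quoted above; they are not the project's actual confusion matrix.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Fraction of actual stroke cases that the model flags."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of flagged cases that are actual strokes."""
    return tp / (tp + fp)


# Hypothetical counts for 1,000 patients, 50 of whom had a stroke.
tp, fn, fp, tn = 30, 20, 180, 770

acc = accuracy(tp, fp, fn, tn)
rec = recall(tp, fn)
prec = precision(tp, fp)

# A trivial baseline that never predicts a stroke, for comparison.
baseline_acc = accuracy(0, 0, tp + fn, fp + tn)
baseline_rec = recall(0, tp + fn)
```

In this made-up example the "never predict stroke" baseline reaches higher accuracy than the model while detecting no stroke cases at all, which illustrates why recall matters more than raw accuracy for a screening task.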

Potential enhancements include:

  1. Tackling Class Imbalance: Confronting the dataset's disproportionate representation of stroke occurrences. Methods like random undersampling may provide a solution.
  2. Enriching Features: Incorporating additional, currently absent, predictors could significantly sharpen the model's predictive prowess.
  3. Exploring Model Diversity: While Logistic Regression (Statsmodels) performed best in this project, further tuning of the other methodologies, like the neural network or the ensemble approaches, might reveal enhanced outcomes.
  4. Balancing Precision with Understandability: Future iterations should strive to balance predictive performance with ease of interpretation, aligning with end-user requirements.
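The random undersampling mentioned in the first item above could be sketched as follows. This is a plain-Python illustration, not the project's implementation: the `random_undersample` helper, the `stroke` label name, and the synthetic data are hypothetical.

```python
import random


def random_undersample(rows, label="stroke", seed=0):
    """Balance a binary dataset by randomly dropping majority-class rows
    until both classes have the same number of rows."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label] == 1]
    negatives = [r for r in rows if r[label] == 0]
    # Identify which class is smaller, whichever label it carries.
    minority, majority = sorted([positives, negatives], key=len)
    kept = rng.sample(majority, len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


# Synthetic example: 5 stroke cases among 100 rows (illustrative only).
data = [{"id": i, "stroke": 1 if i < 5 else 0} for i in range(100)]

balanced = random_undersample(data)
```

Undersampling is normally applied to the training split only, so that the held-out test set keeps the original class distribution and the reported metrics stay realistic.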

This project highlights the practical application of data science in stroke prediction, offering invaluable insights for preemptive healthcare strategies. Our future agenda aims to further refine the model's predictive accuracy and interpretability, steadfast in our commitment to advancing preventive health measures.

About

BrainStation capstone project
