[Miguel Estepa Polonio]
[Data Analytics, Barcelona, 2020]
- Project Description
- Hypotheses / Questions
- Dataset
- Cleaning
- Analysis
- Model Training and Evaluation
- Conclusion
- Future Work
- Workflow
- Organization
- Links
Heart disease is the leading cause of death worldwide. Can we predict it? By combining medical research, data analysis, Machine Learning and Data Visualization, this project offers an answer to that question.
- Could we predict heart diseases?
- How would they affect different people (by age, gender and habits)?
Two datasets are used in this project:
- A Cardiovascular disease dataset from Kaggle (https://www.kaggle.com/sulianova/cardiovascular-disease-dataset).
- A model of the Spanish population that I built with data from INE (https://www.ine.es/).
Both datasets were cleaned of implausible values and NaNs. During the process, some columns of the cardiovascular dataset were also dropped so that the study focuses on the more demographic features.
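As an illustration of this kind of cleaning, here is a minimal, dependency-free sketch. It is not the project's actual code. The inline sample, the `KEEP` column selection and the `LIMITS` thresholds are assumptions made for the example, as is the layout of the Kaggle file (semicolon-separated, with `age` stored in days).

```python
import csv
import io

# Hypothetical sample rows; the values are made up for illustration.
SAMPLE = """id;age;gender;height;weight;ap_hi;ap_lo;smoke;alco;active;cardio
0;18393;2;168;62.0;110;80;0;0;1;0
1;20228;1;156;85.0;140;90;0;0;1;1
2;18857;1;165;;130;70;0;0;0;1
3;17623;2;169;82.0;-150;100;0;0;1;1
4;14791;1;156;56.0;100;60;0;0;0;0
"""

# Assumed choice of "more demographic" columns; the README does not list them.
KEEP = ["age", "gender", "smoke", "alco", "active", "cardio"]

# Assumed plausibility limits; the project's real thresholds are not stated.
LIMITS = {"ap_hi": (60, 250), "ap_lo": (30, 200)}


def clean(rows):
    """Drop incomplete or implausible rows, then keep only the KEEP columns."""
    cleaned = []
    for row in rows:
        # A missing field is the CSV equivalent of a NaN.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        # Discard rows whose measurements fall outside the assumed limits.
        if any(not low <= float(row[col]) <= high
               for col, (low, high) in LIMITS.items()):
            continue
        record = {col: int(row[col]) for col in KEEP}
        record["age"] = record["age"] // 365  # days -> whole years
        cleaned.append(record)
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
cleaned = clean(rows)
print(len(rows), "rows read,", len(cleaned), "rows kept")
```

On the real file, the same two filters (missing values, out-of-range values) and the column selection would typically be written with a dataframe library instead of plain loops.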
- See the Presentation for all the different analyses.
- Several experiments were run with different models. In the end, the selected option was KNN (k-nearest neighbors).
- The model uses the best parameters found through hyperparameter tuning and through evaluation on different random seeds of populations.
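To make the tuning idea concrete, here is a small sketch of a KNN classifier whose `k` is chosen by averaging hold-out accuracy over several random seeds. It is one possible reading of the procedure, not the project's implementation. The synthetic `make_population` data, its made-up risk rule, the candidate values of `k` and the seeds are all assumptions for the example.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def make_population(seed, n=200):
    """Synthetic (features, label) pairs. NOT the project's data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        age = rng.uniform(30, 65)
        smoker = rng.choice([0, 1])
        # Made-up rule, only so the example has something to learn.
        risk = 0.15 + 0.5 * (age > 40) + 0.2 * smoker
        label = int(rng.random() < risk)
        # Scale age to [0, 1] so it does not dominate the distance.
        data.append((((age - 30) / 35, smoker), label))
    return data


def tune_k(candidates=(1, 5, 15), seeds=(0, 1, 2)):
    """Return the k with the best mean hold-out accuracy across seeds."""
    mean_scores = {}
    for k in candidates:
        scores = []
        for seed in seeds:
            data = make_population(seed)
            split = int(0.75 * len(data))  # 75% train, 25% hold-out
            scores.append(accuracy(data[:split], data[split:], k))
        mean_scores[k] = sum(scores) / len(scores)
    best_k = max(mean_scores, key=mean_scores.get)
    return best_k, mean_scores


best_k, mean_scores = tune_k()
print("best k:", best_k)
print("mean accuracy per k:", mean_scores)
```

Averaging over seeds keeps a single lucky split from deciding the parameter. In practice a library such as scikit-learn (`KNeighborsClassifier` with `GridSearchCV`) would usually replace the hand-written loops.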
- 40 is the critical age. At that age, the probability of suffering heart disease increases dramatically.
- Bad habits kill. Even though women have a higher natural probability of heart disease, more men die because of bad habits.
- The percentage of women who smoke and drink alcohol is increasing in younger generations.
This project does not end here. It is just the beginning. The next goal is to study the regions of Spain, analyze the differences in heart disease probability between them, and report to the institutions where changes should be made.
The work was organized using Trello. Before anything else, a strong focus was placed on brainstorming and on the design of the project.