GitHub - yuawano/fetal_health_classification: This is the final project from Ironhack bootcamp with the topic of fetal health using multiclass logistic regression.

Project summary

This is a multiclass classification project using Cardiotocograph (CTG) data to classify the fetal's wellbeing (Normal, Suspect, Pathological) in order to help medical practitioners to detect the fetal's in risk group at early stage. The motivation behind this project is to end preventable death of stillbirth, neonatal and maternal death.

The main target in this project are

Create models to identify the fetal's wellbeing
Detect fetal's wellbeing in the risk group ('Suspect' or 'Pathological') by data visualization in Tableau.

For further information about this project and the results, please refer to the presentation or below.

Background

Did you know that having a healthy child is rather a miracle than 'normal'? Did you know that many pregnancies come with risk of death? Many of us who lives in a developed world may have a beautiful picture of having a child but in the developing world there are many challenges lie in pregnancy and labor. Over 90% of fetal death and maternal death are happening in the developing world.

Based on the report from UNICEF open data source, it is estimated that 5.1 million infant passed away before their fifth birthday. Out of the 5.1 million, approx. 50% of death is happening within 4 weeks after birth (neonatal). And in addition, there are estimates of 2 million stillbirth (born dead) every year which means that it occurs every 16 seconds. What is even more tragic is that the majority could have been prevented through quality care during pregnancy and at birth.

The world made remarkable progress in child survival in the past three decades. Since 2015, there is a new targets set out in the Sustainable Development Goals (SDGs) of ending preventable death of newborns and children under 5 years old. Therefore it is essential to have a closer look into how to end preventable stillbirth and neonatal.

The mother's also comes with risk during pregnancy. In 2015, in anticipation of the launch of the SDGs, the World Health Organization (WHO) and partners released a consensus statement and full strategy paper on ending preventable maternal mortality. As of 2019 report from UNICEF open data source, there are approx. 300 thousand maternal death worldwide and most are preventable.

So our key questions is How can we end preventable stillbirth, neonatal and maternal mortality?.

Project goal and solution

Cardiotocograph(CTG) is the most widely used techniques in developed countries to monitor fetal heart rate and uterine contractions. The information of the CTG helps medical practitioners to evaluate the fetal's wellbeing (healthy or pathological) to prevent child and maternal mortality. In this project, the target is to identify the fetal wellbeing from modeling and data visualization. Ultimately apply these results and techniques in the developing world in order to help medical practitioners to take immediate actions to prevent fetal death and maternal death and achieve the target of SDGs by 2030.

Data info

Data set: Cardiotocography (CTG) data of fetal heart and uterine contractions from Kaggle
Data set size: 2126 rows x 22 features
Target variable: Fetal health status (Normal, Suspect, Pathological)
Example of features: Baseline value (beats per minute), Uterine contractions per seconds, Severe decelerations, Histogram widths, max, min, mean

Results

model

Three different models were tested in this studies: Logistic regression, KNN Classifier and Random Forest. In this studies I am looking into high recall score to identify the best model since it is cruitial to correctly identify as many fetal in risk groups 'B' = suspect and 'C' = pathological. Recall measure is used when we want to answer the question 'what proportion of actual Positives is correctly classified and it is calculated True Positive/(True Positive + False Positive). As the metrics to fix the imbalanced data, Upsampling and smote was applied.

Results:

Models			upsample	smote
Logistic regression	accurary score		0.7	0.72
	recall score	A/ B/ C	0.72/0.61/0.71	0.67/0.69/0.68
KNN Classifier	accuracy score		0.81	0.83
	recall score	A/ B/ C	0.78/0.90/0.95	0.84/0.89/0.93
Random Forest	accuracy score		0.91	0.91
	recall score	A/ B/ C	0.93/0.80/0.89	0.93/0.81/0.89
Note: 'A' = Normal, 'B' = Suspect, 'C' = Pathological

Based on the results above, Random Forest has the highest accuracy score of 0.91 and best balanced scores in recall. Thus, upsampling and smote metrics with Random Forest performed best in this particular studies. In this studies F1 score is not used since F1 score generates it's harmonized score by putting more weight on the lower score (in this case precision). As it is cruitial to identify the correct fetal wellbeing of 'B' and 'C', recall is the suitable measurement in this case.

data visualization

Please visit Python file for data viz with python, and Tableau public_yuawano for Tableau.

Next steps

Where CTG is distributed

Provide the model and data visualization to hospitals, praxis so that the medical practitioners can apply the techniques in their daily check ups of their patients. This will help to identify fetal's wellbeing and take immediate actions if necessary.

The gap still lies on 'how can we apply this solution in countries where CTG is not distributed'. Below are the next steps for countries there CTG is not distributed:

Identify where CTG is not distributed
Identify where trained practitioners are in lack
Estimate the necessary investments and impact and allocate donations to maternal and fetal health

Fun-facts scene from this project :D

This is the very first draft of my storyline before the presentaion was made :-D

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
Python		Python
Tableau		Tableau
images		images
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project summary

Background

Project goal and solution

Data info

Results

model

data visualization

Next steps

Fun-facts scene from this project :D

Libraries

Other data resource:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Project summary

Background

Project goal and solution

Data info

Results

model

data visualization

Next steps

Fun-facts scene from this project :D

Libraries

Other data resource:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages