
Ironhack logo

Final project

Using Machine Learning to predict cervical cancer risk

Rationale

Machine learning (ML) is on its way to becoming a powerful diagnostic tool. Until then, it can be used to determine where to focus screening efforts. In this project I try to predict the risk of a positive cervical cancer biopsy from survey information on potential risk factors. The dataset comprises demographic information, habits, and medical history of 858 Venezuelan patients.

Content

There are two directories: Code and Data.

Code:

Four Jupyter Notebooks:

cervix_project_1. Data_preparation. Data splitting, cleaning and LogisticRegression

cervix_project_2.Model_design_1. Testing a neural network model to predict the results of a biopsy

cervix_project_2.Model_design_2. Setting an initial bias before training the model

cervix_project_2.Model_design_3. Using SMOTE to balance dataset
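As a rough illustration of the initial-bias idea listed above: one common way to set it on an imbalanced binary problem is to start the output layer at the log-odds of the positive class, so that the untrained network already predicts the class prevalence. The sketch below shows that calculation only. It assumes this is the approach meant here; the function names and class counts are hypothetical and are not taken from the notebooks or the dataset.

```python
import math


def initial_output_bias(n_positive: int, n_negative: int) -> float:
    """Return the log-odds of the positive class, ln(pos / neg)."""
    if n_positive <= 0 or n_negative <= 0:
        raise ValueError("both classes need at least one sample")
    return math.log(n_positive / n_negative)


def sigmoid(x: float) -> float:
    """Logistic function used by a binary output unit."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical class counts, for illustration only.
n_pos, n_neg = 50, 950
b0 = initial_output_bias(n_pos, n_neg)

# With zero weights and this bias, the sigmoid output equals
# the positive-class prevalence n_pos / (n_pos + n_neg).
print(b0, sigmoid(b0))
```

In a Keras-style model, a value like `b0` would typically be passed as the bias initializer of the final layer. That wiring is left out here because it depends on how the notebook builds its network.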

Data:

kag_risk_factors_cervical_cancer.csv. CSV file downloaded from Kaggle (https://www.kaggle.com/loveall/cervical-cancer-risk-classification). The data were originally collected in Venezuela and are available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Cervical+cancer+%28Risk+Factors%29).

Cervix_cancer_data_profiling_report.html. Profiling report for the training data

Cervix_cancer_data_profiling_report_post_processing.html. Profiling report for the training data after cleaning

Intermediate output files (CSV files)
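A minimal sketch of reading a file shaped like the raw CSV with the standard library, separating the label from the features. It rests on several assumptions that should be checked against the actual file: that missing answers are marked with `?`, that the target column is named `Biopsy`, and that all other columns are numeric. The helper `read_rows` and the two-row sample are hypothetical and only stand in for the real data.

```python
import csv
import io

MISSING = "?"  # assumption: marker used for unanswered survey fields


def read_rows(handle, target="Biopsy"):
    """Yield (features, label) pairs, mapping the missing marker to None."""
    for row in csv.DictReader(handle):
        label = int(row.pop(target))
        features = {
            name: (None if value == MISSING else float(value))
            for name, value in row.items()
        }
        yield features, label


# Tiny made-up sample in place of the real file.
sample = io.StringIO("Age,Smokes,Biopsy\n18,0.0,0\n34,?,1\n")
rows = list(read_rows(sample))
print(rows)
```

To try it on the real data, the `io.StringIO` sample would be replaced by `open("kag_risk_factors_cervical_cancer.csv", newline="")`, with the path adjusted to wherever the Data directory lives.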