This tutorial was given at Epidemium by Owkin. You can contact us at simon.jegou@owkin.com.
In this tutorial, we'll walk through the machine learning approach to solving a data science problem in 4 steps:
- Explore your data
- Define your objective and metrics
- Set a baseline
- Improve it!
We use a dataset of 5000 breast histology images and try to classify whether or not they show cancer. We begin with a k-nearest neighbors algorithm based on histograms and finish with a simple convolutional neural network!
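To give an idea of what a histogram-based k-nearest neighbors baseline can look like, here is a minimal sketch using only NumPy. It is not the tutorial's actual code: the function names `color_histogram` and `knn_predict`, the number of bins, the value of `k`, and the synthetic random images standing in for the histology dataset are all assumptions made for illustration.

```python
import numpy as np


def color_histogram(image, bins=8):
    """Concatenate per-channel intensity histograms, normalized to sum to 1.

    `image` is assumed to be an (H, W, C) array with values in [0, 255].
    """
    channels = [
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(image.shape[-1])
    ]
    hist = np.concatenate(channels).astype(float)
    return hist / hist.sum()


def knn_predict(train_features, train_labels, query_features, k=5):
    """Predict binary labels (0 or 1) by majority vote among the k nearest neighbors."""
    # Euclidean distance between every query vector and every training vector
    diffs = query_features[:, None, :] - train_features[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Indices of the k closest training samples for each query
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_labels[nearest]
    # With an odd k there are no ties between the two classes
    return (votes.mean(axis=1) > 0.5).astype(int)


# Synthetic stand-in data (NOT the real dataset): class 1 images are darker on average.
rng = np.random.default_rng(0)
n_per_class = 40
bright = rng.integers(100, 256, size=(n_per_class, 16, 16, 3))
dark = rng.integers(0, 156, size=(n_per_class, 16, 16, 3))
images = np.concatenate([bright, dark])
labels = np.concatenate([np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)])

features = np.array([color_histogram(img) for img in images])

# Shuffle, then hold out a quarter of the samples for evaluation
order = rng.permutation(len(images))
split = int(0.75 * len(images))
train_idx, test_idx = order[:split], order[split:]

predictions = knn_predict(features[train_idx], labels[train_idx], features[test_idx], k=5)
accuracy = (predictions == labels[test_idx]).mean()
print(f"Accuracy on held-out synthetic images: {accuracy:.2f}")
```

The histogram throws away all spatial information and keeps only the color distribution, which is what makes it a cheap first baseline to compare a convolutional network against.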
For this tutorial, you'll need to install Jupyter and basic Python packages. I recommend using Anaconda. To train a neural network, you'll also need to install Keras and either TensorFlow or Theano.
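Before starting, you may want to check which of these packages are already available in your environment. The snippet below is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and `numpy` and `matplotlib` are assumptions about what "basic Python packages" covers.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# jupyter and keras come from the text above; numpy and matplotlib are assumed.
required = ["jupyter", "numpy", "matplotlib", "keras"]
# Keras needs at least one of these backends.
backends = ["tensorflow", "theano"]

print("Missing packages:", missing_packages(required))
if len(missing_packages(backends)) == len(backends):
    print("No Keras backend found: install tensorflow or theano.")
```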