This repository is an attempt to implement a self-supervised learning model for blood cell images using two approaches:
- Triplet Loss
- Contrastive Loss
The model is trained on the Blood Cell Images dataset from Kaggle. The dataset contains 12,500 images of blood cells, which are classified into 4 categories: eosinophil, lymphocyte, monocyte, and neutrophil.
However, since we want to train the model in a self-supervised manner, we will not use the labels provided in the dataset. Instead, we will use the triplet loss or contrastive loss to learn a feature representation of the images.
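In its standard form, which matches the symbol definitions that follow, the triplet loss can be written as shown here. The exact distance function `d` is not stated in this README, so it is left generic.

$$
L(A, P, N) = \max\bigl(d(A, P) - d(A, N) + \text{margin},\ 0\bigr)
$$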
Where:
- `A` is the anchor image
- `P` is the positive image (same class as anchor)
- `N` is the negative image (different class from anchor)
- `d(A, P)` is the distance between the anchor and positive images
- `d(A, N)` is the distance between the anchor and negative images
- `margin` is a hyperparameter that defines the minimum difference between the positive and negative distances
The goal of the model is to learn a feature representation of the images such that the distance between the anchor and positive images is minimized, while the distance between the anchor and negative images is maximized.
The negatives are found using batch hard negative mining, which selects the hardest negative for each anchor image. This is done by finding the negative image in the batch that is closest to the anchor image.
I chose batch hard negative mining because it is more computationally efficient than mining negatives from the entire dataset.
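The mining step described above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's training code. It assumes that each anchor's positive is the embedding of an augmented view of the same image (the README does not say how positives are formed without labels) and that every other image in the batch is a candidate negative. The function name `batch_hard_triplet_loss`, the Euclidean distance, and the default margin are hypothetical choices made for the example.

```python
import numpy as np


def batch_hard_triplet_loss(anchors, positives, margin=1.0):
    """Triplet loss with batch hard negative mining (illustrative sketch).

    anchors, positives: arrays of shape (batch, dim). Row i of `positives`
    is assumed to be the positive for row i of `anchors`.
    Returns the mean loss and the index of the mined negative per anchor.
    """
    anchors = np.asarray(anchors, dtype=float)
    positives = np.asarray(positives, dtype=float)

    # d(A, P): distance between each anchor and its own positive.
    d_ap = np.linalg.norm(anchors - positives, axis=1)

    # Pairwise Euclidean distances between all anchors in the batch.
    diff = anchors[:, None, :] - anchors[None, :, :]
    d_all = np.sqrt((diff ** 2).sum(axis=-1))

    # An image cannot be its own negative, so mask the diagonal.
    np.fill_diagonal(d_all, np.inf)

    # Hardest negative = the closest other image in the batch.
    hardest_idx = d_all.argmin(axis=1)
    d_an = d_all[np.arange(len(anchors)), hardest_idx]

    # max(d(A, P) - d(A, N) + margin, 0), averaged over the batch.
    losses = np.maximum(d_ap - d_an + margin, 0.0)
    return losses.mean(), hardest_idx
```

Because the search is limited to the current batch, the cost is one pairwise distance matrix per batch rather than a pass over the whole dataset.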
Note that because the labels are not used, some of the mined negatives may be false negatives, that is, images of the same cell type as the anchor.
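For the contrastive approach, a commonly used form of the loss is the NT-Xent loss from SimCLR, shown here for a positive pair `(i, j)`. This assumes two augmented views per image, giving `2N` embeddings per batch; the README does not state which exact variant the repository uses.

$$
\ell_{i,j} = -\log \frac{\exp\bigl(\mathrm{sim}(z_i, z_j) / t\bigr)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\bigl(\mathrm{sim}(z_i, z_k) / t\bigr)}
$$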
Where:
- `z_i` and `z_j` are the feature representations of the images
- `sim(z_i, z_j)` is the cosine similarity between the feature representations
- `N` is the number of images in the batch
- `t` is a temperature parameter that scales the similarity scores
The goal of the model is to learn a feature representation of the images such that the similarity between positive pairs is maximized, while the similarity between negative pairs is minimized.
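As a rough illustration of this objective, the following NumPy sketch computes an NT-Xent style contrastive loss. It assumes two augmented views per image, with row `i` of `z_a` and row `i` of `z_b` forming a positive pair and all other embeddings in the batch acting as negatives. The function name `nt_xent_loss` and the default temperature are hypothetical, and the repository's PyTorch implementation may differ.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent style contrastive loss (illustrative sketch).

    z_a, z_b: arrays of shape (N, dim) holding the embeddings of two
    views of the same N images.
    """
    n = len(z_a)
    z = np.concatenate([z_a, z_b], axis=0).astype(float)

    # L2-normalise so the dot product equals cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    # Exclude each embedding's similarity with itself.
    np.fill_diagonal(sim, -np.inf)

    # Index of the positive for every row: i <-> i + N.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log of the denominator (log-sum-exp per row).
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))

    # Negative log-probability of the positive, averaged over all 2N rows.
    log_prob_pos = sim[np.arange(2 * n), pos_idx] - log_denominator
    return -log_prob_pos.mean()
```

A lower temperature sharpens the softmax, so the closest negatives contribute more strongly to the loss.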
The following packages are required to run the code provided in this repository:
- Python 3.10
- PyTorch
- NumPy
- Matplotlib
- Scikit-learn