First, download the images and labels from the dataset for detection and decompress the archive. It is structured according to yolov5's required format.
```
cd datasets
tar -zvxf yolo_detection.tar.gz
```

yolov5 assumes that the directory is structured as follows:
```
- datasets
  - yolo_detection
    - images
      - train
        - <name>.jpg
        - ...
      - val
    - labels
      - train
        - <name>.txt
        - ...
      - val
- yolov5
```
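Each `<name>.txt` label file uses the standard YOLO format: one line per object, `class x_center y_center width height`, with all coordinates normalized by the image width and height. For this single-class dataset, a file containing two chimpanzees might look like this (the values are illustrative):

```
0 0.512 0.430 0.276 0.395
0 0.145 0.662 0.108 0.210
```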
Second, change the datasets/chimpanzee.yaml file according to your needs. You can change path, which is the root directory of your dataset; it can be a path relative to the yaml file or an absolute path.
NOTE: yolov5 substitutes images with labels in each image path to find the corresponding label file. Thus, for your convenience, you'd better not change the directory names.
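The substitution described in the note can be sketched as follows. This is a simplified illustration of yolov5's behavior, not its actual code, and the function name is ours:

```python
from pathlib import Path

def img2label_path(img_path: str) -> str:
    """Swap the 'images' path component for 'labels' and the image
    suffix for '.txt', mirroring how yolov5 locates label files."""
    parts = list(Path(img_path).parts)
    # Find the last 'images' component and replace it with 'labels'.
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return str(Path(*parts).with_suffix(".txt"))

print(img2label_path("datasets/yolo_detection/images/train/chimp_001.jpg"))
# -> datasets/yolo_detection/labels/train/chimp_001.txt
```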
```yaml
path: ../yolo_detection
train: images/train
val: images/val
test: # optional
names:
  0: chimpanzee
```

You can train a model from scratch:

```
python yolov5/train.py --data datasets/chimpanzee.yaml --weights '' --cfg yolov5s.yaml --img 640
```

Or you can use the pre-trained weights and finetune the whole network:

```
python yolov5/train.py --data datasets/chimpanzee.yaml --weights yolov5s.pt --img 640
```

For more details, please refer to the yolov5 GitHub page.
First, download the dataset from cropped images for identification and decompress it. Here, 'crop' means that the images used for identification are cropped from the ground-truth bounding boxes.
```
cd datasets
tar -zvxf crop_identification.tar.gz
```

After decompression, the directory looks like this:
```
- datasets
  - yolo_detection
  - crop_identification
    - train
      - Azibo
        - 0.jpg
        - ...
      - Bambari
      - ...
      - Tai
    - val
```
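This is the standard one-folder-per-class layout (as consumed by, e.g., torchvision's ImageFolder): each subdirectory name is a chimpanzee identity, and the integer label is derived from the sorted folder names. A minimal sketch of that mapping, using a temporary directory in place of the real dataset:

```python
import os
import tempfile

def build_class_index(split_dir: str) -> dict:
    """Map each identity folder name (e.g. 'Azibo') to an integer label,
    following the sorted-name convention of torchvision.datasets.ImageFolder."""
    classes = sorted(
        d for d in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, d))
    )
    return {name: idx for idx, name in enumerate(classes)}

# Example with a temporary directory standing in for datasets/crop_identification/train
with tempfile.TemporaryDirectory() as root:
    for name in ["Tai", "Azibo", "Bambari"]:
        os.makedirs(os.path.join(root, name))
    print(build_class_index(root))
    # -> {'Azibo': 0, 'Bambari': 1, 'Tai': 2}
```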
Large Margin Cosine Loss (LMCL) was first proposed in CosFace: Large Margin Cosine Loss for Deep Face Recognition. It aims to learn discriminative features by maximizing the inter-class cosine margin. Our work is the first to adopt this loss function for chimpanzee recognition, and it achieves relatively good results.
During training, the class weight vectors and the feature vectors are L2-normalized, so each logit is the cosine similarity between a feature and a class weight vector; a fixed margin is subtracted from the target-class cosine before the softmax.
During testing, the class with the largest cosine similarity is the predicted label.
There are two hyperparameters in this loss: the forced margin m and the scaling factor s.
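A framework-agnostic NumPy sketch of this loss, following the CosFace formulation (the repository's actual implementation lives under identification/ and may differ in details; the function name and default hyperparameters here are ours):

```python
import numpy as np

def lmcl_loss(features, weights, labels, s=30.0, m=0.35):
    """Large Margin Cosine Loss (CosFace).

    features: (N, feat_dim) raw feature vectors
    weights:  (num_classes, feat_dim) class weight vectors
    labels:   (N,) integer class labels
    s: scaling factor, m: cosine margin (the two hyperparameters).
    """
    # L2-normalize features and class weights so logits are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                  # (N, num_classes), cos(theta_j)
    rows = np.arange(len(labels))
    logits = s * cos
    # Subtract the margin from the target-class cosine only.
    logits[rows, labels] = s * (cos[rows, labels] - m)
    # Standard softmax cross-entropy on the margin-adjusted, scaled logits.
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

At test time the margin is dropped: the predicted label is simply the argmax of the cosine similarities.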
During training and testing, the model is structured as follows:
- encoder: resnet50 -> feat_dim = 2048
- weight W: a normalized Linear (2048, 17) layer -> num_classes = 17
The model is trained end-to-end and validated on-the-fly.
First, change the file identification/configs/train_lmcl.yaml according to your needs.
For example, load_pt_encoder indicates whether to use the pre-trained weights of the encoder provided by PyTorch.
```yaml
# Part of the yaml file as an example
model: lmcl
model_args:
  encoder: resnet50
  load_pt_encoder: True # whether to load the pre-trained weights from PyTorch
  num_classes: 17
```

Then, train the model using the configuration in the yaml file.
```
python identification/train.py --config identification/configs/train_lmcl.yaml
```
NOTE: you should be careful with your current working directory and the path set in the yaml file.
You will get output every print_freq epochs, like this:

```
epoch 20, train time 2.26, train_loss 11.95, train_acc 31.12; val_loss 10.80, val_acc 46.40
epoch 40, train time 2.04, train_loss 10.69, train_acc 38.37; val_loss 11.21, val_acc 43.53
epoch 60, train time 2.02, train_loss 9.71, train_acc 44.11; val_loss 10.35, val_acc 55.76
epoch 80, train time 2.23, train_loss 9.21, train_acc 49.40; val_loss 9.97, val_acc 56.47
epoch 100, train time 2.10, train_loss 9.34, train_acc 49.85; val_loss 11.35, val_acc 55.76
...
```
Contrastive learning is commonly used for self-supervised learning. This work proposes a new contrastive loss by leveraging label information.
In this setup, we use a two-stage pipeline.
During training:
- encoder: resnet50 -> 2048
- projection head: mlp/linear -> 128
- loss: supcon
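The supcon loss pulls together normalized projections that share a label and pushes apart all others. A simplified NumPy sketch of the loss (the repository's implementation additionally handles multi-view batches and runs on GPU, so treat this as an illustration only):

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.07):
    """Supervised contrastive loss on projection vectors z: (N, d)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize projections
    sim = z @ z.T / temperature                        # pairwise similarities
    n = len(labels)
    logits_mask = np.ones((n, n)) - np.eye(n)          # exclude self-similarity
    # Positives: samples with the same label, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]).astype(float) * logits_mask
    # log-softmax over all other samples (numerically stabilized).
    sim = sim - sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim) * logits_mask
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    # Average log-probability over positives, for anchors with >= 1 positive.
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    mean_log_prob_pos = (pos_mask * log_prob).sum(axis=1)[valid] / pos_count[valid]
    return -mean_log_prob_pos.mean()
```

Dropping the label term (each anchor's only positive is its augmented view) recovers the self-supervised SimCLR objective.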
After training, we can use the pre-trained encoder and finetune a linear classifier on top of it.
During finetuning:
- (frozen) encoder: resnet50 -> 2048
- classifier: Linear -> 17
- Self-supervised contrastive learning

  You need to specify the model as supcon and the method as simclr.

  ```
  python identification/train.py --config identification/configs/train_simclr.yaml
  ```

- Supervised contrastive learning

  You need to specify both the model and the method as supcon.

  ```
  python identification/train.py --config identification/configs/train_supcon.yaml
  ```
During finetuning, we add a classifier on top of the encoder used during training and only train the classifier.
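Since the encoder receives no gradient updates, its outputs can be treated as fixed features, and the finetuning step reduces to training a linear softmax classifier on them. A NumPy sketch under that assumption (the actual finetune.py freezes the encoder inside a regular training loop; the function here is ours):

```python
import numpy as np

def train_linear_classifier(features, labels, num_classes, lr=0.5, epochs=300):
    """Train a softmax classifier on frozen features with plain gradient descent."""
    n, d = features.shape
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n       # gradient of cross-entropy w.r.t. logits
        W -= lr * features.T @ grad       # only the classifier is updated;
        b -= lr * grad.sum(axis=0)        # the (frozen) encoder never changes
    return W, b
```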
If you follow the previous steps, you should be able to find contrastively trained models under identification/save/supcon_models/<xxxx>/, where <xxxx> depends on your training settings.
Then the most important thing is to specify the path where you stored the trained SupCon models: you should at least change the load entry in identification/configs/finetune_supcon.yaml or identification/configs/finetune_simclr.yaml.
For example:
```yaml
# Part of the yaml file
dataset: path # do not change this if you want to use a self-defined dataset
# data_folder: crop_identification
data_folder: datasets/crop_identification # root directory of your dataset
image_size: 32
batch_size: 256
num_workers: 8

model: supcon
model_args:
  encoder: resnet50
  load_pt_encoder: False # whether to load the pre-trained weights from PyTorch
  head: mlp
  feat_dim: 128

# the path to load the SupCon model trained during the training stage
load: identification/save/supcon_models/model_supcon_load_pt_encoder_True_optimizer_adam_bs_256_scheduler_exp_method_supcon/ckpt_epoch_100.pth
```

After specifying the model that you want to finetune:
- Finetune the model trained with self-supervised contrastive learning

  ```
  python identification/finetune.py --config identification/configs/finetune_simclr.yaml
  ```

- Finetune the model trained with supervised contrastive learning

  ```
  python identification/finetune.py --config identification/configs/finetune_supcon.yaml
  ```