- Title: Big Self-Supervised Models are Strong Semi-Supervised Learners
- Publication: NeurIPS, 2020
- Link: [paper] [code]
- for semi-supervised learning via the task-agnostic use of unlabeled data
- the fewer the labels, the more benefit from a bigger model
- with the task-specific use of unlabeled data, the predictive performance improves and can be transferred into a smaller network
- a deeper projection head
- improves semi-supervised performance when fine-tuning from a middle layer of the projection head
- unlabeled data is used in a task-agnostic way
- for general representation via unsupervised pretraining
- general representations are adapted for a specific task via supervised fine-tuning
- unlabeled data is used in a task-specific way
- for improving predictive performance & obtaining a compact model
- train student networks on the unlabeled data with labels imputed by the fine-tuned teacher network
- summary: pretrain → fine-tune → distill
- increasing width & depth and using SK (selective kernels) → improved performance
- bigger models are more label-efficient
- gains → larger in the semi-supervised setting
- deeper projection head during pretraining is better
- fine-tuning from the first layer of the projection head is better than fine-tuning from its input (0th layer)
- for bigger ResNets, the improvements from a deeper projection head are smaller
- whether the student's architecture is smaller than or the same as the teacher's, distillation improves model efficiency
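The distill step above trains the student against the teacher's temperature-scaled soft labels on unlabeled data. Below is a minimal NumPy sketch of that objective; the function names and the toy logits are my own, not from the paper's code.

```python
import numpy as np

def soft_labels(logits, tau=1.0):
    # Temperature-scaled softmax P(y|x; tau), computed stably.
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, tau=1.0):
    """Cross-entropy between the teacher's soft labels and the
    student's temperature-scaled predictions, averaged over the batch."""
    p_teacher = soft_labels(teacher_logits, tau)
    log_p_student = np.log(soft_labels(student_logits, tau))
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

# Toy check: the loss is larger when the student disagrees with the teacher.
teacher = np.array([[2.0, 0.0, 0.0]])
agree = distillation_loss(teacher, teacher)
disagree = distillation_loss(teacher, np.array([[0.0, 2.0, 0.0]]))
```

With labeled data available, the paper also allows mixing this term with an ordinary cross-entropy on ground-truth labels; the sketch shows only the distillation term.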
@article{DBLP:journals/corr/abs-2006-10029,
author = {Ting Chen and
Simon Kornblith and
Mohammad Norouzi and
Geoffrey E. Hinton},
title = {Big Self-Supervised Models are Strong Semi-Supervised Learners},
journal = {CoRR},
volume = {abs/2006.10029},
year = {2020},
url = {https://arxiv.org/abs/2006.10029v2},
eprinttype = {arXiv},
eprint = {2006.10029v2},
timestamp = {Mon, 26 Oct 2020 03:09:28 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2006-10029.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
