Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)
The official codebase for Enhancing Representations through Heterogeneous Self-Supervised Learning.
Incorporating heterogeneous representations from different architectures has facilitated various vision tasks, e.g., some hybrid networks combine transformers and convolutions. However, the complementarity between such heterogeneous architectures has not been well exploited in self-supervised learning. Thus, we propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture is heterogeneous from the base model. In this process, HSSL endows the base model with new characteristics purely through representation learning, without any structural change. To comprehensively understand HSSL, we conduct experiments on various heterogeneous pairs, each containing a base model and an auxiliary head. We discover that the representation quality of the base model improves as the architectural discrepancy between the two grows. This observation motivates a search strategy that quickly determines the most suitable auxiliary head for a given base model, as well as several simple but effective methods to enlarge the model discrepancy. HSSL is compatible with various self-supervised methods and achieves superior performance on various downstream tasks, including image classification, semantic segmentation, instance segmentation, and object detection.
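As a rough illustration of the idea (not the official training code), the sketch below pairs a small transformer-style base model with a convolutional auxiliary head and aligns their representations; all module names, sizes, and the cosine alignment loss are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of the HSSL idea: a transformer-style base model is additionally
# supervised by an auxiliary head whose architecture is heterogeneous (here, convolutional).
class HSSLSketch(nn.Module):
    def __init__(self, dim=192, patch=16, img=224, proj_dim=128):
        super().__init__()
        self.grid = img // patch  # 14x14 patch grid for 224x224 inputs
        # Base model: patch embedding + a small transformer encoder (placeholder).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.base = nn.TransformerEncoder(layer, num_layers=4)
        self.base_proj = nn.Linear(dim, proj_dim)
        # Auxiliary head: a convolutional branch, architecturally heterogeneous
        # from the transformer base; it consumes the base's patch tokens.
        self.aux_head = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(dim, proj_dim),
        )

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # [B, N, dim]
        tokens = self.base(tokens)
        z_base = self.base_proj(tokens.mean(dim=1))              # base representation
        fmap = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        z_aux = self.aux_head(fmap)                              # auxiliary representation
        # Alignment term: the base model learns from the heterogeneous auxiliary head
        # (a simple cosine objective stands in for the paper's actual loss).
        return 1 - F.cosine_similarity(z_base, z_aux, dim=-1).mean()


loss = HSSLSketch()(torch.randn(2, 3, 224, 224))  # tiny smoke test
```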
Please install PyTorch and download the ImageNet dataset.
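For reference, a minimal data-loading sketch with torchvision is shown below; the root path and augmentations are placeholders rather than the repository's actual pre-training pipeline.

```python
import torch
from torchvision import datasets, transforms

# Hypothetical data-loading sketch; adjust the path and transforms to your setup.
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Expects the usual ImageFolder layout: <root>/train/<class>/<image>.JPEG
train_set = datasets.ImageFolder("./data/imagenet/train", transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=256, shuffle=True, num_workers=8, pin_memory=True
)
```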
| Architecture | Parameters | Pre-training Epochs | Fine-tuning Epochs | Top-1 Acc. | Download | Script |
|---|---|---|---|---|---|---|
| ViT-B/16 | 85M | 100 | 100 | 83.8% | huggingface / baidu | script |
| ViT-B/16 | 85M | 150 | 100 | 84.1% | huggingface / baidu | script |
We fully fine-tune the pre-trained models on ImageNet-1K using the MAE codebase.
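A sketch of loading a released checkpoint for fine-tuning is shown below; the file name and the timm model name are placeholders, and the `model` key assumes the MAE-style checkpoint convention, so adjust it if the released files differ.

```python
import torch
import timm

# Hypothetical loading sketch: the checkpoint path is a placeholder, and the
# 'model' key follows the MAE-style convention.
ckpt = torch.load("hssl_vit_base_16_100ep.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

# Build a ViT-B/16 classifier for ImageNet-1K fine-tuning (via timm).
model = timm.create_model("vit_base_patch16_224", num_classes=1000)
msg = model.load_state_dict(state_dict, strict=False)
print(msg.missing_keys)  # typically only the classification head is missing
```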
For downstream tasks, e.g., semantic segmentation, please refer to iBOT.
Additionally, we use ImageNetSegModel to implement semi-supervised semantic segmentation on the ImageNet-S dataset.
If you find this repository useful, please consider giving a star and a citation:
@article{li2025hssl,
title={Enhancing Representations through Heterogeneous Self-Supervised Learning},
author={Li, Zhong-Yu and Yin, Bo-Wen and Liu, Yongxiang and Liu, Li and Cheng, Ming-Ming},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2025}
}
The code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for non-commercial use only. Any commercial use requires formal permission first.
This repository builds upon the DINO, iBOT, and MAE repositories.
