Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)
The official codebase for Enhancing Representations through Heterogeneous Self-Supervised Learning.
Incorporating heterogeneous representations from different architectures has facilitated various vision tasks, e.g., some hybrid networks combine transformers and convolutions. However, the complementarity between such heterogeneous architectures has not been well exploited in self-supervised learning. Thus, we propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture is heterogeneous from the base model. In this process, HSSL endows the base model with new characteristics purely through representation learning, without any structural change. To comprehensively understand HSSL, we conduct experiments on various heterogeneous pairs, each containing a base model and an auxiliary head. We discover that the representation quality of the base model improves as the architectural discrepancy between the two grows. This observation motivates a search strategy that quickly determines the most suitable auxiliary head for a given base model, as well as several simple but effective methods to enlarge the model discrepancy. HSSL is compatible with various self-supervised methods and achieves superior performance on various downstream tasks, including image classification, semantic segmentation, instance segmentation, and object detection.
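As a rough illustration of the idea (not the official training code), the sketch below pairs a small transformer-style base model with a convolutional auxiliary head and aligns their representations; all module names, sizes, and the cosine alignment loss are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of the HSSL idea: a transformer-style base model is additionally
# supervised by an auxiliary head whose architecture is heterogeneous (here, convolutional).
class HSSLSketch(nn.Module):
    def __init__(self, dim=192, patch=16, img=224, proj_dim=128):
        super().__init__()
        self.grid = img // patch  # 14x14 patch grid for 224x224 inputs
        # Base model: patch embedding + a small transformer encoder (placeholder).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.base = nn.TransformerEncoder(layer, num_layers=4)
        self.base_proj = nn.Linear(dim, proj_dim)
        # Auxiliary head: a convolutional branch, architecturally heterogeneous
        # from the transformer base; it consumes the base's patch tokens.
        self.aux_head = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(dim, proj_dim),
        )

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # [B, N, dim]
        tokens = self.base(tokens)
        z_base = self.base_proj(tokens.mean(dim=1))              # base representation
        fmap = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        z_aux = self.aux_head(fmap)                              # auxiliary representation
        # Alignment term: the base model learns from the heterogeneous auxiliary head
        # (a simple cosine objective stands in for the paper's actual loss).
        return 1 - F.cosine_similarity(z_base, z_aux, dim=-1).mean()


loss = HSSLSketch()(torch.randn(2, 3, 224, 224))  # tiny smoke test
```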
Please install PyTorch and download the ImageNet dataset.
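For reference, a minimal data-loading sketch with torchvision is shown below; the root path and augmentations are placeholders rather than the repository's actual pre-training pipeline.

```python
import torch
from torchvision import datasets, transforms

# Hypothetical data-loading sketch; adjust the path and transforms to your setup.
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Expects the usual ImageFolder layout: <root>/train/<class>/<image>.JPEG
train_set = datasets.ImageFolder("./data/imagenet/train", transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=256, shuffle=True, num_workers=8, pin_memory=True
)
```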
| Architecture | Parameters | Pre-training Epochs | Fine-tuning Epochs | Top-1 Acc. | Download | Script |
|---|---|---|---|---|---|---|
| ViT-B/16 | 85M | 100 | 100 | 83.8% | huggingface / baidu | script |
| ViT-B/16 | 85M | 150 | 100 | 84.1% | huggingface / baidu | script |
We fully fine-tune the pre-trained models on ImageNet-1K using the MAE codebase.
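A sketch of loading a released checkpoint for fine-tuning is shown below; the file name and the timm model name are placeholders, and the `model` key assumes the MAE-style checkpoint convention, so adjust it if the released files differ.

```python
import torch
import timm

# Hypothetical loading sketch: the checkpoint path is a placeholder, and the
# 'model' key follows the MAE-style convention.
ckpt = torch.load("hssl_vit_base_16_100ep.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

# Build a ViT-B/16 classifier for ImageNet-1K fine-tuning (via timm).
model = timm.create_model("vit_base_patch16_224", num_classes=1000)
msg = model.load_state_dict(state_dict, strict=False)
print(msg.missing_keys)  # typically only the classification head is missing
```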
For downstream tasks, e.g., semantic segmentation, please refer to iBOT.
Additionally, we use ImageNetSegModel to implement semi-supervised semantic segmentation on the ImageNet-S dataset.
If you find this repository useful, please consider giving a star and a citation:
@article{li2025hssl,
title={Enhancing Representations through Heterogeneous Self-Supervised Learning},
author={Li, Zhong-Yu and Yin, Bo-Wen and Liu, Yongxiang and Liu, Li and Cheng, Ming-Ming},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2025}
}
The code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for non-commercial use only. Any commercial use requires formal permission first.
This repository builds upon the DINO, iBOT, and MAE repositories.
