Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)

The official codebase for Enhancing Representations through Heterogeneous Self-Supervised Learning.

Introduction

HSSL framework

Incorporating heterogeneous representations from different architectures has benefited various vision tasks; for example, some hybrid networks combine transformers and convolutions. However, the complementarity between such heterogeneous architectures has not been well exploited in self-supervised learning. We therefore propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture is heterogeneous to that of the base model. In this process, HSSL endows the base model with new characteristics through representation learning, without any structural changes. To understand HSSL comprehensively, we conduct experiments on various heterogeneous pairs, each containing a base model and an auxiliary head. We discover that the representation quality of the base model improves as the architectural discrepancy between the two grows. This observation motivates a search strategy that quickly determines the most suitable auxiliary head for a given base model, as well as several simple but effective methods to enlarge the model discrepancy. HSSL is compatible with various self-supervised methods and achieves superior performance on downstream tasks including image classification, semantic segmentation, instance segmentation, and object detection.
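To illustrate the core idea, here is a minimal, self-contained sketch (not the repo's actual API; all names are hypothetical and the "base" and "head" are toy stand-ins): a base model with one architectural bias (local mixing, imitating a convolution) is trained through an auxiliary head with a different bias (global softmax mixing, imitating attention), so the auxiliary branch's loss flows back into the base model.

```python
# Toy sketch of the HSSL idea: a "convolutional" base refined by an
# architecturally heterogeneous "attention" auxiliary head. Hypothetical
# names and shapes; the real models are full networks (e.g., ViT-B/16).
import numpy as np

rng = np.random.default_rng(0)

def conv_base(x, w):
    """Toy base model: local neighbor averaging (a convolution-like bias)
    followed by a linear projection with learnable weights w."""
    mixed = (x + np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1)) / 3.0
    return mixed @ w

def attention_head(z):
    """Toy heterogeneous auxiliary head: global token mixing via
    softmax-normalized similarity weights (an attention-like bias)."""
    scores = z @ z.T / np.sqrt(z.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ z

def hssl_loss(x, w, target):
    """Base features pass through the heterogeneous head before being
    matched to a self-supervised target, so gradients w.r.t. w carry the
    head's architectural bias back into the base model."""
    h = attention_head(conv_base(x, w))
    return float(((h - target) ** 2).mean())

x = rng.normal(size=(4, 8))        # a batch of 4 toy "token" features
w = rng.normal(size=(8, 8)) * 0.1  # base model parameters
target = rng.normal(size=(4, 8))   # stand-in self-supervised target
loss = hssl_loss(x, w, target)
```

In the paper's setting, the target comes from an existing self-supervised objective (e.g., DINO or iBOT), and the auxiliary head is discarded after pre-training, leaving the base model's architecture unchanged.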

Installation

Please install PyTorch and download the ImageNet dataset.

Training and Pre-trained Models

| Architecture | Parameters | Pre-training Epochs | Fine-tuning Epochs | Top-1 | Download | Script |
| --- | --- | --- | --- | --- | --- | --- |
| ViT-B/16 | 85M | 100 | 100 | 83.8% | huggingface \| baidu | script |
| ViT-B/16 | 85M | 150 | 100 | 84.1% | huggingface \| baidu | script |

Evaluation

We fully fine-tune the pre-trained models on ImageNet-1K using the MAE codebase.

For downstream tasks, e.g., semantic segmentation, please refer to iBOT.

Additionally, we use ImageNetSegModel to implement semi-supervised semantic segmentation on the ImageNet-S dataset.

Citation

If you find this repository useful, please consider giving a star and a citation:

@article{li2025hssl,
  title={Enhancing Representations through Heterogeneous Self-Supervised Learning},
  author={Li, Zhong-Yu and Yin, Bo-Wen and Liu, Yongxiang and Liu, Li and Cheng, Ming-Ming},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2025}
}

License

The code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for noncommercial use only. Any commercial use requires formal permission in advance.

Acknowledgement

This repository is built using the DINO repository, the iBOT repository, and the MAE repository.
