Improve Representation for Imbalanced Regression through Geometric Constraints (CVPR 2025)
Zijian Dong¹*, Yilei Wu¹*, Chongyao Chen²*, Yingtian Zou¹, Yichi Zhang¹, Juan Helen Zhou¹
¹National University of Singapore, ²Duke University, *Equal contribution
Our paper addresses representation learning for imbalanced regression by introducing two geometric constraints: enveloping loss, which encourages representations to uniformly occupy a hypersphere's surface, and homogeneity loss, which ensures evenly spaced representations along a continuous trace. Unlike classification-based methods that cluster features into distinct groups, our approach preserves the continuous and ordered nature essential for regression tasks. We integrate these constraints into a Surrogate-driven Representation Learning (SRL) framework. Experiments on several datasets demonstrate significant performance improvements, especially in regions with limited data.
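To make the two geometric constraints more concrete, here is a minimal sketch in plain Python (standard library only). It is not the implementation in dfr.py and makes two labeled assumptions. First, the enveloping idea is approximated by a swapped-in, well-known measure of how uniformly normalized features cover a hypersphere: the uniformity loss of Wang & Isola, with an assumed temperature t=2. Second, spacing_loss is a hypothetical homogeneity-style penalty. It sorts features by their regression label and penalizes unequal step lengths along the resulting trace. All function names here are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def uniformity_loss(feats: Sequence[Sequence[float]], t: float = 2.0) -> float:
    """Swapped-in stand-in for the enveloping idea (Wang & Isola uniformity loss).

    Returns the log of the mean Gaussian potential exp(-t * ||z_i - z_j||^2)
    over all distinct pairs of normalized features (needs at least 2 features).
    Lower values mean the features are spread more uniformly over the sphere.
    """
    z = [l2_normalize(f) for f in feats]
    pots = [
        math.exp(-t * sq_dist(z[i], z[j]))
        for i in range(len(z))
        for j in range(i + 1, len(z))
    ]
    return math.log(sum(pots) / len(pots))


def spacing_loss(feats: Sequence[Sequence[float]], labels: Sequence[float]) -> float:
    """Hypothetical homogeneity-style penalty (not the paper's formulation).

    Orders normalized features by their regression label, then returns the
    variance of consecutive step lengths along that trace divided by the
    squared mean step. A value of 0 means perfectly even spacing.
    """
    ordered = sorted(zip(labels, feats), key=lambda pair: pair[0])
    z = [l2_normalize(f) for _, f in ordered]
    steps = [math.sqrt(sq_dist(z[i], z[i + 1])) for i in range(len(z) - 1)]
    mean = sum(steps) / len(steps)
    var = sum((s - mean) ** 2 for s in steps) / len(steps)
    return var / (mean * mean + 1e-12)  # small epsilon guards against zero steps


if __name__ == "__main__":
    def on_circle(deg: float) -> Vector:
        # Toy 2-D feature: unit vector at the given angle in degrees
        return [math.cos(math.radians(deg)), math.sin(math.radians(deg))]

    spread = [on_circle(d) for d in (0, 90, 180, 270)]
    clustered = [on_circle(d) for d in (0, 5, 10, 15)]
    print(uniformity_loss(spread) < uniformity_loss(clustered))  # True

    even = [on_circle(d) for d in (0, 30, 60, 90)]
    uneven = [on_circle(d) for d in (0, 10, 60, 90)]
    print(spacing_loss(even, [0, 1, 2, 3]), spacing_loss(uneven, [0, 1, 2, 3]))
```

On the toy unit-circle features, evenly spread points get a lower uniformity loss than clustered ones. Equally spaced angles give a spacing_loss close to 0, while uneven angles give a clearly positive value.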
We provide our model weights trained on DIR benchmark datasets:
- STS-B-DIR (sentence similarity regression)
- IMDB-WIKI-DIR (age estimation)
- AgeDB-DIR (age estimation)
The repository is organized as follows:
imbalanced-regression/
├── sts-b-dir/ # STS-B dataset for semantic textual similarity regression
│ ├── preprocess.py # Preprocessing and data preparation for STS-B
│ ├── dfr.py # Method implementation
│ ├── evaluate.py # Evaluation scripts for model performance
│ ├── models.py # Model architectures for the regression tasks
│ ├── tasks.py # Task-specific configurations and operations
│ ├── trainer.py # Training and evaluation pipelines
│ ├── train.py # Script to launch training
│ └── glue_data/ # Raw and preprocessed STS-B data
├── imdb-wiki-dir/ # IMDB-WIKI dataset for age estimation
│ ├── dataset.py # Preprocessing and data preparation for IMDB-WIKI
│ ├── data/ # Dataset directory
│ ├── dfr.py # Method implementation
│ ├── resnet.py # Network implementation
│ ├── evaluate.py # Evaluation pipelines
│ └── utils.py # Utility functions
└── agedb-dir/ # AgeDB dataset for age estimation
  ├── evaluate.py # Evaluation pipelines
  ├── data/ # Dataset directory
  ├── dfr.py # Method implementation
  ├── resnet.py # Network implementation
  ├── dataset.py # Preprocessing and data preparation for AgeDB
  └── utils.py # Utility functions
- Download GloVe word embeddings (840B tokens, 300D vectors) using
python glove/download_glove.py
- We use the standard data folder (./glue_data/STS-B) provided by DIR, which is used to set up the balanced STS-B-DIR dataset. To reproduce the results in the paper, use this folder directly. If you want to try different balanced splits, delete the folder ./glue_data/STS-B and run
python glue_data/create_sts.py
- The dependencies for this task differ considerably from those of the other tasks, so we recommend creating a separate environment for it. With conda, you can create the environment and install the dependencies using the following commands:
conda create -n sts python=3.6
conda activate sts
# PyTorch 0.4 (required) + CUDA 9.2
conda install pytorch=0.4.1 cuda92 -c pytorch
# other dependencies
pip install -r requirements.txt
# The latest "overrides" release, installed along with allennlp 0.5.0, now raises an error,
# so downgrade "overrides" to version 3.1.0
pip install overrides==3.1.0
- Training
python train.py --dfr --w1 1e-4 --w2 1e-2 --w3 1e-4 --temp 0.1
- Evaluation
python evaluate.py --evaluate --resume <path_to_evaluation_ckpt> # agedb-dir & imdb-wiki-dir
python evaluate.py --evaluate --eval_model <path_to_evaluation_ckpt> # sts-b-dir
Our codebase was built on DIR and RankSim. Thanks for their wonderful work!
If you find this repository useful in your research, please consider giving a star ⭐️ and a citation:
@inproceedings{dong2025improve,
title={Improve Representation for Imbalanced Regression through Geometric Constraints},
author={Dong, Zijian and Wu, Yilei and Chen, Chongyao and Zou, Yingtian and Zhang, Yichi and Zhou, Juan Helen},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}
