Implementation of the paper "Hyper-Counter: Cross-Modal Hypergraph Fusion for RGB-T Crowd Counting".
Figure 1. The framework of the proposed Hyper-Counter.
Abstract - In RGB-T crowd counting, the performance of existing methods is often constrained by two major challenges. On the one hand, the inherent spatial misalignment between RGB and thermal infrared images makes cross-modal feature alignment and fusion extremely difficult. On the other hand, effectively integrating the complementary semantic information from both modalities and capturing their complex high-order associations remains a significant challenge. To address these issues, we propose a novel framework based on cross-modal hypergraph fusion (Hyper-Counter), which for the first time introduces hypergraph learning into RGB-T cross-modal fusion. Specifically, we utilize parallel Transformer backbones to extract multi-scale features and employ convolutional blocks to preliminarily fuse the features from both modalities into joint features. Subsequently, we design a cross-modal hypergraph perception module (CroM-HP), where feature patches are treated as vertices, and a cross-modal hypergraph is constructed based on the joint and cross-modal features. Hypergraph neural networks are then used to achieve efficient semantic information interaction and fusion. Through CroM-HP, the model can associate corresponding regions across modalities in a spatially flexible manner and fully understand the intricate complementary semantic correlations. Moreover, a pyramid structure is adopted in the multi-level CroM-HPs to achieve cross-scale semantic transmission. Experiments on two widely used datasets, RGBT-CC and Drone-RGBT, demonstrate that our framework achieves the best performance, significantly outperforming previous methods. In particular, compared to the second-place method on the Drone-RGBT dataset, Hyper-Counter improves the GAME(0) and RMSE metrics by 7.58% and 9.71%, respectively.
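For readers new to hypergraph learning, the snippet below is a minimal, generic HGNN-style hypergraph convolution (x' = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} x Theta, with W set to the identity). It only sketches how vertices (feature patches) exchange information through shared hyperedges; it is not the actual CroM-HP module implemented in this repository.

```python
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    """Generic HGNN-style hypergraph convolution (sketch, not CroM-HP itself).

    x' = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} x Theta, with W = identity.
    """
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, H):
        # x: (V, C) vertex (patch) features, H: (V, E) incidence matrix
        dv = H.sum(dim=1).clamp(min=1.0)                 # vertex degrees
        de = H.sum(dim=0).clamp(min=1.0)                 # hyperedge degrees
        x = self.theta(x)
        x = x / dv.sqrt().unsqueeze(1)                   # D_v^{-1/2} x Theta
        edge_feat = (H.t() @ x) / de.unsqueeze(1)        # gather vertices into hyperedges
        out = (H @ edge_feat) / dv.sqrt().unsqueeze(1)   # scatter back to vertices
        return out
```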
We evaluate the proposed method on two widely used RGB-T crowd counting datasets: 🔗RGBT-CC and 🔗Drone-RGBT.
The RGBT-CC dataset contains annotations for 138,389 pedestrians across 2,030 pairs of RGB-Thermal images. It features a variety of scenes, such as streets, shopping malls, amusement parks, and train stations, as well as diverse lighting conditions including daytime, nighttime, and indoor environments. Drone-RGBT is an aerial RGB-T crowd counting dataset captured by drones, comprising 3,607 image pairs with 175,698 annotated instances. This dataset also encompasses a wide range of scenes, such as campuses, parks, parking lots, and playgrounds, under various lighting conditions.
Table 1. Comparison with other state-of-the-art methods on the RGBT-CC dataset.
| Method | Venue | Backbone | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
|---|---|---|---|---|---|---|---|
| MMCCN | ACCV'20 | ResNet-50 | 13.82 | 17.83 | 22.20 | 29.64 | 24.36 |
| BBS-Net | ECCV'20 | ResNet-50 | 19.56 | 25.07 | 31.25 | 39.24 | 32.48 |
| BL+IADM | CVPR'21 | VGG-19 | 15.61 | 19.95 | 24.69 | 32.89 | 28.18 |
| BL+MAT | ICME'22 | VGG-19 | 13.61 | 18.08 | 22.79 | 31.35 | 24.48 |
| DEFNet | ITS'22 | VGG-16 | 11.90 | 16.08 | 20.19 | 27.27 | 21.09 |
| MSDTrans | BMVC'22 | PoolFormer | 10.90 | 14.81 | 19.02 | 26.14 | 18.79 |
| MC3Net | ITS'23 | ConvNext | 11.47 | 15.06 | 19.40 | 27.95 | 20.59 |
| CGINet | EAAI'23 | ConvNext | 12.07 | 15.98 | 20.06 | 27.73 | 20.54 |
| MJPNet-T | IoT'24 | SegFormer | 11.56 | 16.36 | 20.95 | 28.91 | 17.83 |
| MCN | ESWA'24 | PoolFormer | 11.56 | 15.92 | 20.16 | 28.06 | 19.02 |
| GETANet | GRSL'24 | PVT | 12.14 | 15.98 | 19.40 | 28.61 | 22.17 |
| BGDFNet | TIM'24 | VGG-16 | 11.00 | 15.04 | 19.86 | 29.72 | 19.05 |
| VPMFNet | IoT'24 | VGG-16 | 10.99 | 15.17 | 20.07 | 28.03 | 19.61 |
| BMCC | ECCV'24 | VGG-19 & ViT | 10.19 | 13.61 | 17.65 | 23.64 | 17.32 |
| SAM-Guided | ICMR'24 | SAM | 10.51 | 14.52 | 18.92 | 26.28 | 17.71 |
| CSCA | PR'25 | VGG-19 | 13.50 | 18.63 | 23.59 | 31.59 | 24.83 |
| MISF-Net | TMM'25 | VGG-16 | 10.90 | 14.87 | 19.65 | 29.18 | 19.42 |
| CMFX | NN'25 | VGG-19 | 11.25 | 15.33 | 19.62 | 26.14 | 19.38 |
| Hyper-Counter | - | PVT | 9.94 | 13.11 | 17.36 | 23.63 | 16.38 |
Table 2. Comparison with other state-of-the-art methods on the Drone-RGBT dataset.
| Method | Venue | Backbone | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
|---|---|---|---|---|---|---|---|
| CSRNet | CVPR'18 | VGG-16 | 8.91 | - | - | - | 13.80 |
| MMCCN | ACCV'20 | ResNet-50 | 20.63 | 12.76 | 14.00 | 15.73 | 18.22 |
| BL+IADM | CVPR'21 | VGG-19 | 9.77 | 12.91 | 17.08 | 22.61 | 15.76 |
| DEFNet | TITS'22 | VGG-16 | 10.87 | - | - | - | 17.93 |
| MC3Net | TITS'23 | ConvNext | 7.33 | - | - | - | 11.17 |
| CGINet | EAAI'23 | ConvNext | 8.37 | 9.97 | 12.34 | 15.51 | 13.45 |
| BMCC | ECCV'24 | VGG-19 & ViT | 6.20 | - | - | - | 10.40 |
| GETANet | GRSL'24 | PVT | 8.44 | 10.01 | 12.75 | 15.83 | 13.99 |
| CSCA | PR'25 | VGG-19 | 9.51 | 12.12 | 15.84 | 21.57 | 15.19 |
| CMFX | NN'25 | VGG-19 | 6.75 | 8.88 | 11.87 | 14.69 | 11.05 |
| Hyper-Counter | - | PVT | 5.73 | 7.25 | 9.41 | 12.63 | 9.39 |
Figure 2. Visualization of the model input, output, and intermediate features. a) The input RGB image and the ground truth count; b) The input thermal image; c) The feature map output from the RGB branch backbone; d) The feature map output from the thermal branch backbone; e) The feature map after preliminary fusion, i.e., the joint feature; f) The feature map output from CroM-HP; g) The predicted density map and count result. It is clearly observed that, compared to the joint feature, the feature map output from CroM-HP removes most of the redundant features, highlights the crowd characteristics, and effectively suppresses non-human heat sources.
The RGBT-CC dataset can be downloaded from 🔗here.
We provide the preprocessing script preprocess_RGBTCC.py for the RGBT-CC dataset. After running the script, the dataset should be organised as follows:
RGBT_CC
├── train
│ ├── 1162_RGB.jpg
│ ├── 1162_T.jpg
│ ├── 1162_GT.npy
│ ├── 1164_RGB.jpg
│ ├── 1164_T.jpg
│ ├── 1164_GT.npy
│ ├── ...
├── val
│ ├── 1157_RGB.jpg
│ ├── 1157_T.jpg
│ ├── 1157_GT.npy
│ ├── ...
├── test
│ ├── 1197_RGB.jpg
│ ├── 1197_T.jpg
│ ├── 1197_GT.npy
│ ├── ...
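A minimal sketch of reading one sample from the layout above, assuming each *_GT.npy stores an (N, 2) array of annotated head-point coordinates (please verify against what preprocess_RGBTCC.py actually writes):

```python
import glob
import os

import numpy as np
from PIL import Image

def list_sample_ids(root, split):
    """Collect sample ids by scanning the *_RGB.jpg files of a split."""
    paths = glob.glob(os.path.join(root, split, '*_RGB.jpg'))
    return sorted(os.path.basename(p)[:-len('_RGB.jpg')] for p in paths)

def load_sample(root, split, sample_id):
    """Load one RGB-T pair and its ground truth from the RGBT_CC layout."""
    base = os.path.join(root, split, str(sample_id))
    rgb = Image.open(base + '_RGB.jpg').convert('RGB')
    thermal = Image.open(base + '_T.jpg').convert('RGB')
    points = np.load(base + '_GT.npy')   # assumed: (N, 2) head-point coordinates
    return rgb, thermal, points

# example
# ids = list_sample_ids('RGBT_CC', 'train')
# rgb, t, pts = load_sample('RGBT_CC', 'train', ids[0])
# print(len(ids), pts.shape)
```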
The Drone-RGBT dataset can be downloaded from 🔗here. It should be organised as follows:
Drone_RGBT
├── train
│ ├── GT_
│ │ ├── 1R.xml
│ │ ├── 2R.xml
│ │ ├── ...
│ ├── RGB
│ │ ├── 1.jpg
│ │ ├── 2.jpg
│ │ ├── ...
│ ├── Infrared
│ │ ├── 1R.jpg
│ │ ├── 2R.jpg
│ │ ├── ...
├── test
│ ├── GT_
│ │ ├── 1R.xml
│ │ ├── 2R.xml
│ │ ├── ...
│ ├── RGB
│ │ ├── 1.jpg
│ │ ├── 2.jpg
│ │ ├── ...
│ ├── Infrared
│ │ ├── 1R.jpg
│ │ ├── 2R.jpg
│ │ ├── ...
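A minimal sketch for counting the annotated points in one Drone-RGBT ground-truth file, assuming a VOC-style XML in which each <object> element holds a <point> with <x> and <y> children; these tag names are an assumption, so check them against the actual files under GT_:

```python
import xml.etree.ElementTree as ET

def read_points(xml_path):
    """Parse one GT_ XML file into a list of (x, y) head points.

    Assumes <object><point><x>..</x><y>..</y></point></object> entries;
    adjust the tag names if the real annotation schema differs.
    """
    root = ET.parse(xml_path).getroot()
    points = []
    for obj in root.findall('object'):
        pt = obj.find('point')
        if pt is not None:
            points.append((float(pt.find('x').text), float(pt.find('y').text)))
    return points

# example
# pts = read_points('Drone_RGBT/train/GT_/1R.xml')
# print('count:', len(pts))
```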
Detailed requirements are provided in the requirements.txt file.
The core libraries are as follows:
torch: 1.11.0+cu113
torchvision: 0.12.0+cu113
timm: 1.0.12
mmcv-full: 1.7.2
mmdet: 3.3.0 (dev)
mmengine: 0.10.5
numpy: 1.22.4
scipy: 1.10.1
scikit-learn: 1.3.2
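An optional sanity check that the core libraries above are importable and report the expected versions:

```python
# Optional sanity check for the environment listed above.
import torch, torchvision, timm, mmcv, mmdet, mmengine, numpy, scipy, sklearn

print('torch:', torch.__version__, '| CUDA available:', torch.cuda.is_available())
print('torchvision:', torchvision.__version__)
print('timm:', timm.__version__)
print('mmcv:', mmcv.__version__, '| mmdet:', mmdet.__version__, '| mmengine:', mmengine.__version__)
print('numpy:', numpy.__version__, '| scipy:', scipy.__version__, '| scikit-learn:', sklearn.__version__)
```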
The train.py script is used to train the model; it exposes all hyperparameters and configuration options. For convenience, we provide a bash script train.sh to train the model. Set --data-dir and --dataset to your dataset path and dataset name. We recommend using the default hyperparameters for training.
It is worth noting that we provide two methods to build the cross-modal hypergraph: k-nearest neighbors and epsilon-ball neighborhood. In train.py, you can choose the construction method with --constr-hg, and set the k or epsilon value with --constr-k or --constr-threshold, respectively. The epsilon-ball method is recommended.
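For reference, the following is a minimal sketch of the two neighborhood strategies for building a hypergraph incidence matrix over feature patches (one hyperedge per vertex). It only illustrates the idea behind --constr-hg, --constr-k, and --constr-threshold; it does not reproduce the repository's exact implementation, and the default k / eps values are placeholders.

```python
import torch

def build_incidence(feats, method='eball', k=8, eps=0.5):
    """Build a (V, E) hypergraph incidence matrix from (V, C) patch features.

    Each vertex spawns one hyperedge grouping its neighbours, chosen either
    by k-nearest neighbours or by an epsilon-ball on pairwise distances.
    Sketch only; k and eps are placeholder values.
    """
    dist = torch.cdist(feats, feats)                  # (V, V) pairwise distances
    V = feats.size(0)
    if method == 'knn':
        H = torch.zeros(V, V)
        idx = dist.topk(k, largest=False).indices     # k nearest vertices per hyperedge
        H[idx, torch.arange(V).unsqueeze(1)] = 1.0    # H[v, e] = 1 for v in knn(e)
    else:                                             # epsilon-ball neighbourhood
        H = (dist <= eps).float()                     # H[v, e] = 1 if dist(v, e) <= eps
    return H
```

An incidence matrix of this form is exactly what a hypergraph convolution such as the HypergraphConv sketch shown earlier consumes.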
We use PVT-v2-b3 from the Pyramid Vision Transformer (PVT) series as the backbone network. It is pretrained on ImageNet-1k.
You can download the pre-trained PVT model from 🔗here. The PVT repository: 🔗PVT.
After downloading it, put the pvt_v2_b3.pth file in the pretrained_weights folder.
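If you want to verify the checkpoint independently of the training scripts, a minimal loading sketch looks like the following; the `models.pvt_v2` import path is an assumption based on the PVT repository layout, so adjust it to whatever module this repository actually provides.

```python
import torch
# Assumed import path for the PVT-v2 model definitions; adjust to this repo's layout.
from models.pvt_v2 import pvt_v2_b3

backbone = pvt_v2_b3()
state = torch.load('pretrained_weights/pvt_v2_b3.pth', map_location='cpu')
# Some released checkpoints wrap the weights in a 'state_dict' key.
if isinstance(state, dict) and 'state_dict' in state:
    state = state['state_dict']
missing, unexpected = backbone.load_state_dict(state, strict=False)
print('missing keys:', len(missing), '| unexpected keys:', len(unexpected))
```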
When everything is ready, you can start training by running the following command:
bash train.sh

The test.py script is used to evaluate the model and generate the prediction results. We provide a bash script test.sh to conveniently test the model. Set --data-dir, --dataset, and --model-path to your dataset path, dataset name, and model checkpoint path. In addition, as with training, choose the hypergraph construction method with --constr-hg and set the k or epsilon value with --constr-k or --constr-threshold.
When everything is ready, you can start testing by running the following command:
bash test.sh

Coming soon...

The source code is free for research and education use only. Any commercial use should get formal permission first.

