If you have previously pulled our repository, please update to the latest version immediately to fix a critical bug caused by the natten version.
We have updated the natten version requirement to natten>=0.17.0,<=0.17.5 and explicitly set `rel_pos_bias=True` in `NeighborhoodAttention2D` within `layers.py` to keep the behavior consistent with earlier natten releases (the `rel_pos_bias` argument defaults to False since 0.17.0 and is removed after 0.17.5).
This discrepancy can significantly affect the quality of reconstructed images and task results when using our provided weights. Please pull the latest updates to ensure correct experimental reproduction.
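For reference, a minimal sketch of what the fix looks like; the exact constructor arguments used in `layers.py` may differ, and the dimensions below are illustrative only:

```python
# Minimal sketch: natten>=0.17.0 defaults rel_pos_bias to False, so it must be
# enabled explicitly to match the behavior the released weights were trained with.
from natten import NeighborhoodAttention2D

attn = NeighborhoodAttention2D(
    dim=96,             # embedding dimension (illustrative)
    num_heads=3,        # number of attention heads (illustrative)
    kernel_size=7,      # neighborhood window size (illustrative)
    rel_pos_bias=True,  # restore the pre-0.17.0 default behavior
)
```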
This repository is the official PyTorch implementation of All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation (NeurIPS 2024).
Abstract: Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, leading to high overhead of parameter and bitrate usage, or face challenges in multi-objective optimization under a unified representation, failing to achieve both performance and efficiency. To this end, we propose Multi-Path Aggregation (MPA) integrated into existing coding models for joint human-machine vision, unifying the feature representation with an all-in-one architecture. MPA employs a predictor to allocate latent features among task-specific paths based on feature importance varied across tasks, maximizing the utility of shared features while preserving task-specific features for subsequent refinement. Leveraging feature correlations, we develop a two-stage optimization strategy to alleviate multi-task performance degradation. Upon the reuse of shared features, as low as 1.89% parameters are further augmented and fine-tuned for a specific task, which completely avoids extensive optimization of the entire model. Experimental results show that MPA achieves performance comparable to state-of-the-art methods in both task-specific and multi-objective optimization across human viewing and machine analysis tasks. Moreover, our all-in-one design supports seamless transitions between human- and machine-oriented reconstruction, enabling task-controllable interpretation without altering the unified model.
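To make the path-allocation idea concrete, here is a highly simplified, conceptual toy sketch of predictor-based routing. It is NOT the repository's MPA architecture; all module names, shapes, and the selection rule below are illustrative assumptions.

```python
# Conceptual toy example of predictor-based path allocation (NOT the actual MPA
# model): a predictor scores each latent token, the most important tokens are
# routed to a lightweight task-specific path, and the rest reuse the shared path.
import torch
import torch.nn as nn

class ToyMultiPathBlock(nn.Module):
    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.predictor = nn.Linear(dim, 1)      # importance score per token
        self.shared_path = nn.Linear(dim, dim)  # features reused across tasks
        self.task_path = nn.Linear(dim, dim)    # small task-specific path
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        scores = self.predictor(x).squeeze(-1)                # (B, N)
        k = max(1, int(self.keep_ratio * x.shape[1]))
        top = scores.topk(k, dim=1).indices                   # most important tokens
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask.scatter_(1, top, True)
        out = torch.empty_like(x)
        out[mask] = self.task_path(x[mask])                   # task-specific refinement
        out[~mask] = self.shared_path(x[~mask])               # shared processing
        return out

x = torch.randn(2, 16, 96)
print(ToyMultiPathBlock(dim=96)(x).shape)  # torch.Size([2, 16, 96])
```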
This repository is still under active construction:
- Release training and testing codes
- Release pretrained models
- Release visualization tools (placed in `./notebooks`)
The experiments were conducted on a single NVIDIA RTX 3090 with PyTorch 2.2.1, CUDA 11.8, and cuDNN 8 (in a Docker environment). We recommend PyTorch 2.6.0 to support natten 0.17.5. Create the environment, clone the project, and then run the following commands to complete the setup:
```bash
apt update
apt install libgl1-mesa-dev ffmpeg libsm6 libxext6  # for opencv-python
git clone https://github.com/NJUVISION/MPA.git
cd MPA
pip install -U pip
pip install natten==0.17.5+torch260cu126 -f https://whl.natten.org
pip install -e .
```

For datasets, please follow TinyLIC, ConvNeXt and PSPNet to prepare Flicker2W, ImageNet-1K and ADE20K.
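Before training, you can optionally sanity-check that the installed versions match the requirements above; a minimal snippet (the values in the comments reflect the recommended setup):

```python
# Quick check that the pinned versions are in place.
import torch
import natten

print("torch :", torch.__version__)          # e.g. 2.6.0
print("natten:", natten.__version__)         # expected >=0.17.0, <=0.17.5
print("cuda  :", torch.cuda.is_available())  # True on the RTX 3090 setup
```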
The trained weights after each step can be downloaded from Google Drive and Baidu Drive (access code: y1cs).
Training is carried out in the following steps:
Step 1: Run the script for the variable-rate compression training pipeline without GAN:
```bash
python examples/train_stage1_wo_gan.py -m mpa_enc -d /path/to/dataset/ --epochs 400 -lr 1e-4 --batch_size 8 --cuda --save
```

Step 2: Run the script for the variable-rate compression training pipeline with GAN:
```bash
python examples/train_stage1_w_gan.py -m mpa_enc -d /path/to/dataset/ --epochs 400 -lr 1e-4 -lrd 1e-4 --batch_size 8 --cuda --save --pretrained /path/to/step1/checkpoint.pth.tar
```

Step 3: Run the script for multi-task coding applications:
```bash
# for low distortion
python examples/train_stage2_mse.py -m mpa --task_idx 0 -d /path/to/dataset/ --epochs 200 -lr 1e-4 --batch_size 8 --cuda --save --pretrained /path/to/step2/checkpoint.pth.tar
# for classification
python examples/train_stage2_cls.py -m mpa --task_idx 1 -d /path/to/imagenet-1k/ --epochs 4 -lr 1e-4 --batch_size 8 --cuda --save --pretrained /path/to/step2/checkpoint.pth.tar
# for semantic segmentation
python examples/train_stage2_seg.py -m mpa --task_idx 2 -a psp -d /path/to/ade20k/ --epochs 200 -lr 1e-4 --batch_size 8 --cuda --save --pretrained /path/to/step2/checkpoint.pth.tar
```

The training checkpoints will be saved in the `checkpoints` folder in the current directory. You can change the default folder by modifying the `init()` function in `examples/train.py`.
For semantic segmentation, please download the checkpoint of PSPNet from the official repo first, and save it to checkpoints/pspnet/pspnet_train_epoch_100.pth.
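If you want to verify a saved checkpoint before fine-tuning or evaluation, a quick inspection along these lines can help; this is only a sketch, and the exact keys depend on how the training scripts package their state:

```python
# Hypothetical checkpoint inspection; adjust the path to your own checkpoint.
import torch

ckpt = torch.load("checkpoints/checkpoint.pth.tar", map_location="cpu")
print(sorted(ckpt.keys()))  # CompressAI-style checkpoints usually include "state_dict"

state_dict = ckpt.get("state_dict", ckpt)  # fall back to a bare state_dict
print(len(state_dict), "parameter tensors")
```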
An example to evaluate R-D performance:
```bash
# high realism
python -m compressai.utils.eval_var_model checkpoint /path/to/dataset/ -a mpa -p ./path/to/step3/checkpoint.pth.tar --cuda --task_idx 0 --q_task 1 --save /path/to/save_dir/
# low distortion
python -m compressai.utils.eval_var_model checkpoint /path/to/dataset/ -a mpa -p ./path/to/step3/checkpoint.pth.tar --cuda --task_idx 0 --q_task 8 --save /path/to/save_dir/
```

An example to evaluate classification performance:
```bash
python examples/eval_cls_real_bpp.py -m mpa --task_idx 1 --cls_model convnext_tiny.fb_in1k -d /path/to/imagenet-1k/ --test_batch_size 16 --cuda --save --pretrained ./path/to/step3/checkpoint.pth.tar --q_task 8 --real_bpp
```

An example to evaluate semantic segmentation performance:
```bash
python examples/eval_seg_real_bpp.py -m mpa --task_idx 2 -a psp -d /path/to/ade20k/ --test_batch_size 16 --cuda --save --pretrained ./path/to/step3/checkpoint.pth.tar --q_task 8 --real_bpp
```

If you find our work useful in your research, please consider citing:
```bibtex
@inproceedings{zhang2024allinone,
  author    = {Zhang, Xu and Guo, Peiyao and Lu, Ming and Ma, Zhan},
  booktitle = {Advances in Neural Information Processing Systems},
  editor    = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
  pages     = {71465--71503},
  publisher = {Curran Associates, Inc.},
  title     = {All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2024/file/8395fdf356059eaa92afd39e3952a677-Paper-Conference.pdf},
  volume    = {37},
  year      = {2024}
}
```
Our code is based on TinyLIC, CompressAI, NATTEN, DynamicViT, pytorch-image-models, ConvNeXt, Swin-Transformer and PSPNet. We gratefully acknowledge the authors of these outstanding works for making their code open source, which significantly benefited our work.
If you are interested in visual coding for machines, you can also check out the following work from us: