[AAAI 2026] Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation

This is the official PyTorch implementation of Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation (AAAI 2026 Oral).

The repository currently supports the following methods: Source, T3A, Tent, EATA, SAR, DeYO, READ and BriMPR.

Prerequisites

conda create -n mmtta -y python=3.9
conda activate mmtta
pip install -r requirements.txt

Get Started

Download the following datasets:

Kinetics50: follow the READ instructions to download the validation set and, optionally, the training set.
VGGSound: follow the READ instructions to download the test set. The optional training set is available at https://huggingface.co/datasets/Loie/VGGSound.

The pre-trained source model can be found in READ.
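Before Step 1, it can help to confirm that the downloaded Kinetics50 data and the source checkpoint sit where the commands in this README expect them. The sketch below checks only the relative paths used in the Kinetics50 commands; `data_path` and `code_path` are the README's own placeholders, and the helper `missing_inputs` is hypothetical, not part of the repository.

```python
from pathlib import Path

# Relative locations copied from the Kinetics50 commands in this README.
DATA_ENTRIES = [
    "Kinetics50/image_mulframe_val256_k=50",  # clean video frames (validation)
    "Kinetics50/audio_val256_k=50",           # clean audio (validation)
    "NoisyAudios",                            # --weather_path for audio corruptions
]
CODE_ENTRIES = [
    "json_csv_files/ks50_test_refer.json",            # --refer-path for the clean test JSON
    "json_csv_files/class_labels_indices_ks50.csv",   # --label_csv
    "pretrained_model/cav_mae_ks50.pth",              # --pretrain_path
]


def missing_inputs(data_path, code_path):
    """Return the expected Kinetics50 inputs that do not exist (hypothetical helper)."""
    data_root, code_root = Path(data_path), Path(code_path)
    expected = [data_root / p for p in DATA_ENTRIES] + [code_root / p for p in CODE_ENTRIES]
    return [str(p) for p in expected if not p.exists()]
```

For example, `print(missing_inputs("data_path", "code_path"))` prints an empty list once everything is in place; any path it reports must be downloaded or moved before the corruption and JSON steps will find it.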

Step 1. Introduce corruptions

# Video
python ./make_corruptions/make_c_video.py --corruption 'gaussian_noise' --severity 5 --data-path 'data_path/Kinetics50/image_mulframe_val256_k=50' --save_path 'data_path/Kinetics50/image_mulframe_val256_k=50-C'

# Audio
python ./make_corruptions/make_c_audio.py --corruption 'gaussian_noise' --severity 5 --data_path 'data_path/Kinetics50/audio_val256_k=50' --save_path 'data_path/Kinetics50/audio_val256_k=50-C' --weather_path 'data_path/NoisyAudios/'
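For intuition about what `--corruption 'gaussian_noise' --severity 5` does to a video frame, here is a minimal NumPy sketch in the style of the ImageNet-C benchmark, which adds zero-mean Gaussian noise whose standard deviation grows with severity. The scale values are the ImageNet-C ones; whether `make_c_video.py` uses exactly these values, or applies them frame by frame, is an assumption, so check the script before relying on the numbers. `gaussian_noise` and `corrupt_clip` are illustrative names, not the repository's functions.

```python
import numpy as np

# ImageNet-C noise scales for severities 1..5 (assumed, not read from make_c_video.py).
GAUSSIAN_NOISE_SCALES = [0.08, 0.12, 0.18, 0.26, 0.38]


def gaussian_noise(frame, severity=5, rng=None):
    """Corrupt one uint8 frame (H, W, C) with additive Gaussian noise."""
    if not 1 <= severity <= len(GAUSSIAN_NOISE_SCALES):
        raise ValueError("severity must be in 1..5")
    rng = np.random.default_rng() if rng is None else rng
    scale = GAUSSIAN_NOISE_SCALES[severity - 1]
    x = frame.astype(np.float64) / 255.0               # map pixels to [0, 1]
    noisy = x + rng.normal(0.0, scale, size=x.shape)    # add noise in that range
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)


def corrupt_clip(frames, severity=5, seed=0):
    """Apply the corruption independently to every frame of a clip (T, H, W, C)."""
    rng = np.random.default_rng(seed)
    return np.stack([gaussian_noise(f, severity, rng) for f in frames])
```

A fixed `seed` makes the corrupted clip reproducible, which matters when several TTA methods are compared on the same corrupted data.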

Step 2. Create JSON files

# JSON file for clean data (**Mandatory**):
python ./data_process/create_clean_json.py --refer-path 'code_path/json_csv_files/ks50_test_refer.json' --video-path 'data_path/Kinetics50/image_mulframe_val256_k=50' --audio-path 'data_path/Kinetics50/audio_val256_k=50' --save_path 'code_path/json_csv_files/ks50' --split 'test'

# JSON file for source (training) data (optional; requires the Kinetics50 training set):
python ./data_process/create_clean_json.py --refer-path 'code_path/json_csv_files/ks50_train_refer.json' --video-path 'data_path/Kinetics50/image_mulframe_val256_k=50' --audio-path 'data_path/Kinetics50/audio_val256_k=50' --save_path 'code_path/json_csv_files/ks50' --split 'train'

# JSON file for video-corrupted data:
python ./data_process/create_video_c_json.py --clean-path 'code_path/json_csv_files/ks50/clean/severity_0.json' --video-c-path 'data_path/Kinetics50/image_mulframe_val256_k=50-C' --audio-path 'data_path/Kinetics50/audio_val256_k=50' --corruption 'gaussian_noise'

# JSON file for audio-corrupted data:
python ./data_process/create_audio_c_json.py --clean-path 'code_path/json_csv_files/ks50/clean/severity_0.json' --video-path 'data_path/Kinetics50/image_mulframe_val256_k=50' --audio-c-path 'data_path/Kinetics50/audio_val256_k=50-C' --corruption 'gaussian_noise'

# JSON file for audio & video-corrupted data:
python ./data_process/create_both_c_json.py --clean-path 'code_path/json_csv_files/ks50/clean/severity_0.json' --video-c-path 'data_path/Kinetics50/image_mulframe_val256_k=50-C' --audio-c-path 'data_path/Kinetics50/audio_val256_k=50-C' --dataset 'ks50'
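To sanity-check Step 2, you can load a generated file and count its entries before launching any adaptation run. This README does not document the JSON schema, so the sketch only assumes the file is valid JSON whose entries sit either at the top level or under a `data` key (the layout used by CAV-MAE style dataset files, which may or may not match these scripts). `summarize_json` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return (top-level kind, number of entries) for a JSON file.

    Assumption: entries live either at the top level or under a "data" key.
    """
    with Path(path).open(encoding="utf-8") as f:
        content = json.load(f)
    if isinstance(content, dict) and isinstance(content.get("data"), list):
        return "dict[data]", len(content["data"])
    if isinstance(content, (list, dict)):
        return type(content).__name__, len(content)
    return type(content).__name__, 1  # a bare scalar counts as one entry
```

For example, `summarize_json("code_path/json_csv_files/ks50/clean/severity_0.json")`. Since the corrupted JSON files are built from the clean one via `--clean-path`, one would expect them to report the same number of clips; a mismatch suggests missing corrupted files.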

Run Experiments

Unimodal corruption

Set --corruption_modality to either audio or video.

# Kinetics50
python run.py --gpu '0, 1, 2' --tta_method BriMPR --corruption_modality [audio/video] --dataset ks50 --json_root 'code_path/json_csv_files/ks50' --label_csv 'code_path/json_csv_files/class_labels_indices_ks50.csv' --pretrain_path 'code_path/pretrained_model/cav_mae_ks50.pth'
# VGGSound
python run.py --gpu '0, 1, 2' --tta_method BriMPR --corruption_modality [audio/video] --dataset vggsound --json_root 'code_path/json_csv_files/vgg' --label_csv 'code_path/json_csv_files/class_labels_indices_vgg.csv' --pretrain_path 'code_path/pretrained_model/vgg_65.5.pth'
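Because `--corruption_modality` takes one modality per run, covering both cases means two runs per dataset. The sketch below only builds the argument lists from the commands above (flags and paths copied as written) and leaves execution to `subprocess.run`; `DATASETS`, `unimodal_commands` and `run_all` are hypothetical helpers, and `code_path` is still a placeholder to replace.

```python
import shlex
import subprocess

# Per-dataset arguments copied from the unimodal commands in this README.
DATASETS = {
    "ks50": {
        "json_root": "code_path/json_csv_files/ks50",
        "label_csv": "code_path/json_csv_files/class_labels_indices_ks50.csv",
        "pretrain_path": "code_path/pretrained_model/cav_mae_ks50.pth",
    },
    "vggsound": {
        "json_root": "code_path/json_csv_files/vgg",
        "label_csv": "code_path/json_csv_files/class_labels_indices_vgg.csv",
        "pretrain_path": "code_path/pretrained_model/vgg_65.5.pth",
    },
}


def unimodal_commands(dataset, gpu="0, 1, 2", method="BriMPR"):
    """Build one run.py command per corrupted modality (audio, then video)."""
    cfg = DATASETS[dataset]
    return [
        ["python", "run.py", "--gpu", gpu, "--tta_method", method,
         "--corruption_modality", modality, "--dataset", dataset,
         "--json_root", cfg["json_root"], "--label_csv", cfg["label_csv"],
         "--pretrain_path", cfg["pretrain_path"]]
        for modality in ("audio", "video")
    ]


def run_all(commands, dry_run=True):
    """Print each command shell-quoted; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all(unimodal_commands("ks50"))` prints the two Kinetics50 commands; pass `dry_run=False` from the repository root to actually launch them one after the other.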

Multimodal corruption

# Kinetics50
python run_both.py --gpu '0, 1, 2' --tta_method BriMPR --corruption_modality both --dataset ks50 --json_root 'code_path/json_csv_files/ks50' --label_csv 'code_path/json_csv_files/class_labels_indices_ks50.csv' --pretrain_path 'code_path/pretrained_model/cav_mae_ks50.pth'
# VGGSound
python run_both.py --gpu '0, 1, 2' --tta_method BriMPR --corruption_modality both --dataset vggsound --json_root 'code_path/json_csv_files/vgg' --label_csv 'code_path/json_csv_files/class_labels_indices_vgg.csv' --pretrain_path 'code_path/pretrained_model/vgg_65.5.pth'
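To compare BriMPR with the baselines listed at the top of this README under the same multimodal corruption, you could vary `--tta_method` while keeping the other Kinetics50 arguments fixed. This sketch assumes `--tta_method` accepts the method names exactly as written in that list; only `BriMPR` is confirmed by the commands above, so check `run_both.py` for the accepted spellings. `METHODS`, `KS50_ARGS` and `method_sweep` are hypothetical names.

```python
import shlex

# Method names as listed in this README; the accepted CLI spellings are an assumption.
METHODS = ["Source", "T3A", "Tent", "EATA", "SAR", "DeYO", "READ", "BriMPR"]

# Kinetics50 arguments copied from the multimodal command above.
KS50_ARGS = [
    "--corruption_modality", "both", "--dataset", "ks50",
    "--json_root", "code_path/json_csv_files/ks50",
    "--label_csv", "code_path/json_csv_files/class_labels_indices_ks50.csv",
    "--pretrain_path", "code_path/pretrained_model/cav_mae_ks50.pth",
]


def method_sweep(methods=METHODS, gpu="0, 1, 2"):
    """Return one shell-quoted run_both.py command per TTA method."""
    return [
        shlex.join(["python", "run_both.py", "--gpu", gpu, "--tta_method", m, *KS50_ARGS])
        for m in methods
    ]


for line in method_sweep():
    print(line)
```

Printing the commands rather than running them lets you paste them into a shell or a job scheduler; for VGGSound, swap in the paths from the second command above.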

Citation

If BriMPR is helpful in your research, please consider citing our paper:

@article{li2025bridgingmodalitiesprogressiverealignment,
  title={Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation}, 
  author={Jiacheng Li and Songhe Feng},
  journal={arXiv preprint arXiv:2511.22862},
  year={2025}
}

Acknowledgements

We thank the authors of CAV-MAE, T3A, Tent, EATA, SAR, DeYO and READ for making their code publicly available.

Repository contents

TTA
ckpt
data_process
images
json_csv_files
make_corruptions
models
utilities
LICENSE
README.md
dataloader.py
requirements.txt
run.py
run_both.py
