This is the official implementation of the paper "TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion". Paper Link
"To generate appropriate fusion results for a specific scenario, existing methods cannot realize it or require expensive retraining. The same goal can be achieved by simply adjusting the focused objectives of textual description in our paradigm."
- The text modality is introduced to the image fusion field for the first time.
- A benchmark dataset (IVT) with paired infrared/visible images and textual descriptions.
- A textual attention assessment metric.
Train Set [Images&Text]: Google Drive
Train Set [Pre-gen Association Maps]: Google Drive
Test Set: Google Drive
Folder structure:
/dataset
--/IVT_train
----/ir
------/1.png
----/vis
------/1.png
----/text
------/1_1.txt
----/association
------/IVT_LLVIP_2000_imageIndex_1_textIndex_1
--------/Final_Finetuned_BinaryInterestedMap.png
/TextFusion
--/main_trainTextFusion.py
--/net.py
--/main_test_rgb_ir.py
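For reference, below is a minimal sketch of how one training sample could be assembled from this layout. The `load_sample` helper is illustrative only (the actual loading logic lives in main_trainTextFusion.py), and the association folder name simply follows the example shown above.

```python
import os
import cv2

def load_sample(root, image_index=1, text_index=1):
    """Illustrative loader for a single IVT_train sample, following the layout above."""
    ir = cv2.imread(os.path.join(root, "ir", f"{image_index}.png"), cv2.IMREAD_GRAYSCALE)
    vis = cv2.imread(os.path.join(root, "vis", f"{image_index}.png"))
    with open(os.path.join(root, "text", f"{image_index}_{text_index}.txt"), "r") as f:
        description = f.read().strip()
    # Folder name pattern copied from the example above; the "LLVIP_2000" part may differ.
    assoc_dir = os.path.join(
        root, "association",
        f"IVT_LLVIP_2000_imageIndex_{image_index}_textIndex_{text_index}")
    association = cv2.imread(
        os.path.join(assoc_dir, "Final_Finetuned_BinaryInterestedMap.png"),
        cv2.IMREAD_GRAYSCALE)
    return ir, vis, description, association

# ir, vis, description, association = load_sample("/dataset/IVT_train")
```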
Assuming that you have already downloaded the pre-generated association maps, images, and the corresponding textual descriptions (from the links above) into the "IVT_train" folder, simply run the following command to start training:
python main_trainTextFusion.py
The trained models and corresponding loss values will be saved in the "models" folder.
(The code for generating the association maps on your own is also available in this repository.)
Note: The pre-trained checkpoint can be found in the "models" folder.
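A minimal sketch for loading a pre-trained checkpoint (the class name `TextFusionNet` and the file name are placeholders; check net.py and the files in the "models" folder for the actual names):

```python
import torch
from net import TextFusionNet  # placeholder class name; see net.py for the real one

model = TextFusionNet()
# File name is illustrative; point this at the checkpoint shipped in the "models" folder.
state_dict = torch.load("./models/textfusion_checkpoint.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```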
For the RGB and infrared image fusion (e.g., LLVIP):
python main_test_rgb_ir.py
Tip: If you are comparing our TextFusion with a purely appearance-based method, you can simply set the "description" to an empty string for a relatively fair comparison.
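For example, a sketch of this tip (the exact place where the description is set inside main_test_rgb_ir.py may differ):

```python
# In main_test_rgb_ir.py (illustrative; the variable location may vary):
description = "A person is walking on the road at night."  # text-guided fusion
# description = ""  # leave it empty to compare with purely appearance-based methods
```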
For the grayscale and infrared image fusion (e.g., TNO):
python main_test_gray_ir.py
Please refer to the textual_attention_metrics folder for the implementation of the textual attention metric.
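As a rough illustration of the idea only (this toy function is NOT the official metric; please use the code in textual_attention_metrics for evaluation):

```python
import numpy as np

def attention_weighted_mean(quality_map: np.ndarray, association_map: np.ndarray) -> float:
    """Toy example: average a per-pixel quality map inside the text-attended region,
    where association_map is a binary map such as Final_Finetuned_BinaryInterestedMap.png."""
    mask = association_map > 0
    return float(quality_map[mask].mean()) if mask.any() else 0.0
```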
- Python 3.8.3
- Torch 2.1.2
- torchvision 0.16.2
- opencv-python 4.8.1.78
- tqdm 4.66.5
- ftfy 6.2.3
- regex
- matplotlib
- timm
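For example, the pinned packages above can be installed with pip (adjust the CUDA build of torch to your own environment):

pip install torch==2.1.2 torchvision==0.16.2 opencv-python==4.8.1.78 tqdm==4.66.5 ftfy==6.2.3 regex matplotlib timm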
- 2025-1-4: The implementation of the textual attention metric is now available.
- 2024-11-10: This work has been accepted by Information Fusion.
- 2024-4-30: The codes for generating the association maps are available now!
- 2024-3-14: The training code is available and the corresponding pre-generated association maps have been uploaded to Google Drive.
- 2024-3-5: The testing set of our IVT dataset is available now.
- 2024-2-12: The pre-trained model and test files are available now!
- 2024-2-8: The training set of our IVT dataset is available now.
If you have any questions, please contact me at chunyang_cheng@163.com.
If this work is helpful to you, please cite it as:
@article{CHENG2025102790,
title = {TextFusion: Unveiling the power of textual semantics for controllable image fusion},
journal = {Information Fusion},
volume = {117},
pages = {102790},
year = {2025},
issn = {1566-2535},
doi = {https://doi.org/10.1016/j.inffus.2024.102790},
url = {https://www.sciencedirect.com/science/article/pii/S1566253524005682},
author = {Chunyang Cheng and Tianyang Xu and Xiao-Jun Wu and Hui Li and Xi Li and Zhangyong Tang and Josef Kittler},
}
or the original arXiv version:
@article{cheng2023textfusion,
title={TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion},
author={Cheng, Chunyang and Xu, Tianyang and Wu, Xiao-Jun and Li, Hui and Li, Xi and Tang, Zhangyong and Kittler, Josef},
journal={arXiv preprint arXiv:2312.14209},
year={2023}
}
Our dataset is annotated based on the LLVIP dataset:
@inproceedings{jia2021llvip,
title={LLVIP: A visible-infrared paired dataset for low-light vision},
author={Jia, Xinyu and Zhu, Chuang and Li, Minzhen and Tang, Wenqi and Zhou, Wenli},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={3496--3504},
year={2021}
}
