This is the official implementation of the paper "TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion". Paper Link
"To generate appropriate fusion results for a specific scenario, existing methods cannot realize it or require expensive retraining. The same goal can be achieved by simply adjusting the focused objectives of textual description in our paradigm."
- The text modality is introduced to the image fusion field for the first time.
- A benchmark dataset (IVT) with paired infrared/visible images and textual descriptions.
- A textual attention assessment metric.
Train Set [Images&Text]: Google Drive
Train Set [Pre-gen Association Maps]: Google Drive
Test Set: Google Drive
Folder structure:
/dataset
--/IVT_train
----/ir
------/1.png
----/vis
------/1.png
----/text
------/1_1.txt
----/association
------/IVT_LLVIP_2000_imageIndex_1_textIndex_1
--------/Final_Finetuned_BinaryInterestedMap.png
/TextFusion
--/main_trainTextFusion.py
--/net.py
--/main_test_rgb_ir.py
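For reference, below is a minimal sketch of how one training sample could be assembled from this layout. The `load_sample` helper is illustrative only (the actual loading logic lives in main_trainTextFusion.py), and the association folder name simply follows the example shown above.

```python
import os
import cv2

def load_sample(root, image_index=1, text_index=1):
    """Illustrative loader for a single IVT_train sample, following the layout above."""
    ir = cv2.imread(os.path.join(root, "ir", f"{image_index}.png"), cv2.IMREAD_GRAYSCALE)
    vis = cv2.imread(os.path.join(root, "vis", f"{image_index}.png"))
    with open(os.path.join(root, "text", f"{image_index}_{text_index}.txt"), "r") as f:
        description = f.read().strip()
    # Folder name pattern copied from the example above; the "LLVIP_2000" part may differ.
    assoc_dir = os.path.join(
        root, "association",
        f"IVT_LLVIP_2000_imageIndex_{image_index}_textIndex_{text_index}")
    association = cv2.imread(
        os.path.join(assoc_dir, "Final_Finetuned_BinaryInterestedMap.png"),
        cv2.IMREAD_GRAYSCALE)
    return ir, vis, description, association

# ir, vis, description, association = load_sample("/dataset/IVT_train")
```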
Assuming that you have already downloaded the pre-generated association maps, images, and the corresponding textual descriptions (from the links above) into the "IVT_train" folder, simply run the following command to start training:
python main_trainTextFusion.py
The trained models and corresponding loss values will be saved in the "models" folder.
(The code for generating the association maps on your own is also available in this repository.)
Note: The pre-trained checkpoint can be found in the "models" folder.
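A minimal sketch for loading a pre-trained checkpoint (the class name `TextFusionNet` and the file name are placeholders; check net.py and the files in the "models" folder for the actual names):

```python
import torch
from net import TextFusionNet  # placeholder class name; see net.py for the real one

model = TextFusionNet()
# File name is illustrative; point this at the checkpoint shipped in the "models" folder.
state_dict = torch.load("./models/textfusion_checkpoint.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```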
For the RGB and infrared image fusion (e.g., LLVIP):
python main_test_rgb_ir.py
Tip: If you are comparing our TextFusion with a purely appearance-based method, you can simply set the "description" to an empty string for a relatively fair comparison.
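For example, a sketch of this tip (the exact place where the description is set inside main_test_rgb_ir.py may differ):

```python
# In main_test_rgb_ir.py (illustrative; the variable location may vary):
description = "A person is walking on the road at night."  # text-guided fusion
# description = ""  # leave it empty to compare with purely appearance-based methods
```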
For the grayscale and infrared image fusion (e.g., TNO):
python main_test_gray_ir.py
Please refer to the textual_attention_metrics folder for the implementation of the textual attention metric.
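As a rough illustration of the idea only (this toy function is NOT the official metric; please use the code in textual_attention_metrics for evaluation):

```python
import numpy as np

def attention_weighted_mean(quality_map: np.ndarray, association_map: np.ndarray) -> float:
    """Toy example: average a per-pixel quality map inside the text-attended region,
    where association_map is a binary map such as Final_Finetuned_BinaryInterestedMap.png."""
    mask = association_map > 0
    return float(quality_map[mask].mean()) if mask.any() else 0.0
```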
- Python 3.8.3
- Torch 2.1.2
- torchvision 0.16.2
- opencv-python 4.8.1.78
- tqdm 4.66.5
- ftfy 6.2.3
- regex
- matplotlib
- timm
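For example, the pinned packages above can be installed with pip (adjust the CUDA build of torch to your own environment):

pip install torch==2.1.2 torchvision==0.16.2 opencv-python==4.8.1.78 tqdm==4.66.5 ftfy==6.2.3 regex matplotlib timm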
- 2025-1-4: The implementation of the textual attention metric is now available.
- 2024-11-10: This work has been accepted by Information Fusion.
- 2024-4-30: The codes for generating the association maps are available now!
- 2024-3-14: The training code is available and the corresponding pre-generated association maps have been uploaded to Google Drive.
- 2024-3-5: The testing set of our IVT dataset is available now.
- 2024-2-12: The pre-trained model and test files are available now!
- 2024-2-8: The training set of our IVT dataset is available now.
If you have any questions, please contact me at chunyang_cheng@163.com.
If this work is helpful to you, please cite it as:
@article{CHENG2025102790,
title = {TextFusion: Unveiling the power of textual semantics for controllable image fusion},
journal = {Information Fusion},
volume = {117},
pages = {102790},
year = {2025},
issn = {1566-2535},
doi = {https://doi.org/10.1016/j.inffus.2024.102790},
url = {https://www.sciencedirect.com/science/article/pii/S1566253524005682},
author = {Chunyang Cheng and Tianyang Xu and Xiao-Jun Wu and Hui Li and Xi Li and Zhangyong Tang and Josef Kittler},
}
or the original arXiv version:
@article{cheng2023textfusion,
title={TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion},
author={Cheng, Chunyang and Xu, Tianyang and Wu, Xiao-Jun and Li, Hui and Li, Xi and Tang, Zhangyong and Kittler, Josef},
journal={arXiv preprint arXiv:2312.14209},
year={2023}
}
Our dataset is annotated based on the LLVIP dataset:
@inproceedings{jia2021llvip,
title={LLVIP: A visible-infrared paired dataset for low-light vision},
author={Jia, Xinyu and Zhu, Chuang and Li, Minzhen and Tang, Wenqi and Zhou, Wenli},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={3496--3504},
year={2021}
}
