The Hugging Face package version must be exactly 4.50.0; otherwise, adapt the vision token swapping code to the version you have installed.
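A quick way to confirm the requirement before running anything is to compare the installed version against 4.50.0. The sketch below is only an illustration: it assumes the "huggingface package" is `transformers` (the README does not name it), and the helper names `installed_version` and `check_version` are hypothetical, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_VERSION = "4.50.0"    # version stated in this README
PACKAGE_NAME = "transformers"  # assumption: the README does not name the package


def installed_version(package=PACKAGE_NAME):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_version(installed, required=REQUIRED_VERSION):
    """Return True only when the installed version matches the required one exactly."""
    return installed is not None and installed.strip() == required


if __name__ == "__main__":
    found = installed_version()
    if check_version(found):
        print(f"{PACKAGE_NAME} {found} matches the required version.")
    else:
        print(
            f"{PACKAGE_NAME} version is {found}, expected {REQUIRED_VERSION}. "
            "Install the required version or adapt the vision token swapping code."
        )
```

If the check fails, either install the pinned version or adjust the vision token swapping code as described above.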
This repository contains the implementation of Vision Token Dropping.
For a detailed explanation and the code, see the Vision-Token-Dropping folder.
All experiments are conducted under the VFL-LoRA setup.
Please check out our VFL-LoRA for the base code and environment setup.
- Training data for VFL-LoRA
- [x] Open-Source Code
- [x] Publish arXiv Paper
If you find this work useful, please cite our paper:
@article{shi2025vision,
title={Vision Function Layer in Multimodal LLMs},
author={Shi, Cheng and Yu, Yizhou and Yang, Sibei},
journal={arXiv preprint arXiv:2509.24791},
year={2025}
}
