Vision-Function-Layer in Multimodal LLMs

⚠️ The Hugging Face package version must be exactly 4.50.0. If you use a different version, adapt the vision token swapping code to match it.
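
A quick way to confirm the installed version before running anything is sketched below. This is a minimal check, not part of the repository: it assumes the Hugging Face package meant above is `transformers`, and the helper names `is_supported` and `check_environment` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_VERSION = "4.50.0"    # version stated in this README
PACKAGE_NAME = "transformers"  # assumption: the Hugging Face package meant above


def is_supported(installed: str, required: str = REQUIRED_VERSION) -> bool:
    """Return True only when the installed version string matches exactly."""
    return installed.strip() == required


def check_environment(package: str = PACKAGE_NAME) -> str:
    """Describe whether the installed package matches the required version."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed; install {package}=={REQUIRED_VERSION}"
    if is_supported(installed):
        return f"{package} {installed} matches the required version"
    return (
        f"{package} {installed} found, but {REQUIRED_VERSION} is expected; "
        "adapt the vision token swapping code or reinstall"
    )


if __name__ == "__main__":
    print(check_environment())
```

The comparison is an exact string match because the requirement above is an exact version, not a minimum.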

Vision Token Dropping

This repository contains the implementation of Vision Token Dropping.
For a detailed explanation and the code, see the Vision-Token-Dropping folder.

🚀 Experiments

All experiments are conducted under the VFL-LoRA setup.
See the VFL-LoRA folder for the base code and environment setup.

✅ TODO List

[ ] Training data for VFL-LoRA
[✅] Open-Source Code
[✅] Publish arXiv Paper

Citation

If you find this work useful, please cite our paper:

@article{shi2025vision,
  title={Vision Function Layer in Multimodal LLMs},
  author={Shi, Cheng and Yu, Yizhou and Yang, Sibei},
  journal={arXiv preprint arXiv:2509.24791},
  year={2025}
}
