Credit to DALL-E
This project originated as our CS228 (Deep Learning) final project. We explored integrating Differential Attention into the vision-language model PaliGemma 3B to address the challenges posed by noisy information and limited context windows.
We used LoRA fine-tuning and adapted Differential Attention into an existing pretrained model. In a first round of experiments, the modified model showed potential improvements over a vanilla fine-tuned baseline on the Multimodal Needle in a Haystack evaluation.
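For context, below is a minimal PyTorch sketch of a single Differential Attention head in the spirit of the DIFF Transformer. The class and parameter names are illustrative, and it omits the paper's λ reparameterization and per-head normalization; it is a sketch of the mechanism, not the exact code in this repository.

```python
# Illustrative sketch of one Differential Attention head (not the repo's exact code).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttentionHead(nn.Module):
    def __init__(self, dim: int, head_dim: int, lambda_init: float = 0.8):
        super().__init__()
        # Two query/key projections yield two attention maps.
        self.q_proj = nn.Linear(dim, 2 * head_dim, bias=False)
        self.k_proj = nn.Linear(dim, 2 * head_dim, bias=False)
        self.v_proj = nn.Linear(dim, head_dim, bias=False)
        # The paper reparameterizes lambda; a single learnable scalar is used here for brevity.
        self.lmbda = nn.Parameter(torch.tensor(lambda_init))
        self.head_dim = head_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        scale = 1.0 / math.sqrt(self.head_dim)
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * scale, dim=-1)
        # Subtracting the second map cancels attention mass shared between
        # the two maps, which the DIFF Transformer paper interprets as noise.
        return (a1 - self.lmbda * a2) @ v
```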
Further information can be found in our report linked above. We plan to explore this project further through better evaluations and possibly by expanding to the Phi model family:
- Rerun and extend evaluations
- Experiment with Phi-3
- Code cleanup
- Prerequisites
```bash
# Better in a conda env
pip install -r requirements.txt
```
Our main modifications to the model can be found in `modeling_gemma.py` and `modeling_siglip.py`.
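For readers unfamiliar with LoRA, the sketch below shows the core idea of wrapping a frozen linear layer with a trainable low-rank update. `LoRALinear` and its hyperparameters are hypothetical names for illustration; the actual fine-tuning scripts may use a library such as peft instead.

```python
# Minimal from-scratch LoRA wrapper (illustrative; not the repo's exact setup).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        # A starts small and B at zero, so the update begins as a no-op.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The low-rank product B @ A is the only trainable part.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```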
- Finetune Original Base Model

```bash
python3 finetune_original.py
```
- Finetune Our Model

```bash
python3 finetune.py
```