
Multimodal Differential Transformer

Project Overview

(Banner image credit: DALL·E)



Brief Overview

This project began as our CS228 (Deep Learning) final project. We explored integrating Differential Attention into the vision-language model PaliGemma 3B to address challenges posed by noisy information and limited context windows.

We used LoRA fine-tuning and adapted Differential Attention into an existing pretrained model. Based on a first round of experiments, we demonstrated potential improvements over a vanilla fine-tuned baseline on the Multimodal Needle In A Haystack evaluation. A hedged sketch of the attention mechanism follows below.
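For background, Differential Attention (from the Differential Transformer paper) computes two softmax attention maps and subtracts one from the other, scaled by a learnable λ, so that common-mode attention noise cancels. The following is a minimal single-head PyTorch sketch of our reading of that mechanism, not the exact code in this repo; the function name, weight layout, and λ value are illustrative:

    import torch
    import torch.nn.functional as F

    def diff_attention(x, w_q, w_k, w_v, lam, d_head):
        # Split the query/key projections into two groups.
        q1, q2 = torch.chunk(x @ w_q, 2, dim=-1)  # each: (batch, seq, d_head)
        k1, k2 = torch.chunk(x @ w_k, 2, dim=-1)
        v = x @ w_v                                # (batch, seq, 2 * d_head)

        scale = d_head ** -0.5
        a1 = F.softmax((q1 @ k1.transpose(-2, -1)) * scale, dim=-1)
        a2 = F.softmax((q2 @ k2.transpose(-2, -1)) * scale, dim=-1)

        # Subtracting the second map cancels attention assigned to noise.
        return (a1 - lam * a2) @ v

    # Toy usage: d_model = 64, d_head = 32, so each w_* maps 64 -> 64.
    x = torch.randn(1, 16, 64)
    w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
    out = diff_attention(x, w_q, w_k, w_v, lam=0.8, d_head=32)  # (1, 16, 64)

In the paper λ is reparameterized and learned per layer; a fixed scalar is used here only to keep the sketch short.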

Further information can be found in our report linked above. We plan to explore this project further through more thorough evaluations and a possible extension to the Phi model family.


✅ Todo

  • Conduct further evaluations
  • Experiment with Phi-3
  • Code cleanup

Installation

  1. Prerequisites
    # Recommended: run inside a conda environment
    pip install -r requirements.txt
    

Experiments

Our main modifications to the model can be found in modeling_gemma.py and modeling_siglip.py; a hedged sketch of a typical LoRA fine-tuning setup follows the commands below.
  1. Finetune Original Base Model
    python3 finetune_original.py
  2. Finetune Our Model
    python3 finetune.py
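
For context, a typical LoRA setup on a pretrained PaliGemma checkpoint with Hugging Face PEFT looks roughly like the sketch below. This is an illustrative sketch, not the exact configuration used in finetune.py; the checkpoint ID, target modules, and hyperparameters are assumptions:

    import torch
    from transformers import PaliGemmaForConditionalGeneration
    from peft import LoraConfig, get_peft_model

    # Assumed public checkpoint; finetune.py may load a locally modified model.
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        "google/paligemma-3b-pt-224", torch_dtype=torch.bfloat16
    )

    # Illustrative LoRA hyperparameters; the values in this repo may differ.
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # only the LoRA adapters are trainable

Wrapping the model this way freezes the base weights and trains only the low-rank adapters, which is what makes fine-tuning a 3B model tractable.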
    

Evaluation

  1. TODO
