Credit to DALL-E
This project originated as our CS228 (Deep Learning) final project. We explored integrating Differential Attention into the vision-language model PaliGemma 3B to address the challenges posed by noisy information and limited context windows.
We used LoRA fine-tuning and adapted Differential Attention into an existing pretrained model. In a first round of experiments, the modified model showed potential improvements over a vanilla fine-tuned baseline on the Multimodal Needle in a Haystack evaluation.
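For context, below is a minimal PyTorch sketch of a single Differential Attention head in the spirit of the DIFF Transformer. The class and parameter names are illustrative, and it omits the paper's λ reparameterization and per-head normalization; it is a sketch of the mechanism, not the exact code in this repository.

```python
# Illustrative sketch of one Differential Attention head (not the repo's exact code).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttentionHead(nn.Module):
    def __init__(self, dim: int, head_dim: int, lambda_init: float = 0.8):
        super().__init__()
        # Two query/key projections yield two attention maps.
        self.q_proj = nn.Linear(dim, 2 * head_dim, bias=False)
        self.k_proj = nn.Linear(dim, 2 * head_dim, bias=False)
        self.v_proj = nn.Linear(dim, head_dim, bias=False)
        # The paper reparameterizes lambda; a single learnable scalar is used here for brevity.
        self.lmbda = nn.Parameter(torch.tensor(lambda_init))
        self.head_dim = head_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        scale = 1.0 / math.sqrt(self.head_dim)
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * scale, dim=-1)
        # Subtracting the second map cancels attention mass shared between
        # the two maps, which the DIFF Transformer paper interprets as noise.
        return (a1 - self.lmbda * a2) @ v
```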
Further information can be found in our report linked above. We plan to explore this project further through better evaluations and possibly by expanding to the Phi model family:
- Rerun and extend evaluations
- Experiment with Phi-3
- Code cleanup
- Prerequisites
```bash
# Better in a conda env
pip install -r requirements.txt
```
Our main modifications to the model can be found in `modeling_gemma.py` and `modeling_siglip.py`.
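For readers unfamiliar with LoRA, the sketch below shows the core idea of wrapping a frozen linear layer with a trainable low-rank update. `LoRALinear` and its hyperparameters are hypothetical names for illustration; the actual fine-tuning scripts may use a library such as peft instead.

```python
# Minimal from-scratch LoRA wrapper (illustrative; not the repo's exact setup).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        # A starts small and B at zero, so the update begins as a no-op.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The low-rank product B @ A is the only trainable part.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```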
- Finetune Original Base Model

```bash
python3 finetune_original.py
```
- Finetune Our Model

```bash
python3 finetune.py
```