- 2024.12.05 🌟 Pretraining code: `train_qwen`, `builder`, `llava_qwen` (`LlavaQwen2ForCausalLM`, `AutoTokenizer.register()`; registration sketched after the installation steps)
- 2024.12.23 🌟 Pretraining on liuhaotian/LLaVA-Pretrain
- 2025.01.04 🌟 Added the VLMEvalKit of OpenCompass
- 2025.01.06 🌟 SFT (full-parameter and LoRA) on llava_v1_5_mix665k
- 2025.02.03 🌟 Task-specific `build_prompt` for different missions (VQA, yes/no, OCR, MME; sketched in the evaluation section below)
- To do 🌟 Inference in pure C/C++ (llama.cpp)
- LLaVA (CLIP + Qwen1.5/2): Towards GPT-Level Vision and Speech Interaction
git clone https://github.com/Nyquist24/Llava_Qwen-Chat.git
conda create -n llavaqwen python=3.10 -y
conda activate llavaqwen
pip install --upgrade pip
pip install -e .
pip install flash-attn --no-build-isolation
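As noted in the changelog, the custom `LlavaQwen2ForCausalLM` is wired into the 🤗 Transformers auto-classes so that `AutoModelForCausalLM`/`AutoTokenizer` can resolve it. A minimal sketch of that registration (the class bodies here are hypothetical stand-ins; the real definitions live in the repo's `llava_qwen` module):

```python
from transformers import (
    AutoConfig, AutoModelForCausalLM, AutoTokenizer,
    Qwen2Config, Qwen2ForCausalLM, Qwen2Tokenizer,
)

# Hypothetical stand-ins for the real classes defined in llava_qwen.
class LlavaQwen2Config(Qwen2Config):
    model_type = "llava_qwen2"

class LlavaQwen2ForCausalLM(Qwen2ForCausalLM):
    config_class = LlavaQwen2Config
    # The real class also adds the CLIP vision tower and the mm projector.

# Register the custom type so the auto-classes can load it by name.
AutoConfig.register("llava_qwen2", LlavaQwen2Config)
AutoModelForCausalLM.register(LlavaQwen2Config, LlavaQwen2ForCausalLM)
AutoTokenizer.register(LlavaQwen2Config, Qwen2Tokenizer)
```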
- Pretrain: liuhaotian/LLaVA-Pretrain; Finetune: liuhaotian/LLaVA-Instruct-150K (refer to LLaVA for download details)
- Put the pretraining data in your `data/` directory and organize the finetuning images as follows:
├── coco
│   └── train2017
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
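A quick sanity check that the layout is in place before training (a convenience snippet, not part of the repo; adjust `data_root` to your setup):

```python
import os

# Expected finetune image folders, relative to your data/ directory.
EXPECTED = [
    "coco/train2017",
    "gqa/images",
    "ocr_vqa/images",
    "textvqa/train_images",
    "vg/VG_100K",
    "vg/VG_100K_2",
]

data_root = "data"
for rel in EXPECTED:
    path = os.path.join(data_root, rel)
    print(("ok     " if os.path.isdir(path) else "MISSING"), path)
```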
Following the LLaVA papers, the CLIP vision encoder and the Qwen LLM are frozen during the pretraining phase; only the multimodal MLP projector is trained.
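In code, the stage-1 setup typically looks like the sketch below (attribute names such as `get_model().mm_projector` follow the upstream LLaVA implementation and are assumptions for this repo):

```python
def freeze_for_pretrain(model):
    # Stage 1: freeze everything (CLIP vision tower + Qwen backbone) ...
    for p in model.parameters():
        p.requires_grad = False
    # ... then unfreeze only the multimodal projector,
    # which is what --tune_mm_mlp_adapter True enables.
    for p in model.get_model().mm_projector.parameters():
        p.requires_grad = True
```

Then launch pretraining: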
sh pretrain_qwen2.sh
Tips:
- the `--output_dir` in `pretrain_qwen2.sh` must contain both 'llava' and 'qwen' (required by VLMEvalKit)
- `--tune_mm_mlp_adapter` should be set to `True`
- checkpoints will be saved in ./checkpoints
sh finetune_qwen.sh
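The finetune stage starts from the projector trained in stage 1. In upstream LLaVA this is done by pointing `--pretrain_mm_mlp_adapter` at the saved `mm_projector.bin`; a hedged sketch of that loading step (the file name and key prefixes follow upstream LLaVA and are assumptions here):

```python
import torch

def load_pretrained_projector(model, ckpt_path):
    # Projector weights are saved with keys like "model.mm_projector.0.weight";
    # strip the prefix before loading into the projector module.
    weights = torch.load(ckpt_path, map_location="cpu")
    state_dict = {k.split("mm_projector.")[-1]: v for k, v in weights.items()}
    model.get_model().mm_projector.load_state_dict(state_dict)
```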
Add the model path of llava_qwen2 to `VLMEvalKit/vlmeval/config.py`, e.g.:
llava_series = {
    'llavaqwen': partial(LLaVAQwen, model_path='/root/ ... /checkpoints/llava_qwen1.5-4B-Chat'),
    ...
}
Follow the instructions in VLMEvalKit to set GPT as the judge model.
Then configure the .env file in the VLMEvalKit folder:
OPENAI_API_KEY=sk-
OPENAI_API_BASE=
Evaluating on different benchmarks:
CUDA_VISIBLE_DEVICES=0 python run.py --data AI2D_TEST OCRBench ... --model llavaqwen --verbose
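The task-specific prompts from the changelog plug into VLMEvalKit's per-dataset `build_prompt` hook on the model class. An illustrative standalone version (the routing and prompt wording here are examples, not the repo's exact code):

```python
def build_prompt(line, dataset=None):
    # Mirrors a model-class build_prompt(self, line, dataset): pick the
    # answer format expected by each benchmark family.
    question = line["question"]
    if dataset == "MME":
        # MME is scored as yes/no; force a binary answer.
        return question + "\nAnswer the question using a single word: yes or no."
    if dataset == "OCRBench":
        # OCR tasks: transcribe the text rather than describe the image.
        return question + "\nAnswer with the exact text shown in the image."
    # Default VQA style: short free-form answer.
    return question + "\nAnswer the question using a single word or phrase."

# Example:
# build_prompt({"question": "Is there a dog in the image?"}, dataset="MME")
```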
Thanks to the following outstanding works: LLaVA-1.5, Qwen2.5, VITA, InternVL, and VLMEvalKit.