- 2024.12.05 🌟 Pretraining code: `train_qwen`, `builder`, `llava_qwen` (`LlavaQwen2ForCausalLM`, `AutoTokenizer.register()`; registration sketched after the installation steps)
- 2024.12.23 🌟 Pretraining on liuhaotian/LLaVA-Pretrain
- 2025.01.04 🌟 Added the VLMEvalKit of OpenCompass
- 2025.01.06 🌟 SFT (full-parameter and LoRA) on llava_v1_5_mix665k
- 2025.02.03 🌟 Task-specific `build_prompt` for different missions (VQA, yes/no, OCR, MME; sketched in the evaluation section below)
- To do 🌟 Inference in pure C/C++ (llama.cpp)
- LLaVA (CLIP + Qwen1.5/2): Towards GPT-Level Vision and Speech Interaction
git clone https://github.com/Nyquist24/Llava_Qwen-Chat.git
conda create -n llavaqwen python=3.10 -y
conda activate llavaqwen
pip install --upgrade pip
pip install -e .
pip install flash-attn --no-build-isolation
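As noted in the changelog, the custom `LlavaQwen2ForCausalLM` is wired into the 🤗 Transformers auto-classes so that `AutoModelForCausalLM`/`AutoTokenizer` can resolve it. A minimal sketch of that registration (the class bodies here are hypothetical stand-ins; the real definitions live in the repo's `llava_qwen` module):

```python
from transformers import (
    AutoConfig, AutoModelForCausalLM, AutoTokenizer,
    Qwen2Config, Qwen2ForCausalLM, Qwen2Tokenizer,
)

# Hypothetical stand-ins for the real classes defined in llava_qwen.
class LlavaQwen2Config(Qwen2Config):
    model_type = "llava_qwen2"

class LlavaQwen2ForCausalLM(Qwen2ForCausalLM):
    config_class = LlavaQwen2Config
    # The real class also adds the CLIP vision tower and the mm projector.

# Register the custom type so the auto-classes can load it by name.
AutoConfig.register("llava_qwen2", LlavaQwen2Config)
AutoModelForCausalLM.register(LlavaQwen2Config, LlavaQwen2ForCausalLM)
AutoTokenizer.register(LlavaQwen2Config, Qwen2Tokenizer)
```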
- Pretrain: liuhaotian/LLaVA-Pretrain; Finetune: liuhaotian/LLaVA-Instruct-150K (refer to LLaVA for download details)
- Put the pretraining data in your `data/` directory and organize the finetuning images as follows:
├── coco
│   └── train2017
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
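A quick sanity check that the layout is in place before training (a convenience snippet, not part of the repo; adjust `data_root` to your setup):

```python
import os

# Expected finetune image folders, relative to your data/ directory.
EXPECTED = [
    "coco/train2017",
    "gqa/images",
    "ocr_vqa/images",
    "textvqa/train_images",
    "vg/VG_100K",
    "vg/VG_100K_2",
]

data_root = "data"
for rel in EXPECTED:
    path = os.path.join(data_root, rel)
    print(("ok     " if os.path.isdir(path) else "MISSING"), path)
```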
Following the LLaVA papers, the CLIP vision encoder and the Qwen LLM are frozen during the pretraining phase; only the multimodal MLP projector is trained.
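In code, the stage-1 setup typically looks like the sketch below (attribute names such as `get_model().mm_projector` follow the upstream LLaVA implementation and are assumptions for this repo):

```python
def freeze_for_pretrain(model):
    # Stage 1: freeze everything (CLIP vision tower + Qwen backbone) ...
    for p in model.parameters():
        p.requires_grad = False
    # ... then unfreeze only the multimodal projector,
    # which is what --tune_mm_mlp_adapter True enables.
    for p in model.get_model().mm_projector.parameters():
        p.requires_grad = True
```

Then launch pretraining: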
sh pretrain_qwen2.sh
Tips:
- the `--output_dir` in `pretrain_qwen2.sh` must contain both 'llava' and 'qwen' (required by VLMEvalKit)
- `--tune_mm_mlp_adapter` should be set to `True`
- checkpoints will be saved in ./checkpoints
sh finetune_qwen.sh
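The finetune stage starts from the projector trained in stage 1. In upstream LLaVA this is done by pointing `--pretrain_mm_mlp_adapter` at the saved `mm_projector.bin`; a hedged sketch of that loading step (the file name and key prefixes follow upstream LLaVA and are assumptions here):

```python
import torch

def load_pretrained_projector(model, ckpt_path):
    # Projector weights are saved with keys like "model.mm_projector.0.weight";
    # strip the prefix before loading into the projector module.
    weights = torch.load(ckpt_path, map_location="cpu")
    state_dict = {k.split("mm_projector.")[-1]: v for k, v in weights.items()}
    model.get_model().mm_projector.load_state_dict(state_dict)
```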
Add the model path of llava_qwen2 to `VLMEvalKit/vlmeval/config.py`, e.g.:
llava_series = {
    'llavaqwen': partial(LLaVAQwen, model_path='/root/ ... /checkpoints/llava_qwen1.5-4B-Chat'),
    ...
}
Follow the instructions in VLMEvalKit to set GPT as the judge model.
Then configure the .env file in the VLMEvalKit folder:
OPENAI_API_KEY=sk-
OPENAI_API_BASE=
Evaluating on different benchmarks:
CUDA_VISIBLE_DEVICES=0 python run.py --data AI2D_TEST OCRBench ... --model llavaqwen --verbose
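The task-specific prompts from the changelog plug into VLMEvalKit's per-dataset `build_prompt` hook on the model class. An illustrative standalone version (the routing and prompt wording here are examples, not the repo's exact code):

```python
def build_prompt(line, dataset=None):
    # Mirrors a model-class build_prompt(self, line, dataset): pick the
    # answer format expected by each benchmark family.
    question = line["question"]
    if dataset == "MME":
        # MME is scored as yes/no; force a binary answer.
        return question + "\nAnswer the question using a single word: yes or no."
    if dataset == "OCRBench":
        # OCR tasks: transcribe the text rather than describe the image.
        return question + "\nAnswer with the exact text shown in the image."
    # Default VQA style: short free-form answer.
    return question + "\nAnswer the question using a single word or phrase."

# Example:
# build_prompt({"question": "Is there a dog in the image?"}, dataset="MME")
```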
Thanks to the following outstanding works: LLaVA-1.5, Qwen2.5, VITA, InternVL, and VLMEvalKit.