support prompt_logp_compute_kv_cache in no vllm trainer #82
Yukino256 wants to merge 8 commits into StarsfieldAI:main from
Conversation
Hi, thanks for your contribution. Can you provide a more detailed running-time comparison and a performance comparison on GEOQA and CLEVR?
Hi, thank you. I will try it as soon as possible 🤗
@chenllliang Testing with Qwen2-VL-7B-Instruct on the GEOQA dataset: the original code runs at 105 s/it and my changed code at 114 s/it on 8×A800 (80G). My test code is:
After updating to your commit, it seems an error occurs.
Hello! It seems the code is not updated? I changed this line into multiple lines; maybe git didn't sync properly?
Sorry, but I think you didn't update the code in the right place ~
I also have another question: I set mini_batch_size to 1, and my prompts can be around 2100 tokens, but it still hits OOM on 8×A100 (80G).
Hello, I'm checking the code error. And about the OOM error, does your setup use ZeRO-3?
@ZCMax Hello! The bugs should be fixed! I think trying 7B with ZeRO-3 is OK! 🥲🥲
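For anyone hitting the same OOM: a minimal ZeRO-3 sketch, assuming the HF Trainer/TRL integration where the DeepSpeed config can be passed as a Python dict (the values below are hypothetical, not the exact config used in this thread):

```python
# Hypothetical minimal ZeRO-3 config; "auto" lets the HF Trainer fill in
# values from TrainingArguments. Adjust for your own setup.
zero3_config = {
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer states
        "overlap_comm": True,  # overlap communication with computation
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Usage sketch: pass it via TrainingArguments(..., deepspeed=zero3_config),
# or save it as a JSON file and point --deepspeed at that file.
```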
A quick question: after adding ZeRO-3, multi-node training seems to hang?
@Yukino256 Have you seen a similar error? It seems to originate inside utils.py and to be caused by ZeRO-3; it doesn't appear to affect the results, but I'm still a bit puzzled.
I get this error with the original source code too, so it shouldn't be introduced by my changes.
mark this |

This solves issue #71.
The code is mainly copied and modified from andyl98:grpo-vram-optimization.
In my test, GRPO runs at least 3x faster, without OOM, on the Qwen2-VL-7B model. And since I have never successfully run the vLLM version, I can't modify the vllm_trainer code.
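For readers, a minimal sketch of the caching idea behind prompt_logp_compute_kv_cache, assuming a standard HF causal-LM interface (function and variable names are hypothetical, and multimodal inputs such as Qwen2-VL pixel values are omitted): forward the shared prompt once, keep its `past_key_values`, and score each completion by feeding only the completion tokens on top of the cached prompt.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def completion_logps_with_prompt_kv_cache(model, prompt_ids, completion_ids):
    """Hypothetical sketch: score a completion while reusing the prompt's KV cache.

    prompt_ids:     (1, P) token ids of the shared prompt
    completion_ids: (1, C) token ids of one sampled completion
    """
    # 1) Forward the prompt once and keep its KV cache.
    prompt_out = model(input_ids=prompt_ids, use_cache=True)
    past_key_values = prompt_out.past_key_values

    # 2) Forward only the completion tokens on top of the cached prompt.
    comp_out = model(input_ids=completion_ids,
                     past_key_values=past_key_values,
                     use_cache=True)

    # 3) Per-token log-probs of the completion. Logits at position t predict
    #    token t+1, so the last prompt logit predicts the first completion token.
    logits = torch.cat([prompt_out.logits[:, -1:, :],
                        comp_out.logits[:, :-1, :]], dim=1)
    logps = F.log_softmax(logits, dim=-1)
    return torch.gather(logps, 2, completion_ids.unsqueeze(-1)).squeeze(-1)  # (1, C)
```

Since GRPO scores several completions per prompt, reusing the prompt's KV cache avoids re-running the (often long) prompt through the model once per completion.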
My test code is:
And you can add `--logit_computation_mini_batch_size X` if the trl package is the newest.
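For reference, this is roughly what such a mini-batched logit computation looks like (a sketch under assumptions, not the actual trl implementation; names are hypothetical): the batch is processed in chunks so the full (batch, seq, vocab) logits tensor is never materialized at once.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()  # shown without gradients, as for reference-model scoring;
                  # the real trainer must also handle the policy's gradient path
def per_token_logps_chunked(model, input_ids, attention_mask, mini_batch_size=1):
    """Hypothetical sketch: compute per-token log-probs in mini-batches
    to bound peak memory, instead of one big forward pass."""
    all_logps = []
    for start in range(0, input_ids.size(0), mini_batch_size):
        ids = input_ids[start:start + mini_batch_size]
        mask = attention_mask[start:start + mini_batch_size]
        logits = model(input_ids=ids, attention_mask=mask).logits[:, :-1, :]
        targets = ids[:, 1:]  # logits at position t predict token t+1
        logps = F.log_softmax(logits, dim=-1)
        all_logps.append(torch.gather(logps, 2, targets.unsqueeze(-1)).squeeze(-1))
    return torch.cat(all_logps, dim=0)  # (batch, seq_len - 1)
```

With large vocabularies, the transient logits tensor shrinks from batch × seq_len × vocab to mini_batch_size × seq_len × vocab per forward pass, which is where the VRAM savings come from.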