Description
I am very interested in your work and have been trying to reproduce your results. Following your instructions, I was able to achieve excellent results in the few-shot setting. However, I have encountered significant trouble with zero-shot learning.
Specifically, with meta-llama/Llama-3.2-11B-Vision-Instruct on the Movies dataset, I only reach an accuracy (ACC) of 10.33 when running:
python MLLM/Zero-shot.py --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --num_neighbours 3 --neighbor_mode text --num_samples 300 --max_new_tokens 30 --dataset_name Movies
I am unsure what is causing such a large gap from the results I am trying to reproduce, and wonder whether it is related to particular hyperparameter settings or other tricks.
For your reference, to run the code, I updated the transformers library from version 4.47.1 to 4.55.0 to correctly import Qwen2_5_VLForConditionalGeneration. I also installed an additional package, qwen-vl-utils==0.0.11.
Finally, when I attempted to use the Qwen/Qwen2-VL-7B-Instruct for zero-shot learning, I encountered the following error:
Error processing node 14646: module 'torch.compiler' has no attribute 'is_compiling'
This seems to be caused by the PyTorch version specified in requirement.yaml being too low, as the is_compiling function is only available in torch>=2.3.0.
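In case it helps with reproducing this, here is a minimal sketch of how the installed versions could be checked before running the script. The 2.3.0 bound for torch is my own reading of the error above, not something I have confirmed against the repository, and the helper names (`parse_version`, `meets_minimum`, `installed_version`) are hypothetical and only for illustration.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric release of a version string as a tuple of ints.

    Local or pre-release suffixes such as "+cu121" or "rc1" are ignored,
    so this is only a rough comparison, not a full PEP 440 parser.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, required):
    """Return True if `installed` is at least `required` (numeric parts only)."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that "2.3" and "2.3.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed lower bound, taken from the error message discussed above.
    assumed_torch_minimum = "2.3.0"

    torch_version = installed_version("torch")
    if torch_version is None:
        print("torch is not installed")
    elif meets_minimum(torch_version, assumed_torch_minimum):
        print(f"torch {torch_version} meets the assumed minimum {assumed_torch_minimum}")
    else:
        print(f"torch {torch_version} is older than the assumed minimum {assumed_torch_minimum}")

    # Print the other packages mentioned in this issue for reference.
    for package in ("transformers", "qwen-vl-utils"):
        print(package, installed_version(package))
```

Running this in the environment created from requirement.yaml should show whether the installed torch is older than the version the error seems to require.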
I would greatly appreciate any help you could provide. Thank you for your time.