Hi there,
I'm encountering an issue when evaluating the google/codegemma-7b-it model using evalchemy.
Problem Description:
When running evaluations for google/codegemma-7b-it, all the resulting metrics consistently show a value of 0.0, regardless of the dataset or evaluation setup. This suggests that the model might not be generating valid outputs or that there's an issue with how its outputs are being processed and scored by evalchemy.
Steps to Reproduce:
- Run the evaluation script:
  CUDA_VISIBLE_DEVICES=0 python -m eval.eval \
      --model vllm \
      --tasks "LiveCodeBenchv5_official,HumanEval,HumanEvalPlus,MBPP,MBPPPlus,CruxEval,CodeElo,BigCodeBench" \
      --model_args "pretrained=/personal/opensource/models/codegemma-7b-it,tensor_parallel_size=1" \
      --batch_size 64 \
      --max_tokens 2048 \
      --apply_chat_template True \
      --output_path logs
- Observe the evaluation results: every metric is reported as 0.0, except for CruxEval.
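For what it's worth, one way to narrow this down might be to check whether the model produces sensible completions outside of evalchemy. Below is a minimal sketch (the model path and prompt are placeholders, and this is not how evalchemy invokes the model internally) that queries codegemma-7b-it directly through vLLM with its own chat template. If these raw generations look reasonable, the problem is more likely in how evalchemy extracts and scores the answers rather than in generation itself.

```python
# Sanity-check raw generations from codegemma-7b-it via vLLM,
# independent of evalchemy's scoring. Path and prompt are illustrative.
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_path = "/personal/opensource/models/codegemma-7b-it"  # adjust to your local checkout

tokenizer = AutoTokenizer.from_pretrained(model_path)
llm = LLM(model=model_path, tensor_parallel_size=1)

# Apply the model's own chat template, mirroring --apply_chat_template True.
messages = [{"role": "user",
             "content": "Write a Python function that returns the nth Fibonacci number."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate([prompt], SamplingParams(max_tokens=512, temperature=0.0))
print(outputs[0].outputs[0].text)
```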
Any insights or guidance on resolving this would be greatly appreciated. Thank you!
Best,
John