
[Bug] google/codegemma-7b-it Evaluation Results All Zero #147

@juyongjiang

Description


Hi there,

I'm encountering an issue when evaluating the google/codegemma-7b-it model using evalchemy.

Problem Description:
When running evaluations for google/codegemma-7b-it, every reported metric comes back as 0.0 (CruxEval is the one exception; see step 2 below), regardless of the dataset or evaluation setup. This suggests that either the model is not generating valid outputs, or evalchemy is failing to parse and score the outputs it does produce.
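
In case it helps with triage, one quick way to rule out a prompt-formatting problem is to render a prompt directly with the model's tokenizer. This is only a minimal sketch, not part of evalchemy: the model path mirrors my local path from the reproduction command below, and the message content is an arbitrary example.

```python
# Minimal sketch (not part of evalchemy): check that codegemma-7b-it's chat
# template renders a well-formed prompt. The local model path matches the
# reproduction command below; the message content is an arbitrary example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/personal/opensource/models/codegemma-7b-it"
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a string."}
]

# tokenize=False returns the rendered prompt string; add_generation_prompt=True
# appends the model-turn header so generation starts in the assistant role.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # expect Gemma-style <start_of_turn>user ... <end_of_turn> markers
```

If the rendered prompt already looks wrong here, the 0.0 scores would point at prompt formatting rather than at evalchemy's scoring.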

Steps to Reproduce:

  1. Run the evaluation script:

     ```bash
     CUDA_VISIBLE_DEVICES=0 python -m eval.eval \
       --model vllm \
       --tasks "LiveCodeBenchv5_official,HumanEval,HumanEvalPlus,MBPP,MBPPPlus,CruxEval,CodeElo,BigCodeBench" \
       --model_args "pretrained=/personal/opensource/models/codegemma-7b-it,tensor_parallel_size=1" \
       --batch_size 64 \
       --max_tokens 2048 \
       --apply_chat_template True \
       --output_path logs
     ```
  2. Observe the results: every metric is reported as 0.0, except CruxEval (a standalone generation check is sketched after the screenshot below).
[Screenshot: evaluation results showing 0.0 for every benchmark except CruxEval]
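
For reference, this is the kind of standalone generation check I would use to separate the two hypotheses (empty generations vs. a scoring/parsing issue). It is only a sketch: the model path, max_tokens, and tensor_parallel_size are taken from my command above, and the prompt is an arbitrary example.

```python
# Sketch: generate one completion with vLLM, outside evalchemy, to see whether
# the model produces non-empty, plausible code. Path and generation settings
# mirror the evaluation command above; the prompt itself is arbitrary.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "/personal/opensource/models/codegemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "user", "content": "Write a Python function that returns the n-th Fibonacci number."}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

llm = LLM(model=model_path, tensor_parallel_size=1)
outputs = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=2048))

# If this prints an empty string or raw template tokens, the failure is on the
# generation side; if it looks like valid code, the parsing/scoring path in
# evalchemy is the more likely culprit.
print(outputs[0].outputs[0].text)
```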

Any insights or guidance on resolving this would be greatly appreciated. Thank you!

Best,
John
