Hi there,
I'm encountering an issue when evaluating the google/codegemma-7b-it model using evalchemy.
Problem Description:
When running evaluations for google/codegemma-7b-it, all the resulting metrics consistently show a value of 0.0, regardless of the dataset or evaluation setup. This suggests that the model might not be generating valid outputs or that there's an issue with how its outputs are being processed and scored by evalchemy.
Steps to Reproduce:
- Run the evaluation script:
  CUDA_VISIBLE_DEVICES=0 python -m eval.eval \
      --model vllm \
      --tasks "LiveCodeBenchv5_official,HumanEval,HumanEvalPlus,MBPP,MBPPPlus,CruxEval,CodeElo,BigCodeBench" \
      --model_args "pretrained=/personal/opensource/models/codegemma-7b-it,tensor_parallel_size=1" \
      --batch_size 64 \
      --max_tokens 2048 \
      --apply_chat_template True \
      --output_path logs
- Observe the evaluation results: every metric is reported as 0.0, except for CruxEval.
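For what it's worth, one way to narrow this down might be to check whether the model produces sensible completions outside of evalchemy. Below is a minimal sketch (the model path and prompt are placeholders, and this is not how evalchemy invokes the model internally) that queries codegemma-7b-it directly through vLLM with its own chat template. If these raw generations look reasonable, the problem is more likely in how evalchemy extracts and scores the answers rather than in generation itself.

```python
# Sanity-check raw generations from codegemma-7b-it via vLLM,
# independent of evalchemy's scoring. Path and prompt are illustrative.
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_path = "/personal/opensource/models/codegemma-7b-it"  # adjust to your local checkout

tokenizer = AutoTokenizer.from_pretrained(model_path)
llm = LLM(model=model_path, tensor_parallel_size=1)

# Apply the model's own chat template, mirroring --apply_chat_template True.
messages = [{"role": "user",
             "content": "Write a Python function that returns the nth Fibonacci number."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate([prompt], SamplingParams(max_tokens=512, temperature=0.0))
print(outputs[0].outputs[0].text)
```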
Any insights or guidance on resolving this would be greatly appreciated. Thank you!
Best,
John