feat(evaluate): support llm rater configuration in evalbench generator #64

Merged
wangauone merged 5 commits into feat/hill-climbing from feat/add-llmrater-support
Apr 3, 2026

Conversation

@wangauone
Collaborator

Adds support for the EvalBench LLM Rater to enable semantic result comparisons during automated evaluations.

  • Generates a dedicated llmrater_config.yaml using gemini-3.1-pro-preview.
  • Wires the LLM rater block into the run_config.yaml scorers definition.
  • Updates unit tests to validate the new configuration output.
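As a rough illustration of the first two bullets, the sketch below writes a rater config file and references it from the scorers block of the run config. It is not the code from `evaluate_generator.py`: the keys `model` and `model_config`, the function names, and the tiny YAML renderer are all assumptions made for this example, and the real EvalBench schema may differ.

```python
import tempfile
from pathlib import Path


def to_yaml(data: dict, indent: int = 0) -> str:
    """Render a nested dict of plain scalars as YAML (no quoting or lists)."""
    pad = "  " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, dict) and value:
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        elif isinstance(value, dict):
            lines.append(f"{pad}{key}: {{}}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


def build_llmrater_config(model: str) -> dict:
    # Hypothetical schema: a single "model" key.
    return {"model": model}


def build_run_config(llmrater_config_name: str) -> dict:
    # Hypothetical schema: scorers keyed by name, with the LLM rater
    # pointing at its dedicated config file via "model_config".
    return {
        "scorers": {
            "set_match": {},
            "llmrater": {"model_config": llmrater_config_name},
        }
    }


def write_eval_configs(out_dir: Path, model: str) -> dict:
    """Write both config files and return their paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    rater_path = out_dir / "llmrater_config.yaml"
    run_path = out_dir / "run_config.yaml"
    rater_path.write_text(to_yaml(build_llmrater_config(model)) + "\n")
    run_path.write_text(to_yaml(build_run_config(rater_path.name)) + "\n")
    return {"llmrater": rater_path, "run": run_path}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = write_eval_configs(Path(tmp), "gemini-3.1-pro-preview")
        print(written["run"].read_text())
```

Keeping the builders separate from the file writing makes the third bullet easier to satisfy, since unit tests can assert on the returned dicts without touching the filesystem.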

@wangauone wangauone requested a review from helloeve April 3, 2026 18:41

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an LLM-based rater configuration to the evaluation generator, adding a new llmrater_config.yaml file and updating the run configuration to include set_match and llmrater scorers. Feedback suggests retaining the previously removed deterministic scorers (exact_match and executable_sql) to maintain comprehensive evaluation, avoiding the use of hardcoded preview model versions in the generator, and enhancing tests to verify the actual content of the newly generated configuration.
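One way the three suggestions could fit together is sketched below. This is an illustration only, not the change that was merged: the environment variable `EVAL_RATER_MODEL`, the function names, and the `model_config` key are hypothetical, and the scorer names are taken from the review summary above.

```python
import os
from typing import Optional

# Hypothetical environment variable, used only for this illustration.
RATER_MODEL_ENV = "EVAL_RATER_MODEL"


def resolve_rater_model(explicit: Optional[str] = None,
                        default: str = "gemini-3.1-pro-preview") -> str:
    """Pick the rater model from an argument, then the environment, then a default."""
    return explicit or os.environ.get(RATER_MODEL_ENV) or default


def build_scorers(llmrater_config_name: str) -> dict:
    """Keep the deterministic scorers and add the LLM rater alongside them."""
    return {
        "exact_match": {},
        "executable_sql": {},
        "set_match": {},
        # Hypothetical key linking the scorer to its config file.
        "llmrater": {"model_config": llmrater_config_name},
    }


def check_generated_scorers() -> None:
    """Content-level check in the spirit of the suggested test improvement."""
    scorers = build_scorers("llmrater_config.yaml")
    assert {"exact_match", "executable_sql"} <= scorers.keys()
    assert scorers["llmrater"]["model_config"] == "llmrater_config.yaml"


if __name__ == "__main__":
    check_generated_scorers()
    print(resolve_rater_model())
```

Resolving the model through a single function means a preview model name appears in one overridable default rather than inside the generated config logic.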

Comment thread mcp/evaluate/evaluate_generator.py Outdated
Comment thread mcp/evaluate/evaluate_generator.py Outdated
Comment thread mcp/tests/evaluate/evaluate_generator_test.py
Comment thread mcp/evaluate/evaluate_generator.py Outdated
@wangauone wangauone merged commit d181b82 into feat/hill-climbing Apr 3, 2026
2 checks passed
@wangauone wangauone deleted the feat/add-llmrater-support branch April 3, 2026 22:25
wangauone added a commit that referenced this pull request Apr 21, 2026
🤖 I have created a release *beep* *boop*
---


## [0.5.0](v0.4.3...v0.5.0) (2026-04-21)


### Features

* **autoctx:** introduce automated context generation ([#95](#95)) ([57f8aa7](57f8aa7))
* **bootstrap:** enrich workflow with user-provided docs and code ([#70](#70)) ([04c8455](04c8455))
* **bootstrap:** enrich workflow with user-provided docs and code ([#72](#72)) ([bd04743](bd04743))
* **evaluate:** add custom runner configs to lower evalbench parallelism ([#65](#65)) ([f0758fa](f0758fa))
* **evaluate:** populate gcp_project_id in llmrater config ([#90](#90)) ([016af70](016af70))
* **evaluate:** simplify golden dataset format and write configs dire… ([#68](#68)) ([a76b4bb](a76b4bb))
* **evaluate:** support llm rater configuration in evalbench generator ([#64](#64)) ([d181b82](d181b82))
* **evaluate:** unify failure reporting and add execution errors ([#80](#80)) ([c11ff47](c11ff47))
* **evaluate:** use gemini-2.5-pro for evaluation rater ([#67](#67)) ([5c8f92d](5c8f92d))
* **facet:** enforce qualified table names in prompts ([#85](#85)) ([b37480f](b37480f))
* **hillclimb:** support eval result reading tool ([#69](#69)) ([bb5a249](bb5a249))
* **mcp:** add autoctx-hillclimb workflow skill and tools ([#59](#59)) ([2236964](2236964))
* **mcp:** adopt ADC support for Cloud SQL and AlloyDB ([#78](#78)) ([f4091fa](f4091fa))
* **mcp:** centralize Gemini model configuration ([#73](#73)) ([97c5c25](97c5c25))
* **mcp:** migrate autoctx infrastructure to autoctx/ folder ([#92](#92)) ([0b4f52d](0b4f52d))
* **mcp:** switch default model to gemini-2.5-flash ([#93](#93)) ([60800e5](60800e5))
* **mcp:** update evalbench version to 1.4.0 ([#83](#83)) ([0b4ffb3](0b4ffb3))
* **skills:** update bootstrap skill to include upload URL ([#77](#77)) ([fd65bff](fd65bff))


### Miscellaneous Chores

* force release version 0.5.0 ([da04e7c](da04e7c))
* force release version 0.5.0 ([#97](#97)) ([82364d6](82364d6))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

2 participants