feat(evaluate): support llm rater configuration in evalbench generator by wangauone · Pull Request #64 · GoogleCloudPlatform/db-context-enrichment

wangauone · 2026-04-03T18:41:15Z

Adds support for the EvalBench LLM Rater to enable semantic result comparisons during automated evaluations.

Generates a dedicated llmrater_config.yaml using gemini-3.1-pro-preview.
Wires the LLM rater block into the run_config.yaml scorers definition.
Updates unit tests to validate the new configuration output.
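The two generated files described above can be sketched roughly as follows. This is a minimal illustration, not the code in `mcp/evaluate/evaluate_generator.py`: the function names, the `model`, `region`, and `model_config` keys, and the tiny YAML emitter are all assumptions made for the example. Only `execs_per_minute`, the `scorers` block, the `llmrater` scorer name, and the two file names come from this PR.

```python
from pathlib import Path


def build_llmrater_config(model: str, region: str = "global",
                          execs_per_minute: int = 20) -> dict:
    # Hypothetical shape of llmrater_config.yaml; key names other than
    # execs_per_minute are assumptions, not the EvalBench schema.
    return {"model": model, "region": region,
            "execs_per_minute": execs_per_minute}


def build_run_config(llmrater_config_path: str) -> dict:
    # Hypothetical scorers block that points the llmrater scorer at its
    # dedicated config file.
    return {"scorers": {"llmrater": {"model_config": llmrater_config_path}}}


def to_yaml(data: dict, indent: int = 0) -> str:
    # Tiny emitter for nested dicts of scalars; a real generator would
    # more likely use a YAML library.
    lines = []
    pad = "  " * indent
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


def write_configs(out_dir: Path, model: str) -> dict:
    # Write both files and return their paths keyed by file name.
    out_dir.mkdir(parents=True, exist_ok=True)
    rater_path = out_dir / "llmrater_config.yaml"
    run_path = out_dir / "run_config.yaml"
    rater_path.write_text(to_yaml(build_llmrater_config(model)) + "\n")
    run_path.write_text(to_yaml(build_run_config(rater_path.name)) + "\n")
    return {rater_path.name: rater_path, run_path.name: run_path}
```

Taking the model as a required argument keeps the model choice out of the generator body, so a later model switch only touches the call site.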

gemini-code-assist

Code Review

This pull request introduces an LLM-based rater configuration to the evaluation generator, adding a new llmrater_config.yaml file and updating the run configuration to include set_match and llmrater scorers. Feedback suggests retaining the previously removed deterministic scorers (exact_match and executable_sql) to maintain comprehensive evaluation, avoiding the use of hardcoded preview model versions in the generator, and enhancing tests to verify the actual content of the newly generated configuration.
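The three review suggestions could look something like the sketch below. It is illustrative only: the `EVAL_RATER_MODEL` environment variable, the function names, and the `model_config` key are hypothetical, and the fallback model is simply the one a later commit in this PR switches to.

```python
import os

# Fallback model name; assumed here, taken from a later commit title in this PR.
DEFAULT_RATER_MODEL = "gemini-2.5-flash"


def resolve_rater_model(env=None) -> str:
    # Read the model from a (hypothetical) environment variable so the
    # generator does not hardcode a preview model version.
    env = os.environ if env is None else env
    return env.get("EVAL_RATER_MODEL", DEFAULT_RATER_MODEL)


def build_scorers(llmrater_config_path: str) -> dict:
    # Keep the deterministic scorers next to the LLM rater, as the review
    # suggests. The value shapes are assumptions, not the EvalBench schema.
    return {
        "exact_match": None,
        "executable_sql": None,
        "llmrater": {"model_config": llmrater_config_path},
    }


def check_generated_scorers(scorers: dict) -> None:
    # Test-style check on the generated content itself, rather than only
    # on the fact that a file was produced.
    missing = {"exact_match", "executable_sql", "llmrater"} - scorers.keys()
    if missing:
        raise AssertionError(f"missing scorers: {sorted(missing)}")
```

A unit test in this style would call `check_generated_scorers` on the parsed `run_config.yaml` and compare the rater model against `resolve_rater_model()`.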

🤖 I have created a release *beep* *boop*

---

## [0.5.0](v0.4.3...v0.5.0) (2026-04-21)

### Features

* **autoctx:** introduce automated context generation ([#95](#95)) ([57f8aa7](57f8aa7))
* **bootstrap:** enrich workflow with user-provided docs and code ([#70](#70)) ([04c8455](04c8455))
* **bootstrap:** enrich workflow with user-provided docs and code ([#72](#72)) ([bd04743](bd04743))
* **evaluate:** add custom runner configs to lower evalbench parallelism ([#65](#65)) ([f0758fa](f0758fa))
* **evaluate:** populate gcp_project_id in llmrater config ([#90](#90)) ([016af70](016af70))
* **evaluate:** simplify golden dataset format and write configs dire… ([#68](#68)) ([a76b4bb](a76b4bb))
* **evaluate:** support llm rater configuration in evalbench generator ([#64](#64)) ([d181b82](d181b82))
* **evaluate:** unify failure reporting and add execution errors ([#80](#80)) ([c11ff47](c11ff47))
* **evaluate:** use gemini-2.5-pro for evaluation rater ([#67](#67)) ([5c8f92d](5c8f92d))
* **facet:** enforce qualified table names in prompts ([#85](#85)) ([b37480f](b37480f))
* **hillclimb:** support eval result reading tool ([#69](#69)) ([bb5a249](bb5a249))
* **mcp:** add autoctx-hillclimb workflow skill and tools ([#59](#59)) ([2236964](2236964))
* **mcp:** adopt ADC support for Cloud SQL and AlloyDB ([#78](#78)) ([f4091fa](f4091fa))
* **mcp:** centralize Gemini model configuration ([#73](#73)) ([97c5c25](97c5c25))
* **mcp:** migrate autoctx infrastructure to autoctx/ folder ([#92](#92)) ([0b4f52d](0b4f52d))
* **mcp:** switch default model to gemini-2.5-flash ([#93](#93)) ([60800e5](60800e5))
* **mcp:** update evalbench version to 1.4.0 ([#83](#83)) ([0b4ffb3](0b4ffb3))
* **skills:** update bootstrap skill to include upload URL ([#77](#77)) ([fd65bff](fd65bff))

### Miscellaneous Chores

* force release version 0.5.0 ([da04e7c](da04e7c))
* force release version 0.5.0 ([#97](#97)) ([82364d6](82364d6))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

feat(evaluate): support llm rater configuration in evalbench generator

6edc1f1

wangauone requested a review from helloeve April 3, 2026 18:41

gemini-code-assist bot reviewed Apr 3, 2026


Comment thread mcp/evaluate/evaluate_generator.py Outdated

Comment thread mcp/evaluate/evaluate_generator.py Outdated

Comment thread mcp/tests/evaluate/evaluate_generator_test.py

wangauone added 2 commits April 3, 2026 13:51

fix(evaluate): add explicit global region to llmrater config

b7066fc

feat(evaluate): remove set_match scorer and update tests

cc3afef

helloeve approved these changes Apr 3, 2026


Comment thread mcp/evaluate/evaluate_generator.py Outdated

wangauone added 2 commits April 3, 2026 15:13

feat(evaluate): switch llmrater model to gemini-2.5-flash

af25c07

feat(evaluate): increase llmrater execs_per_minute to 20

935140c

wangauone merged commit d181b82 into feat/hill-climbing Apr 3, 2026
2 checks passed

wangauone deleted the feat/add-llmrater-support branch April 3, 2026 22:25

release-please bot mentioned this pull request Apr 20, 2026

chore(main): release 0.5.0 #96

Merged

wangauone merged 5 commits into feat/hill-climbing from feat/add-llmrater-support
