Releases: GoogleCloudPlatform/evalbench

v1.1.0

20 Mar 16:32
6294981

1.1.0 (2026-03-20)

Features

  • Add a Gemini-powered dataset translation tool. (#257) (a5c0359)
  • Add Cloud Run support and make the server port configurable via… (#234) (34110b1)
  • add evalbench release pipeline and bundling (#276) (a68b348)
  • Add Gemini 3.0 Pro and 3.1 Pro preview model configurations (f8f036c)
  • add QueryData API generator and refactor SQLGenWork (#281) (44d07dc)
  • Add remote MCP server connectivity verification (7bf5716)
  • Add remote MCP server connectivity verification (a64aa37)
  • Add support for syncing Gemini CLI skills to fake home (7e2265b)
  • Configure a dedicated home directory and user for evalbench within the Docker container. (89238f5)
  • Configure GCS FUSE for session management and expose new ports for UI and metrics. (b02489e)
  • Enable session-specific fake home directories for Gemini CLI and improve JSON parsing, while passing the session ID to the generator configuration. (0e0c06b)
  • Enhance Evalbench Viewer UI (#252) (e3a2f95)
  • Enhance results directory discovery in the viewer and ensure the CSV reporter outputs to a shared volume when running in server mode. (a4761e1)
  • Install Node.js via NodeSource PPA, consolidating package installations and removing NVM. (a9f2741)
  • Introduce Horizontal Pod Autoscaler, offload blocking evaluatio… (#269) (a639282)
  • Introduce Horizontal Pod Autoscaler, offload blocking evaluation tasks to a thread pool, and enhance session manager robustness. (6024fb3)
  • Multi run orchestrator (#258) (aec92c9)
  • Schema, Database Instantiation (#259) (dcb8bf6)
  • spanner: Improve and extend support for Spanner Client (#247) (ac6625a)
  • Sync Gemini CLI skills into fake_home (93e6265)

Bug Fixes

  • Configure absl.logging to output to stdout and initialize its handler. (560d0ee)
  • Correct Gemini CLI response parsing to strip markdown code blocks and remove a redundant prompt argument, and update Makefile container names, pre-run cleanup, and volume mount paths. (#275) (daa0821)
  • dataset: preserve multi-dialect golden_sql for BIRD (#262) (12ccf98)
  • handle empty MySQL passwords and add Cloud SQL support to ensure_database_exists (#268) (beef7ec)
  • implement timeouts to prevent thread hanging in evaluator (#266) (bb77c2f)
  • prevent execution thread deadlocks and db connection leaks (#267) (265fee8)
  • Prevent logging handler from closing sys.stdout by wrapping it in an UncloseableStream. (d7c453e)
  • Various improvements and fixes to the SpannerDB driver (#264) (5c6f425)
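
The stdout fix above (d7c453e) names an UncloseableStream wrapper but does not show it. The following is a minimal sketch of how such a wrapper could look, not the project's actual code: the class body and the `make_stdout_handler` helper are assumptions written for illustration, and only the standard-library `logging.StreamHandler` is used.

```python
import io
import logging
import sys


class UncloseableStream:
    """Hypothetical proxy: forwards to a wrapped stream but ignores close()."""

    def __init__(self, stream):
        self._stream = stream

    def close(self):
        # Deliberately a no-op, so the wrapped stream stays open.
        pass

    def __getattr__(self, name):
        # Delegate write(), flush(), and everything else to the wrapped stream.
        return getattr(self._stream, name)


def make_stdout_handler(stream=None):
    """Hypothetical helper: build a handler that cannot close its target stream."""
    target = sys.stdout if stream is None else stream
    return logging.StreamHandler(UncloseableStream(target))


if __name__ == "__main__":
    buffer = io.StringIO()
    handler = make_stdout_handler(buffer)
    handler.stream.write("still writable\n")
    handler.stream.close()  # ignored by the wrapper
    print(buffer.closed)
```

The design choice here is delegation rather than subclassing: the wrapper works for any file-like object, and only `close()` is intercepted.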

EvalBench 1.0

19 Nov 03:37
dfa9f16

First release of EvalBench.

What's Changed

New Contributors

Full Changelog: https://github.com/GoogleCloudPlatform/evalbench/commits/v1.0