Changelog

1.4.0 (2026-04-15)

Features

  • scorer/llmrater: add fallback to SQL logic comparison for empty results (#326) (d168ac0)

1.3.1 (2026-04-10)

Bug Fixes

  • databases/alloydb: restore correct use_adc flag behavior (#315) (909e11d)
  • generators/query_data_api: add retry support for transient API errors (#317) (e5fdead)

1.3.0 (2026-04-09)

Features

  • Add summary_in_response and improve LLM rater resilience (#311) (68b72ee)

1.2.0 (2026-04-07)

Features

  • adc: support ADC for database authentication (#306) (6cb05e6)
  • add Cloud Run support with entrypoint script, custom CSS, and environment-based XSRF configuration (82fdeca)
  • add UV_NO_SYNC support to run script and update Dockerfile and cloudbuild configuration accordingly (43731f9)
  • allow database name mapping via config (#303) (3e8d25a)
  • geminicli: populate adc in fake home (01c9c5b)
  • geminicli: populate adc in fake home (ce06c9b)
  • implement on_load logic to auto-select job directory from query parameters (4691de4)

Bug Fixes

  • consolidate experiment_config flag into util/flags.py (#304) (432d11e)
  • handle empty queries safely, ensure golden execution, and parse config robustly (#265) (9ba022b)
  • remove backticks from sanitized SQL strings (#297) (4e4e201)

1.1.0 (2026-03-20)

Features

  • Add a Gemini-powered dataset translation tool. (#257) (a5c0359)
  • Add Cloud Run support and make the server port configurable via… (#234) (34110b1)
  • add evalbench release pipeline and bundling (#276) (a68b348)
  • Add Gemini 3.0 Pro and 3.1 Pro preview model configurations (f8f036c)
  • add QueryData API generator and refactor SQLGenWork (#281) (44d07dc)
  • Add remote MCP server connectivity verification (7bf5716)
  • Add remote MCP server connectivity verification (a64aa37)
  • Add support for syncing Gemini CLI skills to fake home (7e2265b)
  • Configure a dedicated home directory and user for evalbench within the Docker container. (89238f5)
  • Configure GCS FUSE for session management and expose new ports for UI and metrics. (b02489e)
  • Enable session-specific fake home directories for Gemini CLI, improve JSON parsing, and pass the session ID to the generator configuration. (0e0c06b)
  • Enhance Evalbench Viewer UI (#252) (e3a2f95)
  • Enhance results directory discovery in the viewer and ensure the CSV reporter outputs to a shared volume when running in server mode. (a4761e1)
  • Install Node.js via NodeSource PPA, consolidating package installations and removing NVM. (a9f2741)
  • Introduce Horizontal Pod Autoscaler, offload blocking evaluatio… (#269) (a639282)
  • Introduce Horizontal Pod Autoscaler, offload blocking evaluation tasks to a thread pool, and enhance session manager robustness. (6024fb3)
  • Multi-run orchestrator (#258) (aec92c9)
  • Schema, Database Instantiation (#259) (dcb8bf6)
  • spanner: Improve and extend support for Spanner Client (#247) (ac6625a)
  • Sync Gemini CLI skills into fake_home (93e6265)

Bug Fixes

  • Configure absl.logging to output to stdout and initialize its handler. (560d0ee)
  • Correct Gemini CLI response parsing to strip markdown code blocks and remove a redundant prompt argument, and update Makefile container names, pre-run cleanup, and volume mount paths. (#275) (daa0821)
  • dataset: preserve multi-dialect golden_sql for BIRD (#262) (12ccf98)
  • handle empty MySQL passwords and add Cloud SQL support to ensure_database_exists (#268) (beef7ec)
  • implement timeouts to prevent thread hanging in evaluator (#266) (bb77c2f)
  • prevent execution thread deadlocks and db connection leaks (#267) (265fee8)
  • Prevent logging handler from closing sys.stdout by wrapping it in an UncloseableStream. (d7c453e)
  • various improvements, fixes to the SpannerDB driver (#264) (5c6f425)