feat(mcp): add autoctx-hillclimb workflow skill and tools #59

Merged
wangauone merged 6 commits into feat/hill-climbing from feat/autoctx-hillclimb-workflow on Apr 2, 2026

Conversation

@wangauone
Collaborator

This PR introduces the autoctx-hillclimb workflow, enabling systematic, iterative refinement of context sets based on evaluation metrics. It adds a guided skill document, a CLI trigger, and a structured mutation tool to automate workspace validation, gap analysis, and context mutations.

Key Changes:

  • Guided Skill (SKILL.md):
    • Phase 1: Gap Analysis: A standardized error taxonomy (e.g., [EntityError], [GoldenDataError]) and a report format that records the failure for every failed query.
    • Human-in-the-Loop Review: Explicit pausing for user feedback before proceeding with mutations.
    • Phase 2: Context Mutation: Guidelines for Conciseness and Generalizability (less context to cover more scenarios).
  • Structured Mutation Tool (mutate_context_set): Exposes a specialized MCP tool that updates context JSON files programmatically, so no manual JSON editing is needed (this guards against JSON schema corruption).
  • CLI Command (hillclimb.toml): Adds definitions to trigger the workflow via gemini run hillclimb.
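
To illustrate why a structured mutation tool is safer than hand-editing JSON, here is a minimal sketch. It is not the code from this PR: the function name `apply_mutations`, the `op`/`section`/`value` mutation fields, and the layout of the context file are all hypothetical, since the real schema of mutate_context_set is not shown in this conversation.

```python
import json
import os
import tempfile


def apply_mutations(path, mutations):
    """Apply a list of add/remove mutations to a JSON context file.

    Hypothetical schema (an assumption, not the PR's actual format):
    each mutation is {"op": "add" | "remove", "section": str, "value": ...}.
    """
    with open(path, encoding="utf-8") as f:
        context = json.load(f)

    for m in mutations:
        items = context.setdefault(m["section"], [])
        if m["op"] == "add":
            # Skip duplicates so repeated runs stay idempotent.
            if m["value"] not in items:
                items.append(m["value"])
        elif m["op"] == "remove":
            context[m["section"]] = [x for x in items if x != m["value"]]
        else:
            raise ValueError(f"unknown op: {m['op']!r}")

    # Serialize to a temporary file first, then replace the original,
    # so a failure during writing never leaves a half-written JSON file.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(context, f, indent=2)
    os.replace(tmp_path, path)
    return context
```

Because the file is always produced by `json.dump`, the output is syntactically valid JSON regardless of what the mutations contain; validating the mutations against the real ContextSet schema would still be a separate step.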

@wangauone wangauone requested a review from helloeve April 1, 2026 01:53

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a hill-climbing refinement workflow for ContextSets, including a new configuration file, a detailed workflow guide in SKILL.md, and a mutate_context_set tool in mcp/main.py. The feedback focuses on improving the robustness of the new tool by ensuring the documentation provides a complete schema example for template additions and adding a type check to verify that the input mutations JSON is a list before processing.
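The type check suggested in the review could look roughly like the sketch below. The helper name `parse_mutations` and the error messages are assumptions made for illustration; the actual change in mcp/main.py is not shown in this conversation.

```python
import json


def parse_mutations(mutations_json):
    """Parse a JSON string and verify that the top-level value is a list."""
    try:
        mutations = json.loads(mutations_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"mutations is not valid JSON: {exc}") from exc
    # A JSON object or scalar would otherwise be iterated over by accident
    # (a dict yields its keys, a string yields characters), so reject early.
    if not isinstance(mutations, list):
        raise ValueError(
            f"mutations must be a JSON list, got {type(mutations).__name__}"
        )
    return mutations
```

Rejecting non-list input before any processing keeps the failure message close to the cause, rather than surfacing later as a confusing error inside the mutation loop.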

Comment thread mcp/main.py Outdated
Comment thread mcp/main.py
Comment thread mcp/skills/autoctx-hillclimb/SKILL.md Outdated
Comment thread mcp/skills/autoctx-hillclimb/SKILL.md
Comment thread mcp/skills/autoctx-hillclimb/SKILL.md
Comment thread mcp/skills/autoctx-hillclimb/SKILL.md
Comment thread mcp/skills/autoctx-hillclimb/SKILL.md Outdated
Comment thread mcp/skills/autoctx-hillclimb/SKILL.md Outdated
Comment thread mcp/skills/autoctx-hillclimb/SKILL.md
Base automatically changed from feat/autoctx-evaluate-workflow to feat/hill-climbing April 1, 2026 20:07
@wangauone wangauone force-pushed the feat/autoctx-hillclimb-workflow branch from df07fbf to d1e51af Compare April 1, 2026 23:56
@wangauone wangauone force-pushed the feat/autoctx-hillclimb-workflow branch from 7b0d976 to fdd2735 Compare April 2, 2026 19:06
@wangauone wangauone force-pushed the feat/autoctx-hillclimb-workflow branch from fdd2735 to 2dcd77e Compare April 2, 2026 19:08
@wangauone wangauone merged commit 2236964 into feat/hill-climbing Apr 2, 2026
2 checks passed
wangauone added a commit that referenced this pull request Apr 21, 2026
🤖 I have created a release *beep* *boop*
---


## [0.5.0](v0.4.3...v0.5.0) (2026-04-21)


### Features

* **autoctx:** introduce automated context generation
([#95](#95))
([57f8aa7](57f8aa7))
* **bootstrap:** enrich workflow with user-provided docs and code
([#70](#70))
([04c8455](04c8455))
* **bootstrap:** enrich workflow with user-provided docs and code
([#72](#72))
([bd04743](bd04743))
* **evaluate:** add custom runner configs to lower evalbench parallelism
([#65](#65))
([f0758fa](f0758fa))
* **evaluate:** populate gcp_project_id in llmrater config
([#90](#90))
([016af70](016af70))
* **evaluate:** simplify golden dataset format and write configs dire…
([#68](#68))
([a76b4bb](a76b4bb))
* **evaluate:** support llm rater configuration in evalbench generator
([#64](#64))
([d181b82](d181b82))
* **evaluate:** unify failure reporting and add execution errors
([#80](#80))
([c11ff47](c11ff47))
* **evaluate:** use gemini-2.5-pro for evaluation rater
([#67](#67))
([5c8f92d](5c8f92d))
* **facet:** enforce qualified table names in prompts
([#85](#85))
([b37480f](b37480f))
* **hillclimb:** support eval result reading tool
([#69](#69))
([bb5a249](bb5a249))
* **mcp:** add autoctx-hillclimb workflow skill and tools
([#59](#59))
([2236964](2236964))
* **mcp:** adopt ADC support for Cloud SQL and AlloyDB
([#78](#78))
([f4091fa](f4091fa))
* **mcp:** centralize Gemini model configuration
([#73](#73))
([97c5c25](97c5c25))
* **mcp:** migrate autoctx infrastructure to autoctx/ folder
([#92](#92))
([0b4f52d](0b4f52d))
* **mcp:** switch default model to gemini-2.5-flash
([#93](#93))
([60800e5](60800e5))
* **mcp:** update evalbench version to 1.4.0
([#83](#83))
([0b4ffb3](0b4ffb3))
* **skills:** update bootstrap skill to include upload URL
([#77](#77))
([fd65bff](fd65bff))


### Miscellaneous Chores

* force release version 0.5.0
([da04e7c](da04e7c))
* force release version 0.5.0
([#97](#97))
([82364d6](82364d6))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).