
feat: context optimization pipeline with capability routing #3466

Merged
trek-e merged 12 commits into main from feat/gsd-context-optimization
Apr 4, 2026
Conversation

trek-e (Collaborator) commented Apr 3, 2026

Summary

Implements a comprehensive context optimization pipeline for GSD auto-mode, targeting both token cost reduction and context drift prevention:

  • ADR-004 Phase 2: Capability-aware model routing — 7-dimension model profiles (coding, debugging, research, reasoning, speed, longContext, instruction) with weighted scoring for task-appropriate model selection. 9 models profiled, 11 unit types mapped to requirement vectors.
  • Observation masking — Replaces tool result content older than N turns with placeholders before sending to the LLM. Zero LLM overhead, configurable via context_management.observation_masking and observation_mask_turns preferences.
  • Tool result truncation — Caps individual tool result content at a configurable character limit (context_management.tool_result_max_chars, default 800) during auto-mode sessions.
  • Phase handoff anchors — Structured JSON summaries (intent, decisions, blockers, nextSteps) written between auto-mode phases. Prompt builders inject these so downstream agents inherit prior phase context without re-inference.
  • ContextManagementConfig preferences — New preference block with validation for all context management knobs.
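
As a rough illustration of the weighted scoring described in the first bullet, the sketch below scores each model profile as a weighted average of its capability values, with the task's requirement vector supplying the weights. The seven dimension names come from the summary above; the type names, the 0 to 1 scale, and the functions `scoreModel` and `pickModel` are assumptions for illustration and may not match model-router.ts.

```typescript
// Hypothetical sketch; names and the 0..1 scale are assumptions, not the actual model-router.ts API.
type Capability =
  | "coding" | "debugging" | "research" | "reasoning"
  | "speed" | "longContext" | "instruction";

const DIMENSIONS: Capability[] = [
  "coding", "debugging", "research", "reasoning",
  "speed", "longContext", "instruction",
];

type CapabilityVector = Record<Capability, number>;

interface ModelProfile {
  id: string;
  capabilities: CapabilityVector; // assumed scale: 0 (weak) to 1 (strong)
}

// Weighted average of the profile's capabilities; dimensions the task
// does not mention get weight 0 and do not affect the score.
function scoreModel(
  profile: ModelProfile,
  requirements: Partial<CapabilityVector>,
): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const dim of DIMENSIONS) {
    const weight = requirements[dim] ?? 0;
    weighted += weight * profile.capabilities[dim];
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

// Returns the highest-scoring profile, or undefined for an empty list.
function pickModel(
  profiles: ModelProfile[],
  requirements: Partial<CapabilityVector>,
): ModelProfile | undefined {
  let best: ModelProfile | undefined;
  let bestScore = -Infinity;
  for (const profile of profiles) {
    const score = scoreModel(profile, requirements);
    if (score > bestScore) {
      best = profile;
      bestScore = score;
    }
  }
  return best;
}
```

Under this scheme a unit type that weights `coding` heavily and `speed` lightly would favor a stronger but slower model, while a speed-dominated requirement vector would favor the faster one.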

Changes by area

| Area | Files | What |
|---|---|---|
| Model routing | model-router.ts, auto-model-selection.ts, complexity-classifier.ts | Capability profiles, scoring, task requirement vectors |
| Context masking | context-masker.ts, register-hooks.ts | Observation masker + tool truncation in before_provider_request |
| Phase anchors | phase-anchor.ts, auto-prompts.ts, auto/phases.ts, prompts/execute-task.md | Write anchors after phase completion, inject into prompt builders |
| Preferences | preferences-types.ts, preferences-validation.ts, docs/preferences-reference.md | ContextManagementConfig, capability_routing flag |
| ADR | ADR-004-capability-aware-model-routing.md | Status → Implemented (Phase 2) |
| Docs | pi-context-optimization-opportunities.md | Pi-layer research (not implemented, reference only) |

Closes #3171, #3406, #3452, #3433

Test plan

  • 24 model-router tests pass (including 8 new capability scoring tests)
  • 6 context-masker tests pass
  • 4 phase-anchor tests pass
  • 3 auto-model-selection tests pass (no regressions)
  • TypeScript type check: 0 new errors (3 pre-existing)
  • Manual: verify observation masking reduces token counts in verbose auto-mode session
  • Manual: verify phase anchors appear in execute-task prompts after plan-slice completes

🤖 Generated with Claude Code

trek-e and others added 5 commits April 3, 2026 15:45
…pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ent preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot (Contributor) commented Apr 3, 2026

🔴 PR Risk Report — CRITICAL

Files changed 22
Systems affected 4
Overall risk 🔴 CRITICAL

Affected Systems

| Risk | System |
|---|---|
| 🔴 critical | Auto Engine |
| 🟠 high | GSD Workflow |
| 🟡 medium | Model System |
| 🟢 low | Loader/Bootstrap |

File Breakdown

| Risk | File | Systems |
|---|---|---|
| 🔴 | src/resources/extensions/gsd/auto-model-selection.ts | Auto Engine, Model System |
| 🔴 | src/resources/extensions/gsd/auto/phases.ts | Auto Engine |
| 🟠 | src/resources/extensions/gsd/bootstrap/register-hooks.ts | GSD Workflow, Loader/Bootstrap |
| 🟠 | src/resources/extensions/gsd/captures.ts | GSD Workflow |
| 🟡 | src/resources/extensions/gsd/model-router.ts | Model System |
| 🟠 | src/resources/extensions/gsd/prompts/execute-task.md | GSD Workflow |
| 🟠 | src/resources/extensions/gsd/triage-ui.ts | GSD Workflow |
| | docs/ADR-004-capability-aware-model-routing.md | (unclassified) |
| | docs/configuration.md | (unclassified) |
| | docs/dynamic-model-routing.md | (unclassified) |
| | docs/pi-context-optimization-opportunities.md | (unclassified) |
| | docs/token-optimization.md | (unclassified) |
| | src/resources/extensions/gsd/auto-prompts.ts | (unclassified) |
| | src/resources/extensions/gsd/complexity-classifier.ts | (unclassified) |
| | src/resources/extensions/gsd/context-masker.ts | (unclassified) |
| | src/resources/extensions/gsd/docs/preferences-reference.md | (unclassified) |
| | src/resources/extensions/gsd/phase-anchor.ts | (unclassified) |
| | src/resources/extensions/gsd/preferences-types.ts | (unclassified) |
| | src/resources/extensions/gsd/preferences-validation.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/context-masker.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/model-router.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/phase-anchor.test.ts | (unclassified) |

⚠️ Critical risk — please verify: state persistence, auth token lifecycle, agent loop race conditions, RPC protocol compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot added the enhancement and performance labels Apr 3, 2026
…ment

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jeremymcs (Collaborator) commented Apr 3, 2026

Follow-up audit (validated against PR branch + current runtime contracts):

I reviewed the diff and validated each point against source and docs. I found 5 meaningful issues:

  1. [HIGH] Slice anchors collide across slices in the same milestone

    • Anchors are keyed by phase only (anchors/<phase>.json) and read by (milestoneId, phase) only.
    • research-slice / plan-slice anchors from one slice can overwrite another slice in the same milestone.
    • Affects handoff context injection in buildPlanSlicePrompt and buildExecuteTaskPrompt.
  2. [HIGH] Observation masking/tool truncation currently target the wrong payload shape

    • Hook mutates payload.messages and checks msg.type + string content.
    • Real provider payloads vary (openai-responses uses input, Anthropic uses block arrays in messages, etc.), so this path is effectively a no-op for many/most requests.
    • Associated tests mirror the same simplified shape, so they don’t catch this mismatch.
  3. [MEDIUM] context_management isn’t fully plumbed in preference merge/known-key paths

    • Added in type + validation, but missing from KNOWN_PREFERENCE_KEYS and mergePreferences().
    • This can cause unknown-preference-key warnings and drop the setting during layered preference resolution.
  4. [MEDIUM] compaction_threshold_percent appears as dead config

    • Documented and validated, but I couldn’t find runtime consumption in compaction settings.
    • User can set it, but compaction behavior won’t change.
  5. [MEDIUM] Phase anchors currently don’t carry the structured handoff data docs claim

    • Writer currently stores empty decisions, blockers, nextSteps arrays.
    • Docs say anchors transfer those fields to downstream phases, but current implementation doesn’t populate them.
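
To make issue 3 concrete, here is a minimal sketch of what full plumbing might look like. The preference names (`context_management`, `observation_masking`, `observation_mask_turns`, `tool_result_max_chars`, `capability_routing`) are taken from the PR description; the type shapes, the `unknownKeys` helper, and the field-by-field merge strategy are assumptions and may differ from the real `mergePreferences()`.

```typescript
// Hypothetical sketch; the real preferences types and merge logic may differ.
interface ContextManagementConfig {
  observation_masking?: boolean;
  observation_mask_turns?: number;
  tool_result_max_chars?: number;
}

interface Preferences {
  capability_routing?: boolean;
  context_management?: ContextManagementConfig;
}

// Every top-level key must be listed here, otherwise it is reported as unknown.
const KNOWN_PREFERENCE_KEYS = new Set<string>([
  "capability_routing",
  "context_management",
]);

// Returns the top-level keys that would trigger an unknown-key warning.
function unknownKeys(raw: Record<string, unknown>): string[] {
  return Object.keys(raw).filter((key) => !KNOWN_PREFERENCE_KEYS.has(key));
}

// Layered merge: values in `override` win, but fields of the nested block
// that only `base` sets are kept (assumed field-by-field merge).
function mergePreferences(base: Preferences, override: Preferences): Preferences {
  const hasBlock =
    base.context_management !== undefined ||
    override.context_management !== undefined;
  return {
    capability_routing: override.capability_routing ?? base.capability_routing,
    context_management: hasBlock
      ? { ...base.context_management, ...override.context_management }
      : undefined,
  };
}
```

A merge function that enumerates keys explicitly, as this one does, silently drops any block it does not mention, which matches the failure mode described above.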

trek-e pushed a commit that referenced this pull request Apr 3, 2026
- Fix slice anchor collisions: key anchors by (phase, sliceId) so
  research-slice/plan-slice anchors from different slices no longer
  overwrite each other within the same milestone.
- Fix payload shape mismatch: context-masker and tool result truncation
  now handle both internal message format (type field) and provider API
  formats (role=tool, content arrays with tool_result blocks).
- Plumb context_management into KNOWN_PREFERENCE_KEYS and
  mergePreferences() so the config is properly recognized and merged.
- Remove dead compaction_threshold_percent config that was validated
  and documented but never read at runtime.
- Populate structured handoff data in phase anchors by extracting
  decisions, blockers, and next steps from the artifact files
  produced by each phase.
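
The anchor-keying fix in the first bullet can be sketched roughly as follows. The anchor fields (intent, decisions, blockers, nextSteps) come from the PR description; the file-name scheme, the `anchorFileName` helper, and the in-memory store standing in for the file system are hypothetical and only illustrate why adding the slice ID to the key avoids collisions.

```typescript
// Hypothetical sketch; the real phase-anchor.ts writes JSON files and may name them differently.
interface PhaseAnchor {
  phase: string;
  sliceId?: string; // absent for milestone-level phases
  intent: string;
  decisions: string[];
  blockers: string[];
  nextSteps: string[];
}

// Assumed naming scheme: slice-scoped anchors include the slice ID in the file name.
function anchorFileName(phase: string, sliceId?: string): string {
  return sliceId ? `${phase}--${sliceId}.json` : `${phase}.json`;
}

// In-memory stand-in for the anchors directory, keyed by relative path.
class InMemoryAnchorStore {
  private files = new Map<string, string>();

  write(milestoneId: string, anchor: PhaseAnchor): void {
    const path = `${milestoneId}/anchors/${anchorFileName(anchor.phase, anchor.sliceId)}`;
    this.files.set(path, JSON.stringify(anchor));
  }

  read(milestoneId: string, phase: string, sliceId?: string): PhaseAnchor | undefined {
    const path = `${milestoneId}/anchors/${anchorFileName(phase, sliceId)}`;
    const raw = this.files.get(path);
    return raw === undefined ? undefined : (JSON.parse(raw) as PhaseAnchor);
  }
}
```

With the phase-only key, two plan-slice anchors in one milestone would map to the same path; with the composite key each slice reads back its own anchor.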

https://claude.ai/code/session_012ysgpj3kKCNcZdEL7W5eRe
trek-e and others added 5 commits April 4, 2026 00:23
… state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
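
For readers unfamiliar with the immutable-spread pattern mentioned in this commit, a minimal sketch follows. It assumes messages carry `role: "toolResult"` with an array of text blocks (the shape a later commit in this PR settles on), caps each block separately, and appends a `[truncated]` marker; the type names, the per-block cap, and the marker text are assumptions rather than the actual register-hooks.ts code.

```typescript
// Hypothetical sketch; shapes and the truncation marker are assumptions.
interface TextBlock {
  type: "text";
  text: string;
}

interface Message {
  role: "user" | "assistant" | "toolResult";
  content: TextBlock[];
}

// Returns a new array; messages and blocks that need no change are reused,
// while truncated ones are copied so the shared originals are never mutated.
function truncateToolResults(
  messages: readonly Message[],
  maxChars: number,
): Message[] {
  return messages.map((msg): Message => {
    if (msg.role !== "toolResult") return msg;
    return {
      ...msg,
      content: msg.content.map((block): TextBlock =>
        block.text.length > maxChars
          ? { ...block, text: block.text.slice(0, maxChars) + "\n[truncated]" }
          : block,
      ),
    };
  });
}
```

Because the conversation history is shared with other parts of the session, copying only the objects that change keeps the provider request small without altering what is stored.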
The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…age format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
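
A rough sketch of the masking logic this commit describes is shown below. It matches on `role`, handles array content, and treats user messages whose first text block starts with "Ran `" as bash results, as stated above. The definition of a turn (one assistant message), the placeholder text, and all type and function names are assumptions; the real context-masker.ts may count turns differently.

```typescript
// Hypothetical sketch; the turn definition and placeholder text are assumptions.
interface TextBlock {
  type: "text";
  text: string;
}

interface Message {
  role: "user" | "assistant" | "toolResult";
  content: TextBlock[];
}

const MASK_PLACEHOLDER = "[result masked to save context]";

// Tool results, plus bash results that arrive as user messages starting with "Ran `".
function isObservation(msg: Message): boolean {
  if (msg.role === "toolResult") return true;
  return msg.role === "user" && (msg.content[0]?.text.startsWith("Ran `") ?? false);
}

// Masks observations that precede the last `keepTurns` assistant messages.
// Returns new objects for masked messages and reuses the rest unchanged.
function maskOldObservations(
  messages: readonly Message[],
  keepTurns: number,
): Message[] {
  let cutoff = keepTurns <= 0 ? messages.length : 0; // indices below cutoff are "old"
  let assistantSeen = 0;
  for (let i = messages.length - 1; i >= 0 && keepTurns > 0; i--) {
    if (messages[i]?.role === "assistant") {
      assistantSeen++;
      if (assistantSeen === keepTurns) {
        cutoff = i;
        break;
      }
    }
  }
  return messages.map((msg, i): Message =>
    i < cutoff && isObservation(msg)
      ? { ...msg, content: [{ type: "text", text: MASK_PLACEHOLDER }] }
      : msg,
  );
}
```

Since the masking is a pure transformation of the outgoing payload, it needs no extra LLM call, which is consistent with the "zero LLM overhead" claim in the PR summary.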
@trek-e trek-e merged commit a7b574a into main Apr 4, 2026
9 checks passed

Labels

enhancement New feature or request performance Performance improvement

Development

Successfully merging this pull request may close these issues.

perf: token minimization across prompts, tools, agents, and skills

2 participants