feat: context optimization pipeline with capability routing by trek-e · Pull Request #3466 · gsd-build/gsd-2

trek-e · 2026-04-03T20:00:56Z

Summary

Implements a comprehensive context optimization pipeline for GSD auto-mode, targeting both token cost reduction and context drift prevention:

- ADR-004 Phase 2, capability-aware model routing: 7-dimension model profiles (coding, debugging, research, reasoning, speed, longContext, instruction) with weighted scoring for task-appropriate model selection. 9 models profiled, 11 unit types mapped to requirement vectors.
- Observation masking: replaces tool result content older than N turns with placeholders before sending to the LLM. Zero LLM overhead, configurable via context_management.observation_masking and observation_mask_turns preferences.
- Tool result truncation: caps individual tool result content at a configurable character limit (context_management.tool_result_max_chars, default 800) during auto-mode sessions.
- Phase handoff anchors: structured JSON summaries (intent, decisions, blockers, nextSteps) written between auto-mode phases. Prompt builders inject these so downstream agents inherit prior phase context without re-inference.
- ContextManagementConfig preferences: new preference block with validation for all context management knobs.
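As a rough illustration of the weighted scoring in the first item, the sketch below treats a unit type's requirement vector as weights over a model's 7-dimension profile. The type names, function names, model ids, and every number are assumptions made for this example; none of them are taken from `model-router.ts`.

```typescript
// Hypothetical sketch of weighted capability scoring. All names and numbers
// are illustrative assumptions, not the contents of model-router.ts.
type Dimension =
  | "coding" | "debugging" | "research" | "reasoning"
  | "speed" | "longContext" | "instruction";

type CapabilityVector = Record<Dimension, number>;

const DIMENSIONS: Dimension[] = [
  "coding", "debugging", "research", "reasoning",
  "speed", "longContext", "instruction",
];

// Weighted average of a model's profile, using the task's requirement
// vector as the weights. Returns 0 when the task has no requirements.
function scoreModel(profile: CapabilityVector, requirements: CapabilityVector): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const dim of DIMENSIONS) {
    weighted += profile[dim] * requirements[dim];
    totalWeight += requirements[dim];
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

// Pick the highest-scoring model id; ties go to the first candidate.
function pickModel(
  profiles: Record<string, CapabilityVector>,
  requirements: CapabilityVector,
): string | undefined {
  let best: string | undefined;
  let bestScore = -Infinity;
  for (const [id, profile] of Object.entries(profiles)) {
    const score = scoreModel(profile, requirements);
    if (score > bestScore) {
      best = id;
      bestScore = score;
    }
  }
  return best;
}

// Made-up profiles on a 0..1 scale.
const profiles: Record<string, CapabilityVector> = {
  "fast-model": { coding: 0.5, debugging: 0.4, research: 0.4, reasoning: 0.4, speed: 0.9, longContext: 0.3, instruction: 0.6 },
  "deep-model": { coding: 0.9, debugging: 0.9, research: 0.8, reasoning: 0.9, speed: 0.3, longContext: 0.8, instruction: 0.8 },
};

// A made-up requirement vector for a coding-heavy unit type.
const executeTask: CapabilityVector = {
  coding: 1, debugging: 0.5, research: 0, reasoning: 0.5, speed: 0.2, longContext: 0, instruction: 0.5,
};

console.log(pickModel(profiles, executeTask)); // deep-model
```

Normalizing by the total weight keeps scores comparable across unit types whose requirement vectors have different magnitudes; whether the PR normalizes this way is not stated.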

Changes by area

| Area | Files | What |
| --- | --- | --- |
| Model routing | `model-router.ts`, `auto-model-selection.ts`, `complexity-classifier.ts` | Capability profiles, scoring, task requirement vectors |
| Context masking | `context-masker.ts`, `register-hooks.ts` | Observation masker + tool truncation in `before_provider_request` |
| Phase anchors | `phase-anchor.ts`, `auto-prompts.ts`, `auto/phases.ts`, `prompts/execute-task.md` | Write anchors after phase completion, inject into prompt builders |
| Preferences | `preferences-types.ts`, `preferences-validation.ts`, `docs/preferences-reference.md` | ContextManagementConfig, capability_routing flag |
| ADR | `ADR-004-capability-aware-model-routing.md` | Status → Implemented (Phase 2) |
| Docs | `pi-context-optimization-opportunities.md` | Pi-layer research (not implemented, reference only) |

Closes #3171, #3406, #3452, #3433

Test plan

- 24 model-router tests pass (including 8 new capability scoring tests)
- 6 context-masker tests pass
- 4 phase-anchor tests pass
- 3 auto-model-selection tests pass (no regressions)
- TypeScript type check: 0 new errors (3 pre-existing)
- Manual: verify observation masking reduces token counts in verbose auto-mode session
- Manual: verify phase anchors appear in execute-task prompts after plan-slice completes

🤖 Generated with Claude Code

…pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Introduces PhaseAnchor read/write utilities so downstream agents can inherit decisions, blockers, and intent written at phase boundaries without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
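A minimal sketch of what reading an anchor and injecting it into a prompt could involve. Only the field names intent, decisions, blockers, and nextSteps come from the PR description; the `phase` field, the function names, and the prompt layout are assumptions for illustration, and file I/O is left out so the example stays self-contained.

```typescript
// Hypothetical shape: only intent, decisions, blockers, and nextSteps are
// named in the PR; `phase` and every function name below are assumptions.
interface PhaseAnchor {
  phase: string;
  intent: string;
  decisions: string[];
  blockers: string[];
  nextSteps: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Parse anchor JSON defensively; returns null rather than throwing so a
// corrupt or missing anchor never blocks the downstream phase.
function parseAnchor(json: string): PhaseAnchor | null {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  if (typeof r.phase !== "string" || typeof r.intent !== "string") return null;
  if (!isStringArray(r.decisions) || !isStringArray(r.blockers) || !isStringArray(r.nextSteps)) {
    return null;
  }
  return {
    phase: r.phase,
    intent: r.intent,
    decisions: r.decisions,
    blockers: r.blockers,
    nextSteps: r.nextSteps,
  };
}

// Render an anchor as a prompt section, skipping empty lists.
function formatAnchorForPrompt(anchor: PhaseAnchor): string {
  const lines = [`Handoff from ${anchor.phase}`, `Intent: ${anchor.intent}`];
  const sections: Array<[string, string[]]> = [
    ["Decisions", anchor.decisions],
    ["Blockers", anchor.blockers],
    ["Next steps", anchor.nextSteps],
  ];
  for (const [title, items] of sections) {
    if (items.length === 0) continue;
    lines.push(`${title}:`, ...items.map((item) => `- ${item}`));
  }
  return lines.join("\n");
}
```

Returning `null` on malformed input is a design choice of this sketch: a prompt builder can then fall back to building the prompt without a handoff section.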

…ent preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model profiles, task requirement vectors, and weighted scoring. Add ContextManagementConfig preferences for observation masking thresholds. Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…cation

Register observation masker in before_provider_request hook to replace old tool results with placeholders during auto-mode. Add tool result truncation (configurable via context_management.tool_result_max_chars). Inject phase handoff anchors into prompt builders so downstream phases inherit decisions from research/planning. Write anchors after successful phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-03T20:01:15Z

🔴 PR Risk Report — CRITICAL


Files changed	22
Systems affected	4
Overall risk	🔴 CRITICAL

Affected Systems

| Risk | System |
| --- | --- |
| 🔴 critical | Auto Engine |
| 🟠 high | GSD Workflow |
| 🟡 medium | Model System |
| 🟢 low | Loader/Bootstrap |

File Breakdown

| Risk | File | Systems |
| --- | --- | --- |
| 🔴 | `src/resources/extensions/gsd/auto-model-selection.ts` | Auto Engine, Model System |
| 🔴 | `src/resources/extensions/gsd/auto/phases.ts` | Auto Engine |
| 🟠 | `src/resources/extensions/gsd/bootstrap/register-hooks.ts` | GSD Workflow, Loader/Bootstrap |
| 🟠 | `src/resources/extensions/gsd/captures.ts` | GSD Workflow |
| 🟡 | `src/resources/extensions/gsd/model-router.ts` | Model System |
| 🟠 | `src/resources/extensions/gsd/prompts/execute-task.md` | GSD Workflow |
| 🟠 | `src/resources/extensions/gsd/triage-ui.ts` | GSD Workflow |
| ⚪ | `docs/ADR-004-capability-aware-model-routing.md` | (unclassified) |
| ⚪ | `docs/configuration.md` | (unclassified) |
| ⚪ | `docs/dynamic-model-routing.md` | (unclassified) |
| ⚪ | `docs/pi-context-optimization-opportunities.md` | (unclassified) |
| ⚪ | `docs/token-optimization.md` | (unclassified) |
| ⚪ | `src/resources/extensions/gsd/auto-prompts.ts` | (unclassified) |
| ⚪ | `src/resources/extensions/gsd/complexity-classifier.ts` | (unclassified) |
| ⚪ | `src/resources/extensions/gsd/context-masker.ts` | (unclassified) |
| ⚪ | `src/resources/extensions/gsd/docs/preferences-reference.md` | (unclassified) |
| ⚪ | `src/resources/extensions/gsd/phase-anchor.ts` | (unclassified) |
| ⚪ | `src/resources/extensions/gsd/preferences-types.ts` | (unclassified) |
| ⚪ | `src/resources/extensions/gsd/preferences-validation.ts` | (unclassified) |
| ⚪ | `src/resources/extensions/gsd/tests/context-masker.test.ts` | (unclassified) |
| ⚪ | `src/resources/extensions/gsd/tests/model-router.test.ts` | (unclassified) |
| ⚪ | `src/resources/extensions/gsd/tests/phase-anchor.test.ts` | (unclassified) |

⚠️ Critical risk — please verify: state persistence, auth token lifecycle, agent loop race conditions, RPC protocol compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ment

Update dynamic-model-routing.md with capability-aware scoring section. Update token-optimization.md with observation masking, tool truncation, and phase handoff anchor documentation. Update configuration.md with context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jeremymcs · 2026-04-03T20:34:31Z

Follow-up audit (validated against PR branch + current runtime contracts):

I reviewed the diff and validated each point against source and docs. I found 5 meaningful issues:

[HIGH] Slice anchors collide across slices in the same milestone
- Anchors are keyed by phase only (anchors/<phase>.json) and read by (milestoneId, phase) only.
- research-slice / plan-slice anchors from one slice can overwrite another slice in the same milestone.
- Affects handoff context injection in buildPlanSlicePrompt and buildExecuteTaskPrompt.
[HIGH] Observation masking/tool truncation currently target the wrong payload shape
- Hook mutates payload.messages and checks msg.type + string content.
- Real provider payloads vary (openai-responses uses input, Anthropic uses block arrays in messages, etc.), so this path is effectively a no-op for many/most requests.
- Associated tests mirror the same simplified shape, so they don’t catch this mismatch.
[MEDIUM] context_management isn’t fully plumbed in preference merge/known-key paths
- Added in type + validation, but missing from KNOWN_PREFERENCE_KEYS and mergePreferences().
- This can cause warnings (unknown preference key) and dropping the setting in layered preference resolution.
[MEDIUM] compaction_threshold_percent appears as dead config
- Documented and validated, but I couldn’t find runtime consumption in compaction settings.
- User can set it, but compaction behavior won’t change.
[MEDIUM] Phase anchors currently don’t carry the structured handoff data docs claim
- Writer currently stores empty decisions, blockers, nextSteps arrays.
- Docs say anchors transfer those fields to downstream phases, but current implementation doesn’t populate them.
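To make the third finding concrete, here is a simplified sketch of why a nested block such as context_management has to appear in both the known-key check and the layered merge. The preference key names come from the PR; `KNOWN_KEYS`, `unknownKeys`, and `mergePrefs` are hypothetical stand-ins, and the real `KNOWN_PREFERENCE_KEYS` and `mergePreferences()` will differ.

```typescript
// Hypothetical, simplified preference plumbing. The real
// KNOWN_PREFERENCE_KEYS and mergePreferences() in the repository will differ.
interface ContextManagement {
  observation_masking?: boolean;
  observation_mask_turns?: number;
  tool_result_max_chars?: number;
}

interface Preferences {
  capability_routing?: boolean;
  context_management?: ContextManagement;
}

// A key missing from this set would be reported as unknown.
const KNOWN_KEYS = new Set<string>(["capability_routing", "context_management"]);

function unknownKeys(raw: Record<string, unknown>): string[] {
  return Object.keys(raw).filter((key) => !KNOWN_KEYS.has(key));
}

// Later layers win per field. The nested block is merged field by field so
// an override that sets one knob does not discard the others.
function mergePrefs(base: Preferences, override: Preferences): Preferences {
  const merged: Preferences = { ...base, ...override };
  if (base.context_management || override.context_management) {
    merged.context_management = {
      ...base.context_management,
      ...override.context_management,
    };
  }
  return merged;
}

const globalPrefs: Preferences = { context_management: { observation_mask_turns: 8 } };
const projectPrefs: Preferences = { context_management: { tool_result_max_chars: 400 } };
console.log(mergePrefs(globalPrefs, projectPrefs));
```

Without the explicit nested merge, the plain spread would let the project layer replace the whole context_management block, silently dropping `observation_mask_turns` from the global layer.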

- Fix slice anchor collisions: key anchors by (phase, sliceId) so research-slice/plan-slice anchors from different slices no longer overwrite each other within the same milestone.
- Fix payload shape mismatch: context-masker and tool result truncation now handle both internal message format (type field) and provider API formats (role=tool, content arrays with tool_result blocks).
- Plumb context_management into KNOWN_PREFERENCE_KEYS and mergePreferences() so the config is properly recognized and merged.
- Remove dead compaction_threshold_percent config that was validated and documented but never read at runtime.
- Populate structured handoff data in phase anchors by extracting decisions, blockers, and next steps from the artifact files produced by each phase.

https://claude.ai/code/session_012ysgpj3kKCNcZdEL7W5eRe
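The first fix, keying anchors by (phase, sliceId), might be sketched as below. The `anchors/` directory matches the path quoted in the audit; the function name, the `--` separator, the sanitizing rule, and the slice ids used in the usage lines are assumptions for illustration.

```typescript
// Hypothetical helper: the function name, separator, and sanitizing rule
// are assumptions; only the anchors/<phase>.json layout is from the audit.
function anchorRelativePath(phase: string, sliceId?: string): string {
  // Replace anything that is not safe in a file name.
  const safe = (part: string): string => part.replace(/[^A-Za-z0-9._-]/g, "_");
  // Slice-scoped phases include the slice id so two slices in the same
  // milestone never share a file; milestone-level phases keep the old name.
  const fileName = sliceId
    ? `${safe(phase)}--${safe(sliceId)}.json`
    : `${safe(phase)}.json`;
  return `anchors/${fileName}`;
}

console.log(anchorRelativePath("plan-slice", "S01")); // anchors/plan-slice--S01.json
console.log(anchorRelativePath("plan-slice", "S02")); // anchors/plan-slice--S02.json
console.log(anchorRelativePath("plan-slice"));        // anchors/plan-slice.json
```

Readers and writers must call the same helper with the same arguments, otherwise a prompt builder would look up a path the writer never produced.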

… state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
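The second change, truncating with an immutable spread instead of mutating in place, could look roughly like this. The message shape (role `toolResult` with an array of text blocks) is assumed from the later payload-format commit, the 800-character default is from the PR summary, and the type names, function name, and `[truncated]` marker are hypothetical.

```typescript
// Assumed payload shape; type and function names are hypothetical.
interface ToolText { type: "text"; text: string }
interface PayloadMessage { role: string; content: ToolText[] }

// Return a new message list in which oversized tool result blocks are cut
// to `maxChars`. Input messages and blocks are never modified, so objects
// shared with the stored conversation stay intact.
function truncateToolResults(messages: PayloadMessage[], maxChars = 800): PayloadMessage[] {
  return messages.map((msg) => {
    if (msg.role !== "toolResult") return msg;
    const needsCut = msg.content.some((block) => block.text.length > maxChars);
    if (!needsCut) return msg; // reuse the original object when nothing changes
    return {
      ...msg,
      content: msg.content.map((block) =>
        block.text.length > maxChars
          ? { ...block, text: block.text.slice(0, maxChars) + "\n[truncated]" }
          : block,
      ),
    };
  });
}
```

Unchanged messages are returned by reference, which keeps the copy cheap: only messages that actually exceed the limit are cloned.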

The Classification type gained stop and backtrack variants from main but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…age format

The observation masker and tool result truncation in before_provider_request were checking m.type === "toolResult" but the actual pi-ai payload uses m.role === "toolResult" with content as TextContent[] arrays (not strings). bashExecution messages are converted to {role:"user"} by convertToLlm before the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
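A sketch of role-based masking along the lines this commit describes: match on `role`, accept array content, and recognize converted bash results by their "Ran `" prefix. The type names, the placeholder text, and the choice to count assistant messages as turns are assumptions of this example, not details confirmed by the PR.

```typescript
// Assumed message shape, loosely based on the commit description.
interface TextBlock { type: "text"; text: string }
interface Message { role: string; content: string | TextBlock[] }

// Hypothetical placeholder text.
const PLACEHOLDER = "[result masked: older than the observation window]";

function firstText(content: string | TextBlock[]): string {
  if (typeof content === "string") return content;
  return content.length > 0 ? content[0].text : "";
}

// A message is an observation if it is a tool result, or a user message
// produced from a bash execution (recognized by its "Ran `" prefix).
function isObservation(msg: Message): boolean {
  if (msg.role === "toolResult") return true;
  return msg.role === "user" && firstText(msg.content).startsWith("Ran `");
}

// Mask observations that are followed by at least `keepTurns` assistant
// messages. Counting assistant messages as turns is an assumption here.
function maskOldObservations(messages: Message[], keepTurns: number): Message[] {
  let turnsSeen = 0;
  const result: Message[] = new Array(messages.length);
  // Walk backwards so `turnsSeen` counts the turns after each message.
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (isObservation(msg) && turnsSeen >= keepTurns) {
      const masked: string | TextBlock[] =
        typeof msg.content === "string"
          ? PLACEHOLDER
          : [{ type: "text", text: PLACEHOLDER }];
      result[i] = { ...msg, content: masked }; // copy, never mutate the input
    } else {
      result[i] = msg;
    }
    if (msg.role === "assistant") turnsSeen++;
  }
  return result;
}
```

Because the masker only rewrites the outgoing copy, the full tool output remains in the stored conversation and can still be shown to the user or re-read later.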

trek-e and others added 5 commits April 3, 2026 15:45

feat(context): add observation masking for auto-mode sessions

1272438

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: remove internal planning artifacts from PR

6ea9776

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot added the enhancement (New feature or request) and performance (Performance improvement) labels Apr 3, 2026

trek-e and others added 5 commits April 4, 2026 00:23

Merge branch 'main' into feat/gsd-context-optimization

b08756f

resolve merge conflicts with main

020a016

fix: add stop and backtrack to triage-ui classification labels

17eac18


trek-e merged commit a7b574a into main Apr 4, 2026
9 checks passed

trek-e mentioned this pull request Apr 4, 2026

revert: undo premature squash merge of #3466 (#3489)

Closed

2 tasks

trek-e merged 12 commits into main from feat/gsd-context-optimization

2 participants
