Description
Formalize the Trajectory-Guided Agent Development paradigm: a meta-development workflow where execution traces of the copilot agent are fed to a coding agent (Claude Code) to diagnose behavioral mismatches and propose code fixes.
The Paradigm: Trajectory-Guided Agent Development
┌─────────────────────────────────────────────────────────────────┐
│                        DEVELOPMENT CYCLE                        │
│                                                                 │
│  1. CODE        2. RUN         3. OBSERVE       4. COMPARE      │
│  ───────────    ──────────     ────────────     ────────────    │
│  Modify agent   Execute with   Capture full     Expected vs     │
│  code/prompts   test query     action trace     actual behavior │
│       │              │                │                │        │
│       └──────────────┴────────────────┴────────────────┘        │
│                      │                                          │
│                      ▼                                          │
│  6. ITERATE     5. FIX         ◄── CODING AGENT                 │
│  ───────────    ──────────         ────────────                 │
│  Test fix,      Coding agent       Receives:                    │
│  repeat cycle   proposes changes   • Action trace               │
│                                    • Expected behavior          │
│                                    • Agent source code          │
│                                    Outputs:                     │
│                                    • Root cause analysis        │
│                                    • Proposed code fix          │
└─────────────────────────────────────────────────────────────────┘
Key Insight
The action trajectory (sequence of tool calls, inputs, outputs, state changes) serves as both:
- Bug report: Shows what went wrong
- Specification: Implicitly defines expected behavior when annotated with "should have done X"
A coding agent with access to both the trajectory AND the agent's source code can:
- Reason about WHY the agent chose action A instead of expected action B
- Trace the decision back to prompts, tool definitions, or logic
- Propose targeted fixes
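To make the "bug report plus specification" idea concrete, here is a minimal sketch of an annotated trajectory. The `Step` schema and the `annotated_steps` helper are assumptions made for illustration; the proposal does not define a record format at this level.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    """One tool call in a trajectory (hypothetical schema, not from the proposal)."""
    tool: str
    inputs: dict[str, Any] = field(default_factory=dict)
    output: Any = None
    # Reviewer note of the form "should have done X"; None means the step looked fine.
    annotation: Optional[str] = None


def annotated_steps(trajectory: list[Step]) -> list[tuple[int, Step]]:
    """Return (index, step) pairs that carry a reviewer annotation."""
    return [(i, step) for i, step in enumerate(trajectory) if step.annotation]


# The misbehavior used as the running example later in this issue.
trajectory = [
    Step(
        tool="move_stage",
        inputs={"x": 1000, "y": 2000},
        annotation="should have called get_embryo_position(embryo_id=3) first",
    ),
]
flagged = annotated_steps(trajectory)
```

The unannotated steps are the bug report; the annotation is the implicit specification the coding agent is asked to satisfy.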
Requirements
1. Detailed Transcript Logging
- Full conversation history with timestamps
- Tool calls with inputs/outputs
- System prompts and context at each turn
- Token usage and latency metrics
- Error traces and exceptions
- State snapshots (embryo state, experiment state)
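A minimal sketch of what such logging could look like, assuming one JSON object per event appended to `transcript.jsonl`. The method names and record fields (`ts`, `event`, `turn`, `latency_ms`) are illustrative assumptions, not a settled interface for `TranscriptLogger`.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Optional


class TranscriptLogger:
    """Illustrative logger: appends one JSON object per event to transcript.jsonl."""

    def __init__(self, session_dir: Path) -> None:
        self.path = Path(session_dir) / "transcript.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def log(self, event: str, **fields: Any) -> dict:
        record = {"ts": time.time(), "event": event, **fields}
        with self.path.open("a", encoding="utf-8") as fh:
            # default=str keeps logging from raising on values JSON cannot encode.
            fh.write(json.dumps(record, default=str) + "\n")
        return record

    def log_tool_call(self, turn: int, tool: str, inputs: dict,
                      output: Any = None, error: Optional[str] = None,
                      latency_ms: Optional[float] = None) -> dict:
        return self.log("tool_call", turn=turn, tool=tool, inputs=inputs,
                        output=output, error=error, latency_ms=latency_ms)


# Demo in a throwaway directory so nothing touches a real transcript folder.
with tempfile.TemporaryDirectory() as tmp:
    logger = TranscriptLogger(Path(tmp) / "session_abc123")
    logger.log_tool_call(turn=1, tool="move_stage",
                         inputs={"x": 1000, "y": 2000}, latency_ms=12.5)
    lines = logger.path.read_text(encoding="utf-8").splitlines()

first = json.loads(lines[0])
```

Appending and closing the file on every event costs some throughput, but a crash mid-session still leaves a readable transcript up to the last completed event.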
2. Transcript Format
D:\Gently\transcripts\
├── session_<id>\
│   ├── transcript.jsonl       # Line-delimited JSON for streaming
│   ├── context_snapshots\     # State at each turn
│   │   ├── turn_001.json
│   │   └── ...
│   ├── tool_calls\            # Detailed tool I/O
│   │   ├── call_001.json
│   │   └── ...
│   └── summary.json           # Session metadata
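The layout above can be derived from a root directory and a session id. The sketch below only computes paths and assumes three-digit, 1-based numbering as in `turn_001.json`; the helper names `session_paths` and `numbered_file` are hypothetical.

```python
from pathlib import Path


def session_paths(root: Path, session_id: str) -> dict[str, Path]:
    """Map each artifact in the proposed layout to its path under root."""
    session = Path(root) / f"session_{session_id}"
    return {
        "session": session,
        "transcript": session / "transcript.jsonl",
        "context_snapshots": session / "context_snapshots",
        "tool_calls": session / "tool_calls",
        "summary": session / "summary.json",
    }


def numbered_file(directory: Path, prefix: str, index: int) -> Path:
    """Build names such as turn_001.json or call_001.json (zero-padded to 3 digits)."""
    return Path(directory) / f"{prefix}_{index:03d}.json"


# A relative root is used here; the proposal's root is D:\Gently\transcripts\.
paths = session_paths(Path("transcripts"), "abc123")
first_turn = numbered_file(paths["context_snapshots"], "turn", 1)
first_call = numbered_file(paths["tool_calls"], "call", 1)
```

Keeping path construction in one place means the logger, the analyzer, and the replay tooling cannot drift apart on file names.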
3. Transcript Analysis Tools
- CLI command to replay/inspect transcripts
- Diff tool for comparing expected vs actual behavior
- Filter by tool type, error, embryo, etc.
- Export problematic sequences for debugging
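As a rough sketch of the filter and diff tools, the functions below read JSONL records, filter them by tool or error, and diff the expected tool sequence against the actual one with the standard library's `difflib`. The record fields `tool` and `error` are assumed, not specified by this issue.

```python
import difflib
import json
from typing import Iterable, Iterator, Optional


def load_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse line-delimited JSON, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def filter_records(records: Iterable[dict], tool: Optional[str] = None,
                   errors_only: bool = False) -> Iterator[dict]:
    """Keep records matching the tool name and, optionally, only failed calls."""
    for rec in records:
        if tool is not None and rec.get("tool") != tool:
            continue
        if errors_only and not rec.get("error"):
            continue
        yield rec


def diff_tool_sequence(expected: list[str], actual: list[str]) -> list[str]:
    """Unified diff of tool names: '-' lines are expected calls the agent skipped."""
    return list(difflib.unified_diff(expected, actual,
                                     fromfile="expected", tofile="actual",
                                     lineterm=""))


raw = ['{"tool": "move_stage", "inputs": {"x": 1000, "y": 2000}, "error": null}']
actual = [rec["tool"] for rec in load_records(raw)]
expected = ["get_embryo_position", "move_stage"]
delta = diff_tool_sequence(expected, actual)
failed = list(filter_records(load_records(raw), errors_only=True))
```

Diffing tool names alone ignores arguments; comparing inputs as well would need a canonical string form for each call.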
4. Coding Agent Integration
- Prompt template for feeding transcript + codebase to coding agent
- Script to prepare debugging context:
  python -m gently.debug --session <id>
- Includes relevant source files based on tool calls in transcript
- Outputs structured bug report or proposed fix
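One possible shape for the context-preparation step is sketched below. The template wording, the `tools/stage.py` path, and both helper names are placeholders invented for illustration; the actual `debugging_prompt.md` and the tool-to-file mapping are still to be designed.

```python
from string import Template

# Hypothetical template text; the real debugging_prompt.md may look different.
DEBUG_PROMPT = Template(
    "You are debugging an agent.\n\n"
    "## Trajectory\n$trajectory\n\n"
    "## Expected behavior\n$annotation\n\n"
    "## Source files\n$sources\n\n"
    "Explain the root cause, then propose a minimal fix.\n"
)


def select_sources(tools_used: list[str], tool_to_file: dict[str, str]) -> list[str]:
    """Pick source paths for tools seen in the transcript, in order, without repeats."""
    selected: list[str] = []
    for tool in tools_used:
        path = tool_to_file.get(tool)
        if path and path not in selected:
            selected.append(path)
    return selected


def build_debug_prompt(trajectory: str, annotation: str,
                       sources: dict[str, str]) -> str:
    """Render the template with the trajectory, the annotation, and file contents."""
    rendered = "\n\n".join(f"### {path}\n{text}" for path, text in sources.items())
    return DEBUG_PROMPT.substitute(trajectory=trajectory,
                                   annotation=annotation, sources=rendered)


# Placeholder mapping and file content, for illustration only.
tool_to_file = {"move_stage": "tools/stage.py"}
files = select_sources(["move_stage", "move_stage", "unknown_tool"], tool_to_file)
prompt = build_debug_prompt(
    trajectory="move_stage(x=1000, y=2000)",
    annotation="should query position first",
    sources={files[0]: "def move_stage(x, y): ..."},
)
```

Selecting files from the tools that actually appear in the transcript keeps the prompt small, which matters when the agent's full source would not fit in the coding agent's context.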
5. Development Workflow (Trajectory-Guided)
Step 1: RUN
$ python -m gently.agent "calibrate embryo 3"
Step 2: OBSERVE (misbehavior)
Agent called move_stage(x=1000, y=2000) but should have called
get_embryo_position(embryo_id=3) first
Step 3: PREPARE DEBUG CONTEXT
$ python -m gently.debug --session abc123 --annotate "should query position first"
Step 4: INVOKE CODING AGENT
Feed to Claude Code:
- Annotated trajectory (what happened + what should have happened)
- Relevant source files (tool definitions, copilot logic)
- System prompts
Step 5: CODING AGENT REASONS
"The agent skipped the position lookup because the tool schema doesn't
indicate that embryo_id maps to a position. The system prompt should
instruct the agent to always verify the current position before movement."
Step 6: APPLY FIX & VERIFY
- Apply proposed prompt/code change
- Replay same query in simulation mode
- Verify correct behavior
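The "verify correct behavior" step can be turned into a repeatable check on the replayed trajectory. This is a minimal sketch that assumes the replay yields an ordered list of tool names; `called_before` is a hypothetical helper, not part of any existing module.

```python
def called_before(tools: list[str], first: str, then: str) -> bool:
    """True if every call to `then` is preceded by at least one call to `first`."""
    seen_first = False
    for tool in tools:
        if tool == first:
            seen_first = True
        elif tool == then and not seen_first:
            return False
    return True


# Tool sequence before the fix (from Step 2) and the sequence expected after it.
before_fix = ["move_stage"]
after_fix = ["get_embryo_position", "move_stage"]

ok_before = called_before(before_fix, "get_embryo_position", "move_stage")
ok_after = called_before(after_fix, "get_embryo_position", "move_stage")
```

A check like this can be kept as a regression test so that the same ordering bug is caught if a later prompt change reintroduces it.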
Technical Approach
- Add TranscriptLogger class hooked into copilot
- Log at multiple granularities (summary, detailed, debug)
- Create gently.debug module for transcript analysis
- Build prompt templates for coding agent debugging
- Integration with simulation mode for replay testing
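The three granularities could be modeled as ordered levels, with each event type assigned the lowest level at which it is recorded. The event names and their assignments below are assumptions chosen for illustration.

```python
from enum import IntEnum


class Granularity(IntEnum):
    SUMMARY = 1
    DETAILED = 2
    DEBUG = 3


# Hypothetical assignment of event types to the lowest level that records them.
EVENT_LEVEL = {
    "session_summary": Granularity.SUMMARY,
    "tool_call": Granularity.DETAILED,
    "context_snapshot": Granularity.DEBUG,
}


def should_log(event: str, configured: Granularity) -> bool:
    """Record an event only if the configured level is at least the event's level."""
    # Unknown event types are treated as DEBUG-only so they never flood summaries.
    return EVENT_LEVEL.get(event, Granularity.DEBUG) <= configured


detailed_keeps_tool_calls = should_log("tool_call", Granularity.DETAILED)
detailed_keeps_snapshots = should_log("context_snapshot", Granularity.DETAILED)
```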
Use Cases
- Debug why agent chose wrong tool
- Understand context window issues
- Trace tool execution failures
- Identify prompt engineering improvements
- Regression testing after fixes
Key Files
- New: gently/agent/transcript_logger.py
- New: gently/debug/__init__.py
- New: gently/debug/analyzer.py
- New: gently/debug/prompts/debugging_prompt.md
- Update: gently/agent/copilot.py (integrate transcript logging)