[Feature] Trajectory-guided agent development with coding-agent-assisted refinement #10

@pskeshu

Description

Formalize the Trajectory-Guided Agent Development paradigm: a meta-development workflow in which execution traces of the copilot agent are fed to a coding agent (Claude Code), which diagnoses behavioral mismatches and proposes code fixes.

The Paradigm: Trajectory-Guided Agent Development

┌─────────────────────────────────────────────────────────────────┐
│                    DEVELOPMENT CYCLE                             │
│                                                                  │
│  1. CODE        2. RUN           3. OBSERVE       4. COMPARE     │
│  ───────────    ──────────       ────────────     ────────────   │
│  Modify agent   Execute with     Capture full     Expected vs    │
│  code/prompts   test query       action trace     actual behavior│
│       │              │                │                │         │
│       └──────────────┴────────────────┴────────────────┘         │
│                              │                                   │
│                              ▼                                   │
│  6. ITERATE     5. FIX            ◄── CODING AGENT               │
│  ───────────    ──────────            ────────────               │
│  Test fix,      Coding agent          Receives:                  │
│  repeat cycle   proposes changes      • Action trace             │
│                                       • Expected behavior        │
│                                       • Agent source code        │
│                                       Outputs:                   │
│                                       • Root cause analysis      │
│                                       • Proposed code fix        │
└─────────────────────────────────────────────────────────────────┘

Key Insight

The action trajectory (sequence of tool calls, inputs, outputs, state changes) serves as both:

  • Bug report: Shows what went wrong
  • Specification: Implicitly defines expected behavior when annotated with "should have done X"

A coding agent with access to both the trajectory AND the agent's source code can:

  • Reason about WHY the agent chose action A instead of expected action B
  • Trace the decision back to prompts, tool definitions, or logic
  • Propose targeted fixes
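
As a minimal sketch of the idea above, an annotated trajectory can be modeled as a list of tool-call steps where a reviewer attaches a "should have done X" note to the step that went wrong. The `Step` class, its field names, and `first_annotated` are hypothetical and only illustrate the shape of the data; they are not part of the existing codebase.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Step:
    """One tool call in a trajectory (field names are illustrative assumptions)."""
    tool: str
    inputs: dict[str, Any]
    output: Any = None
    annotation: Optional[str] = None  # reviewer's "should have done X" note


def first_annotated(trajectory: list[Step]) -> Optional[int]:
    """Return the index of the first annotated step, or None if no step is annotated."""
    for index, step in enumerate(trajectory):
        if step.annotation:
            return index
    return None


# The unannotated steps act as the bug report; the annotation turns it into a spec.
trajectory = [
    Step(
        tool="move_stage",
        inputs={"x": 1000, "y": 2000},
        annotation="should have called get_embryo_position(embryo_id=3) first",
    ),
]
```

The index returned by `first_annotated` gives the coding agent a starting point for tracing the decision back to prompts or tool definitions.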

Requirements

1. Detailed Transcript Logging

  • Full conversation history with timestamps
  • Tool calls with inputs/outputs
  • System prompts and context at each turn
  • Token usage and latency metrics
  • Error traces and exceptions
  • State snapshots (embryo state, experiment state)
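
One possible shape for the logger behind these requirements is an append-only writer that emits one JSON object per line. This is a sketch under assumptions: the record keys (`ts`, `turn`, `kind`) and the `log`/`next_turn` methods are invented for illustration and do not describe an existing `TranscriptLogger`.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any


class TranscriptLogger:
    """Append-only JSONL logger (a sketch, not the project's actual class)."""

    def __init__(self, session_dir: Path) -> None:
        session_dir.mkdir(parents=True, exist_ok=True)
        self.path = session_dir / "transcript.jsonl"
        self.turn = 0

    def log(self, kind: str, **payload: Any) -> dict:
        """Write one record with a timestamp and turn number, then return it."""
        record = {"ts": time.time(), "turn": self.turn, "kind": kind, **payload}
        with self.path.open("a", encoding="utf-8") as handle:
            # default=str keeps logging from failing on non-JSON values.
            handle.write(json.dumps(record, default=str) + "\n")
        return record

    def next_turn(self) -> None:
        """Advance the turn counter used to tag subsequent records."""
        self.turn += 1


# Usage: write two records to a throwaway session directory.
logger = TranscriptLogger(Path(tempfile.mkdtemp()) / "session_demo")
logger.log("tool_call", tool="move_stage", inputs={"x": 1000, "y": 2000}, latency_ms=12.5)
logger.next_turn()
logger.log("error", message="example failure")
```

Opening the file per record is slower than holding it open, but it keeps each line flushed to disk if the agent crashes mid-session.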

2. Transcript Format

D:\Gently\transcripts\
├── session_<id>\
│   ├── transcript.jsonl       # Line-delimited JSON for streaming
│   ├── context_snapshots\     # State at each turn
│   │   ├── turn_001.json
│   │   └── ...
│   ├── tool_calls\            # Detailed tool I/O
│   │   ├── call_001.json
│   │   └── ...
│   └── summary.json           # Session metadata
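
The layout above could be resolved in code roughly as follows. The helper names are hypothetical, and the three-digit zero padding is inferred from the `turn_001.json` and `call_001.json` examples in the tree.

```python
from pathlib import Path


def session_layout(root: Path, session_id: str) -> dict[str, Path]:
    """Map the logical parts of a session to paths under the transcripts root."""
    base = root / f"session_{session_id}"
    return {
        "transcript": base / "transcript.jsonl",
        "context_snapshots": base / "context_snapshots",
        "tool_calls": base / "tool_calls",
        "summary": base / "summary.json",
    }


def snapshot_path(layout: dict[str, Path], turn: int) -> Path:
    """Path of the state snapshot for a 1-based turn number."""
    return layout["context_snapshots"] / f"turn_{turn:03d}.json"


def tool_call_path(layout: dict[str, Path], call: int) -> Path:
    """Path of the detailed I/O record for a 1-based tool call number."""
    return layout["tool_calls"] / f"call_{call:03d}.json"


# Usage with a relative root; the issue's root is D:\Gently\transcripts\.
layout = session_layout(Path("transcripts"), "abc123")
```

Using `pathlib` keeps the same code working with the Windows-style root shown above and with POSIX paths in tests.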

3. Transcript Analysis Tools

  • CLI command to replay/inspect transcripts
  • Diff tool for comparing expected vs actual behavior
  • Filter by tool type, error, embryo, etc.
  • Export problematic sequences for debugging
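
The filter and diff tools could start from something as small as the sketch below. It assumes transcript records are dictionaries with optional `tool` and `kind` keys, which is an illustrative schema rather than a settled one, and it compares behavior only at the level of tool-call order using the standard library's `difflib`.

```python
import difflib
from typing import Optional


def filter_records(
    records: list[dict], tool: Optional[str] = None, errors_only: bool = False
) -> list[dict]:
    """Keep records matching the given tool name and/or the error kind."""
    kept = []
    for record in records:
        if tool is not None and record.get("tool") != tool:
            continue
        if errors_only and record.get("kind") != "error":
            continue
        kept.append(record)
    return kept


def diff_tool_sequence(expected: list[str], actual: list[str]) -> list[str]:
    """Unified diff of expected vs actual tool-call order (empty if identical)."""
    return list(
        difflib.unified_diff(
            expected, actual, fromfile="expected", tofile="actual", lineterm=""
        )
    )


# Usage: the example misbehavior from this issue, reduced to tool names.
diff = diff_tool_sequence(["get_embryo_position", "move_stage"], ["move_stage"])
```

A line starting with `-` in the result marks a call that was expected but never made.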

4. Coding Agent Integration

  • Prompt template for feeding transcript + codebase to coding agent
  • Script to prepare debugging context: python -m gently.debug --session <id>
  • Debug context includes the source files relevant to the tool calls in the transcript
  • Outputs structured bug report or proposed fix
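
Preparing the debugging context might look like the following sketch, which assembles the annotation, the action trace, and the selected source files into one prompt string. The template wording, the section headings, and `build_debug_prompt` are assumptions made for illustration; the real template would live in `debugging_prompt.md`.

```python
import json

# Hypothetical template; the three placeholders are filled by build_debug_prompt.
PROMPT_TEMPLATE = """\
You are debugging an agent. Diagnose the behavioral mismatch and propose a fix.

## Expected behavior
{annotation}

## Action trace
{trace}

## Agent source code
{sources}
"""


def build_debug_prompt(
    records: list[dict], annotation: str, sources: dict[str, str]
) -> str:
    """Render the trace as JSON lines and the sources as labeled sections."""
    trace = "\n".join(json.dumps(record, sort_keys=True) for record in records)
    source_text = "\n\n".join(
        f"### {path}\n{text}" for path, text in sorted(sources.items())
    )
    return PROMPT_TEMPLATE.format(
        annotation=annotation, trace=trace, sources=source_text
    )


# Usage with placeholder file contents.
prompt = build_debug_prompt(
    records=[{"tool": "move_stage", "inputs": {"x": 1000, "y": 2000}}],
    annotation="should query position first",
    sources={"gently/agent/copilot.py": "# (file contents go here)"},
)
```

Sorting the source paths keeps the prompt deterministic, so two runs over the same session produce identical input for the coding agent.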

5. Development Workflow (Trajectory-Guided)

Step 1: RUN
$ python -m gently.agent "calibrate embryo 3"

Step 2: OBSERVE (misbehavior)
Agent called move_stage(x=1000, y=2000) but should have called
get_embryo_position(embryo_id=3) first

Step 3: PREPARE DEBUG CONTEXT
$ python -m gently.debug --session abc123 --annotate "should query position first"

Step 4: INVOKE CODING AGENT
Feed to Claude Code:
- Annotated trajectory (what happened + what should have happened)
- Relevant source files (tool definitions, copilot logic)
- System prompts

Step 5: CODING AGENT REASONS
"The agent skipped position lookup because the tool schema doesn't
indicate that embryo_id maps to a position. The system prompt should
instruct the agent to always verify the current position before movement."

Step 6: APPLY FIX & VERIFY
- Apply proposed prompt/code change
- Replay same query in simulation mode
- Verify correct behavior
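
The verification in Step 6 can be expressed as an ordering check over the replayed trace. This is a sketch that assumes the trace has already been reduced to a list of tool names; `called_before` is a hypothetical helper, not an existing function in the project.

```python
def called_before(tools: list[str], first: str, then: str) -> bool:
    """Return True if `first` occurs before the first occurrence of `then`.

    If `then` never occurs, the constraint is vacuously satisfied.
    """
    if then not in tools:
        return True
    return first in tools[: tools.index(then)]


# The original misbehavior fails the check; the fixed replay should pass it.
before_fix = ["move_stage"]
after_fix = ["get_embryo_position", "move_stage"]
```

A check like `called_before(after_fix, "get_embryo_position", "move_stage")` can then be kept as a regression test for the same query.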

Technical Approach

  • Add TranscriptLogger class hooked into copilot
  • Log at multiple granularities (summary, detailed, debug)
  • Create gently.debug module for transcript analysis
  • Build prompt templates for coding agent debugging
  • Integration with simulation mode for replay testing
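
One way to hook logging into tool calls at the three granularities is a decorator, sketched below. What each level records (tool name only, plus inputs, plus outputs) is an assumption, as are the `Granularity` enum and the `logged_tool` decorator; a plain list stands in for the transcript sink.

```python
from enum import IntEnum
from functools import wraps
from typing import Any, Callable


class Granularity(IntEnum):
    SUMMARY = 0   # tool name only
    DETAILED = 1  # plus inputs
    DEBUG = 2     # plus outputs


def logged_tool(sink: list, level: Granularity) -> Callable:
    """Wrap a keyword-only tool function so each call appends a record to sink."""
    def decorate(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(**kwargs: Any) -> Any:
            record: dict[str, Any] = {"tool": fn.__name__}
            if level >= Granularity.DETAILED:
                record["inputs"] = kwargs
            try:
                result = fn(**kwargs)
            except Exception as exc:
                # Errors are recorded at every level, then re-raised unchanged.
                record["error"] = repr(exc)
                sink.append(record)
                raise
            if level >= Granularity.DEBUG:
                record["output"] = result
            sink.append(record)
            return result
        return wrapper
    return decorate


sink: list = []


@logged_tool(sink, Granularity.DETAILED)
def move_stage(x: int, y: int) -> dict:
    """Stand-in for a real tool; returns the requested position."""
    return {"x": x, "y": y}


move_stage(x=1000, y=2000)
```

Keeping the hook in a decorator means the copilot's tool functions stay unchanged apart from one added line each.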

Use Cases

  • Debug why agent chose wrong tool
  • Understand context window issues
  • Trace tool execution failures
  • Identify prompt engineering improvements
  • Regression testing after fixes

Key Files

  • New: gently/agent/transcript_logger.py
  • New: gently/debug/__init__.py
  • New: gently/debug/analyzer.py
  • New: gently/debug/prompts/debugging_prompt.md
  • Update: gently/agent/copilot.py (integrate transcript logging)

Metadata

Labels

  • copilot: AI copilot functionality
  • enhancement: New feature or request
  • priority-medium: Important but not blocking
  • testing: Testing and evaluation
