Eval: Code placement accuracy with domain graph vs baseline #86

@jonathanpopham


Hypothesis

An agent using domain classification will place new code in architecturally appropriate locations more often than an agent without domain context.

The Problem

When agents add new code (functions, classes, files), they often:

  • Put code in the wrong module/directory
  • Create new files when they should extend existing ones
  • Violate architectural boundaries (e.g., UI code in data layer)
  • Miss existing patterns (e.g., there's already a utils/ folder)

This happens because agents lack understanding of the codebase's architectural organization.

Rationale

Baseline approach:

  • Agent greps for similar code
  • Guesses location based on file names
  • May find one example but miss the pattern
  • No understanding of domain boundaries

Domain graph approach:

  • Agent sees architectural domains (e.g., "Authentication", "Billing", "API")
  • Knows which files/functions belong to each domain
  • Can ask: "Where does authentication code live?"
  • Understands existing organizational patterns
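The contrast above hinges on what the domain graph actually returns. A minimal sketch of an assumed response shape (the real `get_domain_graph` schema may differ; the `Domain` type and `homeOf` helper here are hypothetical):

```typescript
// Assumed (hypothetical) shape of a get_domain_graph response.
interface Domain {
  name: string;           // e.g. "Authentication"
  directories: string[];  // directories this domain owns
  files: string[];        // representative member files
}

type DomainGraph = Domain[];

const graph: DomainGraph = [
  { name: "Authentication", directories: ["src/auth/"], files: ["src/auth/session.ts"] },
  { name: "Validation", directories: ["src/validation/"], files: ["src/validation/StringValidator.ts"] },
];

// Answers the question "where does code for domain X live?"
function homeOf(graph: DomainGraph, domainName: string): string | undefined {
  return graph.find(d => d.name === domainName)?.directories[0];
}
```

With this shape, the agent can resolve a placement question in one lookup instead of grepping and guessing.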

Proposed Eval

Task Types

  1. "Add a new utility function for X" - Should go in existing utils, not new file
  2. "Add a new API endpoint for Y" - Should follow existing API patterns
  3. "Add validation for Z" - Should go in validation layer, not scattered
  4. "Add logging for W" - Should use existing logging patterns
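Each task type can be expressed as a record pairing a prompt with its ground-truth placement. A sketch of one possible task schema (field names and sample prompts are illustrative, not from the source):

```typescript
// One eval task plus its ground-truth placement label.
interface EvalTask {
  id: string;
  prompt: string;
  expectedDir: string;       // where the code should land
  shouldCreateFile: boolean; // false = extend an existing file instead
}

const tasks: EvalTask[] = [
  { id: "util-1", prompt: "Add a utility function to slugify titles", expectedDir: "src/utils/", shouldCreateFile: false },
  { id: "api-1",  prompt: "Add an endpoint to list invoices",          expectedDir: "src/api/",        shouldCreateFile: true },
  { id: "val-1",  prompt: "Add validation for postal codes",           expectedDir: "src/validation/", shouldCreateFile: true },
  { id: "log-1",  prompt: "Add logging to the checkout flow",          expectedDir: "src/checkout/",   shouldCreateFile: false },
];
```

The `shouldCreateFile` flag lets the eval score file proliferation directly, not just directory choice.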

Ground Truth

  • A human expert labels the "correct" location for each task
  • Or: use real PRs where reviewers asked for code to be moved

Metrics

  • Placement accuracy: Did code end up in the right module/directory?
  • Pattern adherence: Did it follow existing conventions?
  • Boundary violations: Did it cross architectural boundaries?
  • File proliferation: Did it create unnecessary new files?
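Two of these metrics can be computed mechanically from per-task results. A minimal scoring sketch (the `PlacementResult` shape and sample data are assumptions for illustration):

```typescript
// Hypothetical per-task result emitted by the eval harness.
interface PlacementResult {
  taskId: string;
  placedPath: string;       // where the agent put the code
  expectedDir: string;      // ground-truth directory
  crossedBoundary: boolean; // e.g. UI code written into the data layer
}

// Placement accuracy: fraction of tasks where the file landed in the expected directory.
function placementAccuracy(results: PlacementResult[]): number {
  const hits = results.filter(r => r.placedPath.startsWith(r.expectedDir)).length;
  return results.length ? hits / results.length : 0;
}

// Boundary violation rate: fraction of tasks that crossed an architectural boundary (lower is better).
function boundaryViolationRate(results: PlacementResult[]): number {
  const violations = results.filter(r => r.crossedBoundary).length;
  return results.length ? violations / results.length : 0;
}

const sampleResults: PlacementResult[] = [
  { taskId: "t1", placedPath: "src/validation/EmailValidator.ts", expectedDir: "src/validation/", crossedBoundary: false },
  { taskId: "t2", placedPath: "src/api/users.ts",                 expectedDir: "src/validation/", crossedBoundary: true },
];
```

Pattern adherence is harder to automate and likely needs a rubric or an LLM judge; the other metrics reduce to path checks like these.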

Agents

  1. Baseline: Standard tools (grep, read, glob)
  2. MCP: Baseline + get_domain_graph

Example Scenario

Task: "Add a function to validate email addresses"

Baseline agent might:

  • Create src/emailValidator.ts (new file, wrong location)
  • Or put it in src/api/users.ts (wrong layer)

Domain-aware agent should:

  • See from the domain graph that src/validation/ exists and holds the validators
  • Notice StringValidator and PhoneValidator are already there
  • Add EmailValidator to src/validation/, following the existing pattern
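The desired outcome for this scenario might look like the following (the shared `Validator` interface is an assumption about how the existing validators are structured):

```typescript
// Assumed shared interface that StringValidator and PhoneValidator implement.
interface Validator {
  validate(value: string): boolean;
}

// The correctly placed addition: would live at src/validation/EmailValidator.ts,
// next to the existing validators, rather than in a new top-level file.
class EmailValidator implements Validator {
  // Deliberately loose pattern: no whitespace, exactly one "@", a dot in the domain.
  validate(value: string): boolean {
    return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
  }
}
```

The eval scores the placement (path and conformance to the interface), not the validation logic itself.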

Success Criteria

MCP agent should show:

  • Higher placement accuracy (code in right module)
  • Fewer boundary violations
  • Better pattern adherence
  • Fewer unnecessary new files

Dataset Ideas

  1. Synthetic: Take well-organized repos, create tasks that have clear "right" answers
  2. Historical: Find PRs where code was moved during review (original = wrong, final = right)
  3. Expert-labeled: Have developers label correct locations for hypothetical additions
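For dataset idea 2, file moves can be mined from diff output: with git's rename detection enabled, a `rename from` / `rename to` pair marks code that was relocated, giving a "wrong" placement (old path) and a "right" one (new path). A sketch:

```typescript
// Extract (from, to) path pairs from unified diff output that includes
// git rename detection (e.g. `git diff -M`).
function extractMoves(diff: string): Array<{ from: string; to: string }> {
  const moves: Array<{ from: string; to: string }> = [];
  const lines = diff.split("\n");
  for (let i = 0; i < lines.length - 1; i++) {
    const from = lines[i].match(/^rename from (.+)$/);
    const to = lines[i + 1].match(/^rename to (.+)$/);
    if (from && to) moves.push({ from: from[1], to: to[1] });
  }
  return moves;
}

// Example diff fragment mirroring the scenario above.
const sampleDiff = [
  "diff --git a/src/emailValidator.ts b/src/validation/EmailValidator.ts",
  "similarity index 100%",
  "rename from src/emailValidator.ts",
  "rename to src/validation/EmailValidator.ts",
].join("\n");
```

Filtering these moves to PRs where the relocation happened between the first push and merge would isolate reviewer-driven corrections.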
