Eval: Code placement accuracy with domain graph vs baseline #86

@jonathanpopham


Hypothesis

An agent using domain classification will place new code in architecturally appropriate locations more often than an agent without domain context.

The Problem

When agents add new code (functions, classes, files), they often:

  • Put code in the wrong module/directory
  • Create new files when they should extend existing ones
  • Violate architectural boundaries (e.g., UI code in data layer)
  • Miss existing patterns (e.g., there's already a utils/ folder)

This happens because agents lack understanding of the codebase's architectural organization.

Rationale

Baseline approach:

  • Agent greps for similar code
  • Guesses location based on file names
  • May find one example but miss the pattern
  • No understanding of domain boundaries

Domain graph approach:

  • Agent sees architectural domains (e.g., "Authentication", "Billing", "API")
  • Knows which files/functions belong to each domain
  • Can ask: "Where does authentication code live?"
  • Understands existing organizational patterns
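The contrast above hinges on what the domain graph actually returns. A minimal sketch of an assumed response shape (the real `get_domain_graph` schema may differ; the `Domain` type and `homeOf` helper here are hypothetical):

```typescript
// Assumed (hypothetical) shape of a get_domain_graph response.
interface Domain {
  name: string;           // e.g. "Authentication"
  directories: string[];  // directories this domain owns
  files: string[];        // representative member files
}

type DomainGraph = Domain[];

const graph: DomainGraph = [
  { name: "Authentication", directories: ["src/auth/"], files: ["src/auth/session.ts"] },
  { name: "Validation", directories: ["src/validation/"], files: ["src/validation/StringValidator.ts"] },
];

// Answers the question "where does code for domain X live?"
function homeOf(graph: DomainGraph, domainName: string): string | undefined {
  return graph.find(d => d.name === domainName)?.directories[0];
}
```

With this shape, the agent can resolve a placement question in one lookup instead of grepping and guessing.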

Proposed Eval

Task Types

  1. "Add a new utility function for X" - Should go in existing utils, not new file
  2. "Add a new API endpoint for Y" - Should follow existing API patterns
  3. "Add validation for Z" - Should go in validation layer, not scattered
  4. "Add logging for W" - Should use existing logging patterns
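Each task type can be expressed as a record pairing a prompt with its ground-truth placement. A sketch of one possible task schema (field names and sample prompts are illustrative, not from the source):

```typescript
// One eval task plus its ground-truth placement label.
interface EvalTask {
  id: string;
  prompt: string;
  expectedDir: string;       // where the code should land
  shouldCreateFile: boolean; // false = extend an existing file instead
}

const tasks: EvalTask[] = [
  { id: "util-1", prompt: "Add a utility function to slugify titles", expectedDir: "src/utils/", shouldCreateFile: false },
  { id: "api-1",  prompt: "Add an endpoint to list invoices",          expectedDir: "src/api/",        shouldCreateFile: true },
  { id: "val-1",  prompt: "Add validation for postal codes",           expectedDir: "src/validation/", shouldCreateFile: true },
  { id: "log-1",  prompt: "Add logging to the checkout flow",          expectedDir: "src/checkout/",   shouldCreateFile: false },
];
```

The `shouldCreateFile` flag lets the eval score file proliferation directly, not just directory choice.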

Ground Truth

  • A human expert labels the "correct" location for each task
  • Or: use real PRs where reviewers asked for code to be moved

Metrics

  • Placement accuracy: Did code end up in the right module/directory?
  • Pattern adherence: Did it follow existing conventions?
  • Boundary violations: Did it cross architectural boundaries?
  • File proliferation: Did it create unnecessary new files?
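Two of these metrics can be computed mechanically from per-task results. A minimal scoring sketch (the `PlacementResult` shape and sample data are assumptions for illustration):

```typescript
// Hypothetical per-task result emitted by the eval harness.
interface PlacementResult {
  taskId: string;
  placedPath: string;       // where the agent put the code
  expectedDir: string;      // ground-truth directory
  crossedBoundary: boolean; // e.g. UI code written into the data layer
}

// Placement accuracy: fraction of tasks where the file landed in the expected directory.
function placementAccuracy(results: PlacementResult[]): number {
  const hits = results.filter(r => r.placedPath.startsWith(r.expectedDir)).length;
  return results.length ? hits / results.length : 0;
}

// Boundary violation rate: fraction of tasks that crossed an architectural boundary (lower is better).
function boundaryViolationRate(results: PlacementResult[]): number {
  const violations = results.filter(r => r.crossedBoundary).length;
  return results.length ? violations / results.length : 0;
}

const sampleResults: PlacementResult[] = [
  { taskId: "t1", placedPath: "src/validation/EmailValidator.ts", expectedDir: "src/validation/", crossedBoundary: false },
  { taskId: "t2", placedPath: "src/api/users.ts",                 expectedDir: "src/validation/", crossedBoundary: true },
];
```

Pattern adherence is harder to automate and likely needs a rubric or an LLM judge; the other metrics reduce to path checks like these.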

Agents

  1. Baseline: Standard tools (grep, read, glob)
  2. MCP: Baseline + get_domain_graph

Example Scenario

Task: "Add a function to validate email addresses"

Baseline agent might:

  • Create src/emailValidator.ts (new file, wrong location)
  • Or put it in src/api/users.ts (wrong layer)

Domain-aware agent should:

  • See from the domain graph that src/validation/ exists and holds the validators
  • Notice StringValidator and PhoneValidator are already there
  • Add EmailValidator to src/validation/, following the existing pattern
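The desired outcome for this scenario might look like the following (the shared `Validator` interface is an assumption about how the existing validators are structured):

```typescript
// Assumed shared interface that StringValidator and PhoneValidator implement.
interface Validator {
  validate(value: string): boolean;
}

// The correctly placed addition: would live at src/validation/EmailValidator.ts,
// next to the existing validators, rather than in a new top-level file.
class EmailValidator implements Validator {
  // Deliberately loose pattern: no whitespace, exactly one "@", a dot in the domain.
  validate(value: string): boolean {
    return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
  }
}
```

The eval scores the placement (path and conformance to the interface), not the validation logic itself.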

Success Criteria

MCP agent should show:

  • Higher placement accuracy (code in right module)
  • Fewer boundary violations
  • Better pattern adherence
  • Fewer unnecessary new files

Dataset Ideas

  1. Synthetic: Take well-organized repos, create tasks that have clear "right" answers
  2. Historical: Find PRs where code was moved during review (original = wrong, final = right)
  3. Expert-labeled: Have developers label correct locations for hypothetical additions
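For dataset idea 2, file moves can be mined from diff output: with git's rename detection enabled, a `rename from` / `rename to` pair marks code that was relocated, giving a "wrong" placement (old path) and a "right" one (new path). A sketch:

```typescript
// Extract (from, to) path pairs from unified diff output that includes
// git rename detection (e.g. `git diff -M`).
function extractMoves(diff: string): Array<{ from: string; to: string }> {
  const moves: Array<{ from: string; to: string }> = [];
  const lines = diff.split("\n");
  for (let i = 0; i < lines.length - 1; i++) {
    const from = lines[i].match(/^rename from (.+)$/);
    const to = lines[i + 1].match(/^rename to (.+)$/);
    if (from && to) moves.push({ from: from[1], to: to[1] });
  }
  return moves;
}

// Example diff fragment mirroring the scenario above.
const sampleDiff = [
  "diff --git a/src/emailValidator.ts b/src/validation/EmailValidator.ts",
  "similarity index 100%",
  "rename from src/emailValidator.ts",
  "rename to src/validation/EmailValidator.ts",
].join("\n");
```

Filtering these moves to PRs where the relocation happened between the first push and merge would isolate reviewer-driven corrections.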
