Eval: Dead code detection with call graph vs grep baseline #85

## Hypothesis

An agent with access to call graphs and dependency graphs will find dead code more accurately (higher precision, comparable or better recall) and in fewer tool calls than an agent using grep/search alone.

## Rationale

Finding dead code requires answering: "Is this function/class reachable from any entry point?"

**Grep approach:**
- Search for function name as string
- Many spurious matches (comments, strings, similar names), which make dead functions look used
- Cannot tell a real call from a mention, and misses indirect calls (callbacks, dynamic dispatch)
- Can't trace through call chains
- Agent burns iterations on false leads
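A minimal sketch of the failure mode above, using a plain substring search as a stand-in for grep. The source snippet and the function names in it (`old_helper`, `old_helper_v2`, `parse_config`) are hypothetical, made up for illustration only.

```python
# Hypothetical source file; nothing here ever calls old_helper().
SOURCE = '''\
def parse_config(path):
    return open(path).read()

# TODO: call old_helper() again once the migration lands
def old_helper():
    return "unused"

def old_helper_v2():
    return "also unused"

print(parse_config.__name__)
'''


def grep_mentions(name, text):
    """Return (line_no, line) pairs that mention `name`, skipping its own def line."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(f"def {name}("):
            continue  # the definition itself is not a usage
        if name in line:
            hits.append((line_no, line.strip()))
    return hits


hits = grep_mentions("old_helper", SOURCE)
for line_no, line in hits:
    print(line_no, line)
```

Both hits are spurious: one is a comment and the other is a different function with a similar name. An agent relying on this search has to open each match to rule it out before it can conclude that `old_helper` is dead.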

**Graph approach:**
- Query call graph for incoming edges to function
- Zero false positives from comments/strings
- Can trace full call chain to entry points
- Dependency graph shows if module is imported anywhere
- Direct answer: "nothing calls this" vs "these 3 functions call this"
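The graph queries above can be sketched with a toy call graph built from Python's `ast` module. This is an illustration only: it is not the `get_call_graph` tool, whose output format is not specified here, and the helper names (`build_call_graph`, `callers_of`, `reachable_from`) and the sample source are assumptions. It only resolves direct calls to top-level functions by name.

```python
import ast

# Hypothetical module: main() is the only entry point.
SOURCE = '''
def main():
    used()

def used():
    pass

# unused_a() is mentioned here only in a comment
def unused_a():
    unused_b()

def unused_b():
    pass
'''


def build_call_graph(source):
    """Map each top-level function name to the set of names it calls directly."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            callees = graph.setdefault(node.name, set())
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees.add(sub.func.id)
    return graph


def callers_of(graph, name):
    """Incoming edges: which functions call `name`?"""
    return sorted(caller for caller, callees in graph.items() if name in callees)


def reachable_from(graph, entry_points):
    """Depth-first walk over call edges starting at the entry points."""
    seen, stack = set(), list(entry_points)
    while stack:
        fn = stack.pop()
        if fn in seen or fn not in graph:
            continue
        seen.add(fn)
        stack.extend(graph[fn])
    return seen


graph = build_call_graph(SOURCE)
dead = sorted(set(graph) - reachable_from(graph, ["main"]))
print(callers_of(graph, "unused_a"))  # no callers at all
print(callers_of(graph, "unused_b"))  # has a caller, but that caller is dead
print(dead)
```

The comment mentioning `unused_a()` never becomes an edge because comments are not part of the parsed tree. The example also shows why incoming edges alone are not enough: `unused_b` has a caller, so only reachability from the entry points marks it as dead.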

## Proposed Eval

### Setup
1. Select repos with known dead code (or inject dead functions into test repos)
2. Ground truth: list of functions that are actually unreachable

### Metrics
- **Precision**: What % of reported dead code is actually dead?
- **Recall**: What % of actual dead code was found?
- **Iterations**: How many tool calls does the agent need to reach a conclusion?
- **False positives**: Count (and list) of live functions incorrectly flagged as dead
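One way to score a run against the ground truth, as a sketch. The `score` helper and the function names in the example are hypothetical, and returning 0.0 for an empty report or empty ground truth is an assumed convention, not something this issue specifies.

```python
def score(reported, ground_truth):
    """Compare an agent's reported dead functions against the ground-truth set."""
    reported, ground_truth = set(reported), set(ground_truth)
    true_positives = reported & ground_truth
    false_positives = reported - ground_truth
    # Assumed convention: an empty denominator scores 0.0.
    precision = len(true_positives) / len(reported) if reported else 0.0
    recall = len(true_positives) / len(ground_truth) if ground_truth else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positives": sorted(false_positives),
    }


# Hypothetical run: the agent flags three functions, one of which is live.
result = score(
    reported={"unused_a", "unused_b", "handler"},
    ground_truth={"unused_a", "unused_b", "unused_c", "unused_d"},
)
print(result)
```

In this example two of three reported functions are truly dead (precision 2/3), two of four dead functions were found (recall 1/2), and `handler` is the single false positive. Iterations would be counted separately from the agent's tool-call log.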

### Test Cases
- Simple unused function (no callers)
- Function only called by other dead code (transitive dead)
- Function with name that appears in comments but no actual calls
- Function called via callback/dynamic dispatch (tricky for both approaches)
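A sketch of what a single fixture module covering these four cases might look like. Every name here is hypothetical, and `EXPECTED_DEAD` stands in for the ground-truth list from Setup.

```python
# Hypothetical fixture module for the eval; main() is the only entry point.

def simple_unused():   # case 1: no callers anywhere
    return 1


def dead_parent():     # case 2: never called...
    return dead_child()


def dead_child():      # ...so its only caller is itself dead (transitive dead)
    return 2


# Case 3: legacy_export() used to run nightly. The name appears in this
# comment but is never called.
def legacy_export():
    return 3


def on_event():        # case 4: live, but reached only through a registry
    return 4


HANDLERS = {"event": on_event}


def main(kind="event"):
    # Dynamic dispatch: there is no direct call expression naming on_event.
    return HANDLERS[kind]()


# Ground truth for this fixture. on_event is deliberately absent.
EXPECTED_DEAD = {"simple_unused", "dead_parent", "dead_child", "legacy_export"}
```

Case 4 is the one that can trip either agent: a call graph built only from direct call expressions has no edge into `on_event`, while grep finds the name in `HANDLERS` but cannot tell whether that entry is ever used.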

### Agents
1. **Baseline**: Standard tools (grep, read, glob)
2. **MCP**: Baseline + `get_call_graph`, `get_dependency_graph`

## Success Criteria

MCP agent should show:
- Higher precision (fewer false positives)
- Comparable or better recall
- Fewer iterations to reach conclusion

## Related

- #81 - Expose individual graph tools (enables this eval)
- `dead-code-hunter` GitHub Action uses this approach

## Notes

This eval tests a concrete use case where graphs should clearly outperform text search. Unlike SWE-bench (bug fixing), dead code detection fundamentally requires understanding call relationships.
