Eval: Dead code detection with call graph vs grep baseline #85

@jonathanpopham

Hypothesis

An agent using call graphs and dependency graphs will find dead code more accurately and efficiently than an agent using grep/search alone.

Rationale

Finding dead code requires answering: "Is this function/class reachable from any entry point?"

Grep approach:

  • Search for function name as string
  • High rate of false-positive matches (comments, strings, similar names), which can make dead code look referenced
  • Misses indirect calls (callbacks, dynamic dispatch)
  • Can't trace through call chains
  • Agent burns iterations on false leads
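
A minimal sketch of the text-search failure mode described above. The sample source and the `grep_references` helper are hypothetical and only stand in for what an agent does with grep: the dead function `old_helper` is never called, yet a comment mentioning it shows up as a hit.

```python
import re

# Hypothetical sample file: old_helper is dead, but a comment mentions it.
SOURCE = '''\
def used():
    return 1

# old_helper() was replaced by used()
def old_helper():
    return 2

print(used())
'''


def grep_references(name: str, source: str) -> list[str]:
    """Return lines mentioning `name`, excluding its own `def` line."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [
        line
        for line in source.splitlines()
        if pattern.search(line)
        and not line.lstrip().startswith(f"def {name}(")
    ]


hits = grep_references("old_helper", SOURCE)
print(hits)  # the only hit is the comment line, not a real call
```

An agent has to open and read each hit to decide whether it is a real call, which is where the extra iterations come from.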

Graph approach:

  • Query call graph for incoming edges to function
  • Zero false positives from comments/strings
  • Can trace full call chain to entry points
  • Dependency graph shows if module is imported anywhere
  • Direct answer: "nothing calls this" vs "these 3 functions call this"
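
The graph queries above can be sketched as a reachability check. This assumes the call graph is available as a plain caller-to-callees mapping; the function names and the graph contents are made up for illustration and are not the output format of any particular tool.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees.
CALL_GRAPH = {
    "main": ["parse_args", "run"],
    "run": ["load", "save"],
    "parse_args": [],
    "load": [],
    "save": [],
    "legacy_export": ["legacy_format"],
    "legacy_format": [],
}


def reachable_from(graph, entry_points):
    """Breadth-first walk from the entry points along call edges."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in graph.get(caller, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def dead_functions(graph, entry_points):
    """Functions in the graph that no entry point can reach."""
    return set(graph) - reachable_from(graph, entry_points)


def callers_of(graph, name):
    """Incoming edges: which functions call `name` directly."""
    return sorted(c for c, callees in graph.items() if name in callees)


print(sorted(dead_functions(CALL_GRAPH, ["main"])))
print(callers_of(CALL_GRAPH, "legacy_format"))
```

Note that `legacy_format` has a caller, so an incoming-edge query alone would call it live; only the walk from the entry points shows that its sole caller is itself dead.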

Proposed Eval

Setup

  1. Select repos with known dead code (or inject dead functions into test repos)
  2. Ground truth: a list of the functions that are actually unreachable from any entry point

Metrics

  • Precision: What % of reported dead code is actually dead?
  • Recall: What % of actual dead code was found?
  • Iterations: How many tool calls did the agent make before reaching a conclusion?
  • False positives: How many live functions were incorrectly flagged as dead?
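
One way the set-based metrics could be computed from an agent's report and the ground truth. The `Score` type and `score` function are hypothetical names, and returning 0.0 when a denominator is empty is an assumed convention, not something the proposal specifies. The iteration count would come from the agent's tool-call log and is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    precision: float
    recall: float
    false_positives: int


def score(reported: set[str], ground_truth: set[str]) -> Score:
    """Compare functions reported as dead against the known-dead set."""
    true_positives = reported & ground_truth
    false_positives = reported - ground_truth
    # Assumed convention: an empty denominator yields 0.0.
    precision = len(true_positives) / len(reported) if reported else 0.0
    recall = len(true_positives) / len(ground_truth) if ground_truth else 0.0
    return Score(precision, recall, len(false_positives))


# Illustrative inputs only, not measured results.
result = score(reported={"a", "b", "x"}, ground_truth={"a", "b", "c", "d"})
print(result)
```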

Test Cases

  • Simple unused function (no callers)
  • Function only called by other dead code (transitive dead)
  • Function with name that appears in comments but no actual calls
  • Function called via callback/dynamic dispatch (tricky for both approaches)
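
A possible fixture covering the four cases, with a small static call graph built using Python's `ast` module. The fixture contents, the function names, and the choice of `main` as the entry point are all assumptions for illustration. The graph only records direct calls by name, so it is deliberately naive about dynamic dispatch.

```python
import ast

# Hypothetical fixture; each dead function is annotated with its test case.
FIXTURE = '''
def main():
    live()
    dispatch("handler")

def live():
    pass

def unused():            # case 1: no callers
    pass

def dead_parent():       # never called from live code
    dead_child()

def dead_child():        # case 2: only called by dead code
    pass

def mentioned():         # case 3: name appears only in a comment
    pass

# TODO: call mentioned() once the feature ships

def handler():           # case 4: live, but only via dynamic dispatch
    pass

def dispatch(name):
    globals()[name]()
'''

GROUND_TRUTH = {"unused", "dead_parent", "dead_child", "mentioned"}


def direct_calls(source):
    """Map each top-level function to the names it calls directly."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name)
            }
    return graph


def reachable(graph, entry):
    """Depth-first walk over functions defined in the fixture."""
    seen, stack = set(), [entry]
    while stack:
        name = stack.pop()
        if name in seen or name not in graph:
            continue
        seen.add(name)
        stack.extend(graph[name])
    return seen


graph = direct_calls(FIXTURE)
static_dead = set(graph) - reachable(graph, "main")
print(sorted(static_dead - GROUND_TRUTH))  # live functions wrongly flagged
```

The comment in case 3 never enters the graph, while `handler` in case 4 is wrongly flagged because the call goes through `globals()`. That matches the note that dynamic dispatch is tricky for both approaches.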

Agents

  1. Baseline: Standard tools (grep, read, glob)
  2. MCP: Baseline + get_call_graph, get_dependency_graph

Success Criteria

MCP agent should show:

  • Higher precision (fewer false positives)
  • Comparable or better recall
  • Fewer iterations to reach conclusion
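
The criteria above could be checked mechanically once both agents have been scored. This is a sketch: `RunResult`, `meets_success_criteria`, and the 0.05 tolerance used to interpret "comparable" recall are assumptions, and the numbers in the usage lines are placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    precision: float
    recall: float
    iterations: int


def meets_success_criteria(
    baseline: RunResult,
    mcp: RunResult,
    recall_tolerance: float = 0.05,  # assumed meaning of "comparable"
) -> bool:
    """True if the MCP run satisfies all three criteria against baseline."""
    return (
        mcp.precision > baseline.precision
        and mcp.recall >= baseline.recall - recall_tolerance
        and mcp.iterations < baseline.iterations
    )


# Placeholder values for illustration only.
baseline = RunResult(precision=0.60, recall=0.70, iterations=40)
mcp = RunResult(precision=0.90, recall=0.68, iterations=12)
print(meets_success_criteria(baseline, mcp))
```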

Related

Notes

This eval tests a concrete use case where graphs should clearly outperform text search. Unlike SWE-bench (bug fixing), dead code detection fundamentally requires understanding call relationships.
