Problem
We don't know if the current graph format is optimal for agent consumption. The raw API response might not be the best way to present structural information to an LLM.
Before changing the API, we need to prove which data format actually helps agents most.
Approach
One API call → Multiple transformations → Cached files → Comparative evals
┌─────────────┐
│ API Call │ (one time, expensive)
└──────┬──────┘
│
▼
┌─────────────┐
│ Raw Graph │ (cache as file)
└──────┬──────┘
│
├──► Format A (e.g., nested JSON)
├──► Format B (e.g., flat adjacency list)
├──► Format C (e.g., natural language summary)
├──► Format D (e.g., markdown with code blocks)
└──► Format E (e.g., compressed/filtered view)
│
▼
┌─────────────────────────────────────┐
│ Eval: Baseline vs A vs B vs C ... │
└─────────────────────────────────────┘
Hypotheses to Test
Format Variations
- Raw JSON (current) - Full graph structure as-is
- Adjacency list - Simple caller → [callees] format (see the sketch after this list)
- Natural language - "Function X calls Y, Z. Function Y is called by X, W."
- Hierarchical markdown - Organized by domain/file with indentation
- Filtered/minimal - Only nodes relevant to the query, no noise
- Compressed summary - Stats + top-level structure, expandable on demand
- Code-centric - File paths and line numbers prominent, relationships secondary
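To make the first two of these concrete, here is a minimal sketch of what such transforms could look like. The IR shape below is a stripped-down stand-in for SupermodelIR invented for illustration; the real type surely carries more fields (files, line numbers, node kinds).

// Assumed minimal stand-in for the real SupermodelIR; node ids double as
// function names to keep the sketch short.
interface GraphEdge { from: string; to: string; }
interface SupermodelIR { nodes: string[]; edges: GraphEdge[]; }

// Adjacency list: one "caller → [callees]" line per node.
function toAdjacencyList(g: SupermodelIR): string {
  return g.nodes
    .map((n) => {
      const callees = g.edges.filter((e) => e.from === n).map((e) => e.to);
      return `${n} → [${callees.join(', ')}]`;
    })
    .join('\n');
}

// Natural language: "Function X calls Y, Z. Function X is called by W."
function toNaturalLanguage(g: SupermodelIR): string {
  return g.nodes
    .map((n) => {
      const callees = g.edges.filter((e) => e.from === n).map((e) => e.to);
      const callers = g.edges.filter((e) => e.to === n).map((e) => e.from);
      const sentences: string[] = [];
      if (callees.length) sentences.push(`Function ${n} calls ${callees.join(', ')}.`);
      if (callers.length) sentences.push(`Function ${n} is called by ${callers.join(', ')}.`);
      return sentences.join(' ');
    })
    .filter((line) => line.length > 0)
    .join('\n');
}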
Presentation Variations
- Injected in system prompt - Graph as persistent context
- Returned from tool call - Agent explicitly requests it
- Hybrid - Summary in system prompt, details via tool
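A rough sketch of the hybrid option, reusing the SupermodelIR stand-in above; the prompt wording and the get_graph_detail tool name are invented for illustration, not an existing API:

// Hybrid presentation: a short summary rides in the system prompt, and the
// agent pulls full detail for a specific symbol through a tool call.
function buildSystemPrompt(graphSummary: string): string {
  return [
    'You are a coding agent working in this repository.',
    'Repository call-graph summary:',
    graphSummary,
    'Call the get_graph_detail tool when you need full call information for a symbol.',
  ].join('\n\n');
}

// Handler the agent runtime would invoke when get_graph_detail is called.
function getGraphDetail(g: SupermodelIR, symbol: string): string {
  const callees = g.edges.filter((e) => e.from === symbol).map((e) => e.to);
  const callers = g.edges.filter((e) => e.to === symbol).map((e) => e.from);
  return [
    symbol,
    `  calls: ${callees.join(', ') || '(none)'}`,
    `  called by: ${callers.join(', ') || '(none)'}`,
  ].join('\n');
}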
Implementation
Step 1: Agent Format Converter
Build a transformer that takes the raw graph and outputs multiple formats:
interface GraphFormatter {
  name: string;
  transform(raw: SupermodelIR): string;
}

const formatters: GraphFormatter[] = [
  { name: 'raw_json', transform: (g) => JSON.stringify(g) },
  { name: 'adjacency', transform: (g) => toAdjacencyList(g) },
  { name: 'natural_language', transform: (g) => toNaturalLanguage(g) },
  { name: 'markdown', transform: (g) => toMarkdown(g) },
  // ...
];

Step 2: Cache All Formats
For each benchmark repo, generate and cache all formats:
cache/
  django/
    raw.json
    adjacency.txt
    natural_language.txt
    markdown.md
    ...
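One possible shape for this caching step, assuming a Node script that reuses the formatters array from Step 1; fetchGraph is a placeholder for the single expensive API call:

import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

// File extensions per formatter, mirroring the tree above.
const ext: Record<string, string> = {
  raw_json: 'json',
  adjacency: 'txt',
  natural_language: 'txt',
  markdown: 'md',
};

async function cacheAllFormats(
  repo: string,
  fetchGraph: (repo: string) => Promise<SupermodelIR>,
): Promise<void> {
  const raw = await fetchGraph(repo);          // one API call per repo
  const dir = join('cache', repo);
  mkdirSync(dir, { recursive: true });
  for (const f of formatters) {
    const base = f.name === 'raw_json' ? 'raw' : f.name;
    writeFileSync(join(dir, `${base}.${ext[f.name] ?? 'txt'}`), f.transform(raw));
  }
}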
Step 3: Run Comparative Evals
- Baseline: No graph
- Treatment A: raw.json injected
- Treatment B: adjacency.txt injected
- Treatment C: natural_language.txt injected
- ...
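A sketch of how treatments could be loaded straight from the cache directory (file names follow the layout above; the eval harness itself is out of scope here):

import { readFileSync } from 'node:fs';

interface Treatment {
  name: string;
  context: string | null;   // injected into the prompt, or null for baseline
}

function loadTreatments(repoDir: string): Treatment[] {
  const read = (file: string) => readFileSync(`${repoDir}/${file}`, 'utf8');
  return [
    { name: 'baseline', context: null },
    { name: 'raw_json', context: read('raw.json') },
    { name: 'adjacency', context: read('adjacency.txt') },
    { name: 'natural_language', context: read('natural_language.txt') },
    // ...
  ];
}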
Metrics
- Task completion rate
- Iterations to solution
- Accuracy of code placement
- Token efficiency (some formats are smaller)
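These could be recorded per run with something like the following record; the field names are assumptions, not an existing schema:

// One row per (repo, task, treatment) run; aggregate across tasks to compare formats.
interface EvalResult {
  repo: string;
  task: string;
  treatment: string;         // 'baseline', 'raw_json', 'adjacency', ...
  completed: boolean;        // task completion
  iterations: number;        // iterations to solution
  placementCorrect: boolean; // accuracy of code placement
  contextTokens: number;     // token cost of the injected format
}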
Why This Matters
We might have the right data but the wrong presentation.
Possible findings:
- "Natural language summaries outperform raw JSON by 40%"
- "Adjacency lists are 10x smaller and perform the same"
- "Markdown format helps agents reason about structure"
This tells us what to build, not just whether graphs help.
Success Criteria
Identify at least one format that shows statistically significant improvement over:
- Baseline (no graph)
- Raw JSON (current format)
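The issue doesn't fix a statistical test; as one option, a two-proportion z-test on completion rates against each comparison point might look like this:

// Two-proportion z-test on completion rates (treatment vs. baseline or raw JSON).
// |z| > 1.96 corresponds roughly to p < 0.05, two-sided.
function completionRateZ(
  treatmentWins: number, treatmentRuns: number,
  comparisonWins: number, comparisonRuns: number,
): number {
  const p1 = treatmentWins / treatmentRuns;
  const p2 = comparisonWins / comparisonRuns;
  const pooled = (treatmentWins + comparisonWins) / (treatmentRuns + comparisonRuns);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / treatmentRuns + 1 / comparisonRuns));
  return (p1 - p2) / se;
}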
Related
- #82 - Support local graph.json file for instant context injection (caching mechanism)
- #83 - Consider focusing on call graph + classification as primary tools