Autonomous research harness for Claude Code. Give an agent a problem, a file to edit, and a way to measure, then let it run.
Inspired by Karpathy's autoresearch. Same idea, generalized beyond ML training: the agent modifies code, runs an experiment, checks if the result improved, keeps or discards, and repeats. You come back to a log of experiments and (hopefully) a better solution.
- LRU cache optimization -- Agent started from a naive LRU cache and beat mnemonist (the performance-focused LRU library) by 26% in 25 experiments. Full results.
- Source map codec -- Agent optimized a VLQ source map encoder/decoder, then we combined its findings with techniques from @jridgewell/sourcemap-codec. Result: 12-23% faster than jridgewell on real Next.js/Babel/chart.js source maps. Full results.
- Glob matching -- Agent replaced regex-based matching with hand-rolled string operations. 3.7x faster than picomatch, feature-complete, faster on every real-world pattern tested.
- JSON serialization -- Agent optimized a schema-based serializer with codegen, escape lookup tables, and allocation-free regex scanning. 53% faster than native JSON.stringify, 8% faster than fast-json-stringify on real GitHub/JSONPlaceholder API data.
- Perf optimization with hardware counters -- Template for optimizing hot paths using perf stat.
Three things matter:
- program.md -- instructions for the agent. What to optimize, how to measure, what files to edit. Written by the human.
- The editable file(s) -- whatever the agent is iterating on. Could be anything: a hash table, a compiler pass, a prompt template, a config file.
- results.tsv -- log of every experiment. The agent appends to this after each run.
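As a sketch, a results.tsv log might use columns like these (all column names and values are illustrative, not a required format):

```tsv
exp	idea	metric_ms	delta	kept
1	baseline	152.0	-	yes
2	swap Map for plain object	148.3	-2.4%	yes
3	inline hot loop	151.1	+1.9%	no
```

One row per experiment, including discarded ones, so the log doubles as a record of what has already been tried.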
You point Claude at program.md and let it go:
```shell
claude "read program.md and start experimenting"
```
```
LOOP FOREVER:
  1. Read the current state
  2. Come up with an idea (one focused change)
  3. Edit the code
  4. git commit
  5. Run the experiment
  6. Record results in results.tsv
  7. If improved: keep the commit, advance
  8. If worse: git reset back
  9. Repeat
```
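The keep-or-discard step of that loop can be sketched in shell. This is a minimal illustration, not the harness itself; the metric values and "lower is better" assumption are hypothetical:

```shell
# Hypothetical decision step: compare the latest metric against the best
# so far (assumes lower is better, e.g. runtime in ms).
best=120.0   # best metric seen so far
new=113.5    # metric extracted from the latest experiment run

# awk handles the floating-point comparison; its exit status drives the choice
if awk -v a="$new" -v b="$best" 'BEGIN { exit !(a < b) }'; then
  decision="keep"     # in the real loop: keep the commit, advance the baseline
  best="$new"
else
  decision="discard"  # in the real loop: git reset --hard HEAD~1
fi
echo "$decision $best"
```

The commit happens before the run, so "discard" is just a reset; nothing from a failed experiment leaks into the next one.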
The agent runs until you stop it. If each experiment takes ~2 minutes, that's ~30/hour, ~240 overnight.
A program.md needs:
- What to optimize -- the target, the metric, what "better" means
- What to edit -- which files are fair game, which are frozen
- How to measure -- the exact command to run and how to extract the result
- Constraints -- what the agent cannot do (break the API, add dependencies, etc.)
See templates/program.md.template for the skeleton, or the examples above for concrete programs.
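As a rough sketch (not the actual template), a program.md for a hypothetical sort-benchmark experiment might look like this, with all file names and thresholds illustrative:

```markdown
# Goal
Minimize the median runtime of bench.js (reported in ms on its last line).
Lower is better. A change counts as an improvement only if it beats the
current best by more than measurement noise (~1%).

# Editable files
- src/sort.js        (fair game)
- bench.js, test.js  (frozen -- never edit the benchmark or the tests)

# How to measure
Run `node bench.js` three times and take the median of the reported numbers.

# Constraints
- All tests in test.js must pass before a result is recorded.
- No new dependencies.
- Append one row per experiment to results.tsv.
```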
```shell
curl -fsSL https://raw.githubusercontent.com/joshuaisaact/auto-claude/main/install.sh | bash
```

Or manually:
```shell
git clone https://github.com/joshuaisaact/auto-claude.git ~/.claude/skills/autoresearch
```

Then in any project, use /autoresearch to set up an experiment. Claude will walk you through picking a target, metric, and constraints, then start the loop.
Or use it without installing -- just copy templates/program.md.template into your project, fill it in, and run:

```shell
claude "read program.md and start experimenting"
```
- Single metric. The agent needs one number to optimize. If you have multiple metrics, define a composite or pick the most important one.
- Fixed budget per experiment. Each run should take roughly the same time regardless of what the agent changes. This makes results comparable.
- Commit everything. Including failures. The git log IS the experiment log.
- Keep or discard immediately. Don't carry regressions forward hoping the next change will fix them.
- Never stop. The agent runs autonomously until interrupted. If it runs out of ideas, it should think harder, not ask.
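When you do have multiple metrics, a composite can be as simple as a weighted sum that the agent minimizes. A minimal sketch, with hypothetical measurements and weights (these are problem-specific, not a recommendation):

```shell
# Hypothetical composite metric: fold two measurements into one number.
# The 0.01 weight trading memory against time is purely illustrative.
time_ms=42.0
mem_kb=1500

composite=$(awk -v t="$time_ms" -v m="$mem_kb" \
  'BEGIN { printf "%.1f", t + 0.01 * m }')
echo "$composite"
```

Whatever the formula, fix it in program.md before the loop starts; changing the objective mid-run makes earlier rows in results.tsv incomparable.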
MIT