Autonomous research harness for Claude Code. Give an agent a problem, a file to edit, and a way to measure, then let it run.
Inspired by Karpathy's autoresearch. Same idea, generalized beyond ML training: the agent modifies code, runs an experiment, checks if the result improved, keeps or discards, and repeats. You come back to a log of experiments and (hopefully) a better solution.
- LRU cache optimization -- Agent started from a naive LRU cache and beat mnemonist (the performance-focused LRU library) by 26% in 25 experiments. Full results.
- Source map codec -- Agent optimized a VLQ source map encoder/decoder, then we combined its findings with techniques from @jridgewell/sourcemap-codec. Result: 12-23% faster than jridgewell on real Next.js/Babel/chart.js source maps. Full results.
- Glob matching -- Agent replaced regex-based matching with hand-rolled string operations. 3.7x faster than picomatch, feature-complete, faster on every real-world pattern tested.
- JSON serialization -- Agent optimized a schema-based serializer with codegen, escape lookup tables, and allocation-free regex scanning. 53% faster than native JSON.stringify, 8% faster than fast-json-stringify on real GitHub/JSONPlaceholder API data.
- Perf optimization with hardware counters -- Template for optimizing hot paths using perf stat.
Three things matter:
- program.md -- instructions for the agent. What to optimize, how to measure, what files to edit. Written by the human.
- The editable file(s) -- whatever the agent is iterating on. Could be anything: a hash table, a compiler pass, a prompt template, a config file.
- results.tsv -- log of every experiment. The agent appends to this after each run.
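As a sketch, a results.tsv log might use columns like these (all column names and values are illustrative, not a required format):

```tsv
exp	idea	metric_ms	delta	kept
1	baseline	152.0	-	yes
2	swap Map for plain object	148.3	-2.4%	yes
3	inline hot loop	151.1	+1.9%	no
```

One row per experiment, including discarded ones, so the log doubles as a record of what has already been tried.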
You point Claude at program.md and let it go:
```shell
claude "read program.md and start experimenting"
```
```
LOOP FOREVER:
  1. Read the current state
  2. Come up with an idea (one focused change)
  3. Edit the code
  4. git commit
  5. Run the experiment
  6. Record results in results.tsv
  7. If improved: keep the commit, advance
  8. If worse: git reset back
  9. Repeat
```
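The keep-or-discard step of that loop can be sketched in shell. This is a minimal illustration, not the harness itself; the metric values and "lower is better" assumption are hypothetical:

```shell
# Hypothetical decision step: compare the latest metric against the best
# so far (assumes lower is better, e.g. runtime in ms).
best=120.0   # best metric seen so far
new=113.5    # metric extracted from the latest experiment run

# awk handles the floating-point comparison; its exit status drives the choice
if awk -v a="$new" -v b="$best" 'BEGIN { exit !(a < b) }'; then
  decision="keep"     # in the real loop: keep the commit, advance the baseline
  best="$new"
else
  decision="discard"  # in the real loop: git reset --hard HEAD~1
fi
echo "$decision $best"
```

The commit happens before the run, so "discard" is just a reset; nothing from a failed experiment leaks into the next one.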
The agent runs until you stop it. If each experiment takes ~2 minutes, that's ~30/hour, ~240 overnight.
A program.md needs:
- What to optimize -- the target, the metric, what "better" means
- What to edit -- which files are fair game, which are frozen
- How to measure -- the exact command to run and how to extract the result
- Constraints -- what the agent cannot do (break the API, add dependencies, etc.)
See templates/program.md.template for the skeleton, or the examples above for concrete programs.
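As a rough sketch (not the actual template), a program.md for a hypothetical sort-benchmark experiment might look like this, with all file names and thresholds illustrative:

```markdown
# Goal
Minimize the median runtime of bench.js (reported in ms on its last line).
Lower is better. A change counts as an improvement only if it beats the
current best by more than measurement noise (~1%).

# Editable files
- src/sort.js        (fair game)
- bench.js, test.js  (frozen -- never edit the benchmark or the tests)

# How to measure
Run `node bench.js` three times and take the median of the reported numbers.

# Constraints
- All tests in test.js must pass before a result is recorded.
- No new dependencies.
- Append one row per experiment to results.tsv.
```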
```shell
curl -fsSL https://raw.githubusercontent.com/joshuaisaact/auto-claude/main/install.sh | bash
```

Or manually:
```shell
git clone https://github.com/joshuaisaact/auto-claude.git ~/.claude/skills/autoresearch
```

Then in any project, use /autoresearch to set up an experiment. Claude will walk you through picking a target, metric, and constraints, then start the loop.
Or use it without installing -- just copy templates/program.md.template into your project, fill it in, and run:

```shell
claude "read program.md and start experimenting"
```
- Single metric. The agent needs one number to optimize. If you have multiple metrics, define a composite or pick the most important one.
- Fixed budget per experiment. Each run should take roughly the same time regardless of what the agent changes. This makes results comparable.
- Commit everything. Including failures. The git log IS the experiment log.
- Keep or discard immediately. Don't carry regressions forward hoping the next change will fix them.
- Never stop. The agent runs autonomously until interrupted. If it runs out of ideas, it should think harder, not ask.
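When you do have multiple metrics, a composite can be as simple as a weighted sum that the agent minimizes. A minimal sketch, with hypothetical measurements and weights (these are problem-specific, not a recommendation):

```shell
# Hypothetical composite metric: fold two measurements into one number.
# The 0.01 weight trading memory against time is purely illustrative.
time_ms=42.0
mem_kb=1500

composite=$(awk -v t="$time_ms" -v m="$mem_kb" \
  'BEGIN { printf "%.1f", t + 0.01 * m }')
echo "$composite"
```

Whatever the formula, fix it in program.md before the loop starts; changing the objective mid-run makes earlier rows in results.tsv incomparable.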
MIT