Autonomous vision research based on Karpathy's original autoresearch repo for LLM pretraining, adapted here to CIFAR-100 image classification on a single NVIDIA GTX 1080 Ti.
The repo is intentionally tiny. An agent edits one file, trains for a fixed 5-minute budget, checks whether validation top-1 improved on CIFAR-100, and keeps or discards the experiment.
This snapshot is rendered from the local results.tsv, excludes the Axis-Orbit exploration runs, and labels only the runs that improved the best-so-far result.
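The "improved the best-so-far result" labeling can be sketched as a simple scan over results.tsv. This is an illustrative helper, not the repo's actual code; the column name `val_top1` is an assumption about the TSV schema.

```python
# Hypothetical sketch of labeling best-so-far runs from results.tsv.
# The metric column name ("val_top1") is an assumption, not the repo's schema.
import csv

def best_so_far_runs(path="results.tsv", metric="val_top1"):
    """Return only the rows whose validation top-1 beat every earlier run."""
    best = -1.0
    improved = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            acc = float(row[metric])
            if acc > best:
                best = acc
                improved.append(row)
    return improved
```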
Only four files matter for day-to-day autonomous runs:
- `prepare.py` - fixed data prep, cached tensors, dataloaders, and evaluation. Do not modify during autonomous runs.
- `train.py` - the single file the agent edits. Model architecture, optimizer, augmentations, and training loop all live here.
- `program.md` - the human-written instructions that define the autonomous research loop.
- `run.md` - the human-controlled stop flag. `True` allows the next run to start, `False` stops after the current run is logged.
By design, training runs for a fixed 300 second wall-clock budget, regardless of what the agent changes. The metric is validation top-1 accuracy on the full CIFAR-100 validation split, so higher is better.
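Conceptually, the fixed wall-clock budget looks like the loop below. This is a minimal sketch, not the code in train.py; the name `train_step` is a stand-in for whatever one optimization step looks like in a given experiment.

```python
# Minimal sketch of a fixed wall-clock training budget.
# TIME_BUDGET_S and train_step are illustrative names, not train.py's API.
import time

TIME_BUDGET_S = 300  # 5 minutes, regardless of what the agent changes

def train_with_budget(train_step, budget_s=TIME_BUDGET_S):
    """Run train_step() repeatedly until the wall-clock budget expires."""
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < budget_s:
        train_step()
        steps += 1
    return steps
```

Because the budget is wall-clock time rather than a step count, any change that slows down a single step (a bigger model, heavier augmentation) directly trades away optimization steps.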
Requirements:
- NVIDIA GPU with CUDA support
- Python 3.11+
uv
On this Windows machine the default Python is 3.8, so prefer explicit 3.11 commands:
```shell
# install uv if needed
py -3.11 -m pip install uv

# install dependencies with Python 3.11
uv sync --python 3.11

# download and cache CIFAR-100
uv run --python 3.11 prepare.py

# run one baseline experiment
uv run --python 3.11 train.py
```

If those commands work, the repo is ready for autonomous experimentation.
- `prepare.py` - fixed data prep, loaders, evaluation
- `train.py` - editable model and training loop
- `program.md` - autonomous research instructions
- `run.md` - stop-after-current-run control file
- `render_results_graph.py` - snapshot chart helper
- `pyproject.toml` - dependencies
- `assets/` - checked-in README chart
- Single editable file. The agent only touches `train.py`.
- Fixed time budget. Runs optimize for what can be learned in 5 minutes on this exact GPU.
- Fixed evaluation harness. `prepare.py` owns the validation metric and cached dataset.
- Minimal dependencies. Only PyTorch and torchvision are required.
This fork keeps the core autoresearch structure from the LLM version: one mutable training file, a fixed preparation and evaluation harness, and a fixed time budget. The main swap is the task itself, from language-model pretraining to CIFAR-100 image classification.
run.md must contain exactly one control value on its first non-empty line:
- `True` means the agent may start another run.
- `False` means the agent should finish the current run, log it, and then stop before starting the next one.
If run.md is missing or contains anything else, the agent should fail closed and stop instead of guessing.
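The fail-closed rule above can be sketched as follows. This is an illustrative reading of run.md, not necessarily the agent's exact implementation.

```python
# Fail-closed reading of run.md: only an exact "True" on the first
# non-empty line permits another run. A sketch, not the repo's exact code.
from pathlib import Path

def may_start_next_run(path="run.md"):
    """Return True only if the first non-empty line is exactly 'True'.

    A missing file, an empty file, or any other content stops the loop.
    """
    try:
        text = Path(path).read_text()
    except OSError:
        return False
    for line in text.splitlines():
        line = line.strip()
        if line:
            return line == "True"
    return False
```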
To regenerate the chart from the current local results.tsv:
```shell
.venv\Scripts\python.exe render_results_graph.py
```

This fork is scoped to a single CUDA GPU and tuned for the local GTX 1080 Ti. If you move to a different GPU, the first knobs to revisit in `train.py` are:
- `DEVICE_BATCH_SIZE` and `TOTAL_BATCH_SIZE`
- `EMBED_DIM`, `DEPTH`, and `NUM_HEADS`
- learning rate and weight decay
The first version is ViT-first. `train.py` includes a `MODEL_FAMILY` switch so CNN baselines can be added later without changing the repo layout.
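One way such a switch can be structured is a small registry of model builders; the sketch below uses placeholder values and hypothetical names, since the actual constructors live in train.py.

```python
# Illustrative MODEL_FAMILY switch; the builder contents are placeholders,
# not the actual ViT constructor from train.py.
MODEL_FAMILY = "vit"  # "cnn" baselines can be registered later

def build_model(family=MODEL_FAMILY):
    """Look up and invoke the builder for the requested model family."""
    builders = {
        "vit": lambda: "ViT model",  # placeholder for the real ViT constructor
        # "cnn": lambda: ...,        # future CNN baseline slots in here
    }
    if family not in builders:
        raise ValueError(f"unknown MODEL_FAMILY: {family!r}")
    return builders[family]()
```

Keeping the families behind one lookup means adding a CNN baseline is a one-line registration rather than a restructuring of the training loop.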
