autoresearch-vit

Autonomous vision research based on Karpathy's original autoresearch repo for LLM pretraining, adapted here to CIFAR-100 image classification on a single NVIDIA GTX 1080 Ti.

The repo is intentionally tiny. An agent edits one file, trains for a fixed 5-minute budget, checks whether validation top-1 improved on CIFAR-100, and keeps or discards the experiment.
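The keep-or-discard loop above can be sketched as follows. This is an illustrative sketch only: autonomous_loop, run_experiment, evaluate_top1, and may_continue are hypothetical stand-ins, not functions that exist in this repo.

```python
def autonomous_loop(run_experiment, evaluate_top1, may_continue):
    """Run experiments until told to stop, keeping only improvements."""
    best = 0.0
    while may_continue():                    # honours the run.md stop flag
        snapshot = run_experiment()          # agent edits train.py, trains 5 min
        acc = evaluate_top1(snapshot)        # validation top-1 on CIFAR-100
        if acc > best:
            best = acc                       # improvement: keep the edit
        else:
            snapshot.revert()                # regression: discard the edit
    return best
```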

[Experiment progress chart]

This snapshot is rendered from the local results.tsv, excludes the Axis-Orbit exploration runs, and labels only the runs that improved the best-so-far result.

How it works

Only four files matter for day-to-day autonomous runs:

  • prepare.py - fixed data prep, cached tensors, dataloaders, and evaluation. Do not modify during autonomous runs.
  • train.py - the single file the agent edits. Model architecture, optimizer, augmentations, and training loop all live here.
  • program.md - the human-written instructions that define the autonomous research loop.
  • run.md - the human-controlled stop flag. True allows the next run to start, False stops after the current run is logged.

By design, training runs for a fixed 300-second wall-clock budget, regardless of what the agent changes. The metric is validation top-1 accuracy on the full CIFAR-100 validation split, so higher is better.
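A minimal sketch of the fixed wall-clock budget, assuming the training loop checks the clock every step; step_fn is a hypothetical stand-in for one optimizer step, and the real loop lives in train.py:

```python
import time

TIME_BUDGET_S = 300  # fixed 5-minute wall-clock budget, same for every run

def train_for_budget(step_fn, loader, budget_s=TIME_BUDGET_S):
    """Loop over the data (repeating epochs) until the budget expires."""
    start = time.monotonic()
    steps = 0
    while True:
        for batch in loader:
            if time.monotonic() - start >= budget_s:
                return steps                  # stop mid-epoch when time is up
            step_fn(batch)
            steps += 1
```

Using a monotonic clock keeps the budget immune to system-time changes during a run.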

Quick start

Requirements:

  • NVIDIA GPU with CUDA support
  • Python 3.11+
  • uv

On this Windows machine the default python is 3.8, so prefer explicit 3.11 commands:

# install uv if needed
py -3.11 -m pip install uv

# install dependencies with Python 3.11
uv sync --python 3.11

# download and cache CIFAR-100
uv run --python 3.11 prepare.py

# run one baseline experiment
uv run --python 3.11 train.py

If those commands work, the repo is ready for autonomous experimentation.

Project structure

prepare.py      fixed data prep, loaders, evaluation
train.py        editable model and training loop
program.md      autonomous research instructions
run.md          stop-after-current-run control file
render_results_graph.py  snapshot chart helper
pyproject.toml  dependencies
assets/         checked-in README chart

Design choices

  • Single editable file. The agent only touches train.py.
  • Fixed time budget. Runs optimize for what can be learned in 5 minutes on this exact GPU.
  • Fixed evaluation harness. prepare.py owns the validation metric and cached dataset.
  • Minimal dependencies. Only PyTorch and torchvision are required.

This fork keeps the core autoresearch structure from the LLM version: one mutable training file, a fixed preparation and evaluation harness, and a fixed time budget. The main swap is the task itself, from language-model pretraining to CIFAR-100 image classification.

Stopping autonomous runs

run.md must contain exactly one control value on its first non-empty line:

  • True means the agent may start another run.
  • False means the agent should finish the current run, log it, and then stop before starting the next one.

If run.md is missing or contains anything else, the agent should fail closed and stop instead of guessing.
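The fail-closed rule can be sketched like this; may_start_next_run is an illustrative helper, not part of the repo:

```python
from pathlib import Path

def may_start_next_run(path="run.md"):
    """Fail-closed reading of the stop flag: only an exact 'True' on the
    first non-empty line permits another run; anything else stops."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except OSError:
        return False                      # missing or unreadable file: stop
    for line in text.splitlines():
        value = line.strip()
        if value:                         # first non-empty line decides
            return value == "True"
    return False                          # empty file: stop
```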

To regenerate the chart from the current local results.tsv:

.venv\Scripts\python.exe render_results_graph.py

Notes for smaller or different hardware

This fork is scoped to a single CUDA GPU and tuned for the local GTX 1080 Ti. If you move to a different GPU, the first knobs to revisit in train.py are:

  1. DEVICE_BATCH_SIZE
  2. TOTAL_BATCH_SIZE
  3. EMBED_DIM, DEPTH, and NUM_HEADS
  4. learning rate and weight decay
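As a rough illustration, a scaled-down configuration for a smaller GPU might look like the following. All values here are hypothetical, as are the helper constants GRAD_ACCUM_STEPS, HEAD_DIM, BASE_LR, and BASE_BATCH; the real defaults live in train.py.

```python
# Hypothetical knobs for a smaller GPU (say, 4 GB of VRAM); the real
# defaults are in train.py and are tuned for the GTX 1080 Ti.
DEVICE_BATCH_SIZE = 64                    # per-step batch that fits in memory
TOTAL_BATCH_SIZE = 256                    # effective batch, kept constant
GRAD_ACCUM_STEPS = TOTAL_BATCH_SIZE // DEVICE_BATCH_SIZE  # micro-steps per step

# Shrink the ViT dimensions together so attention heads stay evenly sized.
EMBED_DIM, DEPTH, NUM_HEADS = 192, 6, 3   # EMBED_DIM must divide by NUM_HEADS
HEAD_DIM = EMBED_DIM // NUM_HEADS

# Scale the learning rate with the effective batch size (linear scaling rule).
BASE_LR, BASE_BATCH = 1e-3, 512
LEARNING_RATE = BASE_LR * TOTAL_BATCH_SIZE / BASE_BATCH
WEIGHT_DECAY = 0.05
```

Keeping TOTAL_BATCH_SIZE fixed via gradient accumulation lets the optimizer settings carry over even when DEVICE_BATCH_SIZE has to shrink.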

The first version is ViT-first. train.py includes a MODEL_FAMILY switch so CNN baselines can be added later without changing the repo layout.
