Enhance experiment runner with deterministic controls #193
buzypi wants to merge 1 commit into karpathy:master
Basic End-to-End Flow (via OpenCode)
opencode
Type this in OpenCode: "Start running the experiment, run 5 loops". It maps to:
    python workflows/run_experiment.py start --loops 5
Type this in OpenCode: "Run another 5 iterations". It maps to:
    python workflows/run_experiment.py resume --loops 5
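For readers who want to see how the commands in this description fit together, here is a minimal sketch of a command-line parser that accepts them. It is not the PR's actual code: the helper names (`csv_choices`, `build_parser`), the default of one loop, and the assumption that background training is on by default are all illustrative. Only the subcommands and flag names are taken from this description.

```python
import argparse

# Stage and loop-step names as listed in this PR description.
STAGES = ("setup", "baseline", "loop")
LOOP_STEPS = ("propose", "apply", "commit", "train", "triage", "record", "decide")


def csv_choices(allowed):
    """Build an argparse type that parses 'a,b,c' and rejects unknown names."""
    def parse(value):
        names = [n.strip() for n in value.split(",") if n.strip()]
        unknown = [n for n in names if n not in allowed]
        if unknown:
            raise argparse.ArgumentTypeError(f"unknown name(s): {', '.join(unknown)}")
        return names
    return parse


def build_parser():
    """Hypothetical parser for the start / resume / status commands."""
    parser = argparse.ArgumentParser(prog="run_experiment.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("start", "resume"):
        p = sub.add_parser(name)
        # Default of 1 loop is an assumption, not stated in the PR.
        p.add_argument("--loops", type=int, default=1)
        p.add_argument("--only", type=csv_choices(STAGES), default=list(STAGES))
        p.add_argument("--loop-only", type=csv_choices(LOOP_STEPS),
                       default=list(LOOP_STEPS))
        # Assumed: background training is the default, foreground is opt-in.
        p.add_argument("--background-train", dest="background_train",
                       action="store_true", default=True)
        p.add_argument("--no-background-train", dest="background_train",
                       action="store_false")
    sub.add_parser("status")
    return parser
```

With this sketch, `build_parser().parse_args(["resume", "--loops", "3", "--only", "loop", "--loop-only", "train,record,decide"])` yields a namespace whose `only` is `["loop"]` and whose `loop_only` is `["train", "record", "decide"]`.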
Training runs happen in the background, and the agent sleeps for an interval that it determines itself. You can interrupt it and ask questions such as "Show the run status", which maps to:
    python workflows/run_experiment.py status
You can also ask other questions, such as "Tell me about the results so far" or "What is the GPU usage", and then ask it to resume its work.

Human-in-the-Loop Override
If you want to inject your own experiment idea instead of stochastic proposal generation, type this in OpenCode: "In the next iteration, increase the LR by 10% and keep warmup unchanged. Use this as a human proposal and run 1 loop."

Useful Stage-Control Examples (Direct)
If you want to go headless, you can run the commands directly.
Setup + baseline only:
    python workflows/run_experiment.py start --only setup,baseline
Run only selected loop internals for 3 iterations:
    python workflows/run_experiment.py resume --loops 3 --only loop --loop-only train,record,decide
Foreground mode (disable background training):
    python workflows/run_experiment.py resume --loops 1 --no-background-train

Logs and Artifacts
Run outputs are written to:
Summary
This PR adds an execution workflow for autonomous experiments, replacing session-by-session program.md interpretation with a deterministic runner plus agent runbook.

What’s included
- workflows/run_experiment.py as the single experiment orchestrator:
- start, resume, status commands
- stages: setup, baseline, loop
- loop internals: propose, apply, commit, train, triage, record, decide
- workflows/runs/<run_id>/<branch-slug>-rNNN
- AGENTS.md runbook with explicit natural-language to command mapping for agent sessions.
- uv run prepare.py when cache/tokenizer is missing (default on, opt-out via --no-auto-prepare)
- (--background-train)
- resume polls/continues in-flight baseline/train jobs
- workflows/runs/<run_id>/next_proposal.json
- --proposal-file <path> on start/resume
- workflows/schemas/proposal.schema.json
- --proposal-file -> next_proposal.json -> stochastic proposal -> deterministic fallback
- workflows/runs/<run_id>/consumed_proposals/iter_<NNNN>.json

Why
In long-running autonomous sessions, prose-only execution is fragile and inconsistent. There have been multiple complaints about this on X.
This PR makes runs repeatable, resumable, inspectable, and human-steerable while preserving program.md as the policy/objective layer.
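
The proposal precedence listed under "What’s included" (--proposal-file -> next_proposal.json -> stochastic proposal -> deterministic fallback) can be sketched as a small resolver. This is an illustration under stated assumptions, not the PR's implementation: the function name `resolve_proposal`, the callables `stochastic` and `fallback`, and the rule that a stochastic generator returning None falls through to the fallback are all hypothetical. Only the file name next_proposal.json and the ordering come from the description.

```python
import json
from pathlib import Path


def resolve_proposal(run_dir, proposal_file=None, stochastic=None, fallback=None):
    """Return (source, proposal) from the first source that yields a proposal.

    Order follows the PR description:
    --proposal-file -> next_proposal.json -> stochastic -> deterministic fallback.
    """
    # 1. An explicit file passed on the command line wins.
    if proposal_file is not None:
        return "proposal-file", json.loads(Path(proposal_file).read_text())

    # 2. A proposal queued in the run directory comes next.
    queued = Path(run_dir) / "next_proposal.json"
    if queued.exists():
        return "next_proposal.json", json.loads(queued.read_text())

    # 3. Stochastic generation (hypothetical callable; None means "no idea").
    if stochastic is not None:
        proposal = stochastic()
        if proposal is not None:
            return "stochastic", proposal

    # 4. Deterministic fallback (hypothetical callable; empty dict if absent).
    return "fallback", (fallback() if fallback is not None else {})
```

Returning the source label alongside the proposal keeps each iteration inspectable: a caller can log which of the four sources actually supplied the change.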