Add early structural triage to kill degenerate experiments at 60s by a-nom-ali · Pull Request #204 · karpathy/autoresearch

a-nom-ali · 2026-03-12T09:17:48Z

Summary

Adds a one-shot structural health check at the 1-minute mark (configurable via TRIAGE_TIME)
Computes effective rank (from the spectral entropy of each weight matrix's singular values) and gradient coherence (cosine similarity between the gradients of consecutive layers)
Kills the experiment early (exit(1)) if effective rank has collapsed below 50% of its initial value (TRIAGE_KILL threshold)
Reports eff_rank_init, eff_rank_final, and rank_retention in the final summary alongside val_bpb
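
For readers unfamiliar with the metric, here is one common definition of effective rank (Roy and Vetterli's entropy-based form). This is an assumption about what "spectral entropy" means in this PR; the actual code may normalize differently. With singular values σ_1..σ_n of a weight matrix W:

```latex
p_i = \frac{\sigma_i}{\sum_{j=1}^{n} \sigma_j}, \qquad
H(p) = -\sum_{i=1}^{n} p_i \log p_i, \qquad
\operatorname{erank}(W) = \exp\big(H(p)\big)
```

Under this definition a matrix with n equal singular values has erank = n, and a matrix with a single nonzero singular value has erank = 1. If rank_retention is the ratio eff_rank_final / eff_rank_init (presumably, given the names), then with purely illustrative numbers a drop from 400 to 180 gives 0.45, which is below the default 50% threshold and would trigger the kill.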

Motivation

The existing fast-fail check (loss > 100) only catches catastrophic divergence. Effective rank collapse, where the model's weight matrices lose expressivity, is a subtler failure mode that predicts bad final val_bpb but doesn't necessarily spike the loss. Catching it at 60s saves the remaining 4 minutes of the 5-minute budget for each degenerate hyperparameter configuration.

Implementation

structural_triage(model): iterates over all 2D parameters whose smaller dimension is at least 64, computes the singular values of each, and returns the mean effective rank and the gradient coherence
~50ms one-shot cost at the checkpoint (SVD on ~48 matrices at 512×512)
Zero new dependencies — uses torch.linalg.svdvals and F.cosine_similarity
44 lines added, 0 deleted
Set TRIAGE_TIME = 0 to disable entirely
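
A minimal sketch of what a function like structural_triage(model) could look like, reconstructed from the description above rather than copied from the diff. The entropy-based rank formula, the MIN_DIM name, and the rule of comparing only consecutive gradients with equal element counts are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

MIN_DIM = 64  # assumed name; the PR only says matrices under 64 in min dimension are skipped


def effective_rank(weight, eps=1e-12):
    """exp(Shannon entropy) of the normalized singular values (assumed definition)."""
    s = torch.linalg.svdvals(weight.detach().float())
    p = s / (s.sum() + eps)
    entropy = -(p * torch.log(p + eps)).sum()
    return torch.exp(entropy).item()


def structural_triage(model):
    """Return (mean effective rank, mean gradient coherence) over large 2D parameters."""
    ranks, grads = [], []
    for param in model.parameters():
        if param.ndim != 2 or min(param.shape) < MIN_DIM:
            continue
        ranks.append(effective_rank(param))
        if param.grad is not None:
            grads.append(param.grad.detach().float().flatten())
    # Assumption: only consecutive gradients with the same element count are compared.
    cosines = [
        F.cosine_similarity(a, b, dim=0).item()
        for a, b in zip(grads, grads[1:])
        if a.shape == b.shape
    ]
    mean_rank = sum(ranks) / len(ranks) if ranks else 0.0
    coherence = sum(cosines) / len(cosines) if cosines else 0.0
    return mean_rank, coherence
```

Calling this once at startup and once at the checkpoint would give the two values needed for a retention ratio. Using torch.linalg.svdvals avoids computing singular vectors, which is consistent with the small one-shot cost quoted above.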

Test plan

Verify Initial effective rank: X.X prints at startup
Verify [triage@60s] checkpoint fires at ~60s with rank ratio and coherence
Verify eff_rank_init, eff_rank_final, rank_retention appear in final summary
Confirm no performance regression (triage runs once, not per-step)
Test early kill by setting TRIAGE_KILL = 0.99 (should kill at the triage checkpoint, provided rank retention has dropped below 0.99)
Test disable by setting TRIAGE_TIME = 0 (no triage output)
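
The kill and disable cases in the test plan can also be reasoned about in isolation. The helper below is a hypothetical sketch of the one-shot decision logic, not code from the diff; the function name, its arguments, and the return values are illustrative, and only TRIAGE_TIME and TRIAGE_KILL come from the PR.

```python
TRIAGE_TIME = 60   # seconds; 0 disables triage (per the PR description)
TRIAGE_KILL = 0.5  # kill if rank retention falls below this fraction


def triage_decision(elapsed, rank_init, rank_now, already_ran,
                    triage_time=TRIAGE_TIME, triage_kill=TRIAGE_KILL):
    """Return 'skip', 'pass', or 'kill' for a one-shot structural check."""
    # Disabled, already fired once, or not yet at the checkpoint.
    if triage_time <= 0 or already_ran or elapsed < triage_time:
        return "skip"
    retention = rank_now / rank_init if rank_init > 0 else 0.0
    return "kill" if retention < triage_kill else "pass"
```

In a training loop the caller would set already_ran after the first non-skip result and call exit(1) on "kill", so the check costs nothing on every other step.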

🤖 Generated with Claude Code

Computes effective rank (spectral entropy of weight matrix SVDs) and gradient coherence (cosine similarity of consecutive layer gradients) at the 1-minute mark. If effective rank has collapsed below 50% of its initial value, the experiment is killed early instead of running the full 5-minute budget.

Two configurable hyperparameters: TRIAGE_TIME (seconds, 0 to disable) and TRIAGE_KILL (fraction threshold). Rank retention is reported in the final summary alongside val_bpb.

Zero new dependencies: pure PyTorch (torch.linalg.svdvals, F.cosine_similarity). ~50ms one-shot cost at the checkpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

a-nom-ali wants to merge 1 commit into karpathy:master from a-nom-ali:structural-triage
