karpathy · tykoo-chen · Mar 11, 2026
diff --git a/README.md b/README.md
@@ -86,6 +86,30 @@ I think these would be the reasonable hyperparameters to play with. Ask your fav
 - [trevin-creator/autoresearch-mlx](https://github.com/trevin-creator/autoresearch-mlx) (MacOS)
 - [jsegov/autoresearch-win-rtx](https://github.com/jsegov/autoresearch-win-rtx) (Windows)
 
+
+## Troubleshooting
+
+**"No CUDA GPU detected"**
+- This script requires an NVIDIA GPU with CUDA support
+- Mac users: see the MLX forks in the Notable forks section
+- Windows/Linux: ensure the NVIDIA driver is installed (`nvidia-smi` should run and list your GPU)
+
+**"CUDA out of memory"**
+- Reduce `DEVICE_BATCH_SIZE` in `train.py` (e.g., from 128 to 64)
+- On GPUs with less than 40 GB of VRAM, also consider reducing `DEPTH`
+
+**"kernels module not found" or Flash Attention errors**
+- Ensure you're using `uv run` (not plain `python`)
+- Try `uv sync --reinstall` to rebuild dependencies
+
+**Training runs but loss doesn't decrease**
+- This is expected for the first ~10 steps (warmup)
+- If loss is still flat after 50+ steps, the experiment may need different hyperparameters
+
+**Script hangs at startup**
+- First run compiles the model with `torch.compile`, which can take 1-2 minutes
+- Subsequent runs should start faster
+
 ## License
 
 MIT