- [trevin-creator/autoresearch-mlx](https://github.com/trevin-creator/autoresearch-mlx) (MacOS)
- [jsegov/autoresearch-win-rtx](https://github.com/jsegov/autoresearch-win-rtx) (Windows)


## Troubleshooting

**"No CUDA GPU detected"**
- The training script requires an NVIDIA GPU with CUDA support
- Mac users: see the macOS forks listed in the Notable forks section
- Windows/Linux: make sure the NVIDIA driver is installed; `nvidia-smi` should run and list your GPU
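
To tell a missing driver apart from a CPU-only PyTorch install, it helps to check both separately. The sketch below is a standalone helper, not part of this repo (the function name `check_cuda` and the file name `check_cuda.py` are hypothetical). It asks `nvidia-smi` for GPU names and, only if PyTorch is importable, calls `torch.cuda.is_available()`.

```python
import importlib.util
import shutil
import subprocess


def check_cuda() -> dict:
    """Hypothetical helper: report whether the NVIDIA driver and a CUDA-enabled PyTorch are visible."""
    report = {"nvidia_smi": False, "torch_cuda": None, "gpu_names": []}

    # nvidia-smi ships with the NVIDIA driver; if it is missing or fails,
    # the driver is not installed or not on PATH.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            out = subprocess.run(
                [smi, "--query-gpu=name", "--format=csv,noheader"],
                capture_output=True, text=True, timeout=10,
            )
            if out.returncode == 0:
                report["nvidia_smi"] = True
                report["gpu_names"] = [ln.strip() for ln in out.stdout.splitlines() if ln.strip()]
        except (OSError, subprocess.TimeoutExpired):
            pass

    # Import torch only if it is installed, so the check itself never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch
        report["torch_cuda"] = torch.cuda.is_available()

    return report


if __name__ == "__main__":
    print(check_cuda())
```

Running it with `uv run python check_cuda.py` checks the same environment the training script uses. If `nvidia_smi` is `True` but `torch_cuda` is `False`, the driver is likely fine and the installed PyTorch build probably lacks CUDA support.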

**"CUDA out of memory"**
- Reduce `DEVICE_BATCH_SIZE` in `train.py` (e.g., from 128 to 64)
- On GPUs with less than 40 GB of VRAM, also consider reducing `DEPTH`
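
If `train.py` keeps the number of tokens per optimizer step fixed and makes up the difference with gradient accumulation (an assumption here; check the script), then lowering `DEVICE_BATCH_SIZE` shrinks activation memory per forward/backward pass while raising the number of accumulation steps, so each optimizer step takes longer but sees the same amount of data. The sketch below shows that arithmetic; `TOTAL_BATCH_SIZE`, `MAX_SEQ_LEN`, their values, and the helper name are illustrative assumptions, not confirmed from the repo.

```python
# Assumed values for illustration only; check train.py / prepare.py for the real ones.
TOTAL_BATCH_SIZE = 2**19  # tokens per optimizer step (assumed)
MAX_SEQ_LEN = 2048        # tokens per sequence (assumed)


def grad_accum_steps(device_batch_size: int,
                     total_batch_size: int = TOTAL_BATCH_SIZE,
                     seq_len: int = MAX_SEQ_LEN) -> int:
    """Hypothetical helper: micro-batches needed to reach the total batch size."""
    tokens_per_micro_batch = device_batch_size * seq_len
    if total_batch_size % tokens_per_micro_batch != 0:
        raise ValueError("total batch size must be a multiple of device_batch_size * seq_len")
    return total_batch_size // tokens_per_micro_batch


if __name__ == "__main__":
    # Halving the per-device batch doubles the accumulation steps.
    for bs in (128, 64, 32):
        print(bs, grad_accum_steps(bs))
```

Note that this only reduces activation memory; model weights and optimizer state take the same space regardless of batch size, which is why `DEPTH` may also need to come down on smaller GPUs.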

**"kernels module not found" or Flash Attention errors**
- Launch the script with `uv run` rather than plain `python`, so it runs inside the project's managed environment
- Try `uv sync --reinstall` to reinstall all dependencies from scratch
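
To see which interpreter is actually running and whether the `kernels` package is visible to it, a small check like the one below can help. It is a hypothetical helper, not shipped with the repo. Running it once via `uv run python` and once with plain `python` usually makes an environment mix-up obvious.

```python
import importlib.util
import sys


def env_report(modules=("torch", "kernels")) -> dict:
    """Hypothetical helper: report the active interpreter and which modules it can import."""
    # Inside a virtual environment (such as the .venv that uv manages),
    # sys.prefix differs from sys.base_prefix.
    in_venv = sys.prefix != sys.base_prefix
    # find_spec returns None for top-level modules that are not installed,
    # without actually importing them.
    found = {name: importlib.util.find_spec(name) is not None for name in modules}
    return {"python": sys.executable, "in_venv": in_venv, "modules": found}


if __name__ == "__main__":
    print(env_report())
```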

**Training runs but loss doesn't decrease**
- Little or no improvement during the first ~10 steps (warmup) is expected
- If loss is still flat after step 50 or so, the experiment likely needs different hyperparameters
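
What counts as "flat" is a judgment call. One simple heuristic, sketched below with hypothetical names and thresholds (nothing `train.py` computes itself), ignores the steps before 50, then compares the mean loss of the latest window with the window before it and flags a relative drop below 1%.

```python
from statistics import mean


def loss_is_flat(losses, start=50, window=20, min_rel_drop=0.01):
    """Heuristic plateau check on a list of per-step training losses.

    Compares the mean of the latest `window` losses with the mean of the
    `window` losses before them, ignoring everything before step `start`.
    Thresholds are illustrative assumptions, not values used by train.py.
    """
    recent = losses[start:]
    if len(recent) < 2 * window:
        return False  # not enough post-warmup steps to judge yet
    earlier = mean(recent[-2 * window:-window])
    latest = mean(recent[-window:])
    if earlier <= 0:
        return False  # relative drop is undefined for non-positive losses
    return (earlier - latest) / earlier < min_rel_drop
```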

**Script hangs at startup**
- The first run compiles the model with `torch.compile`, which can take 1-2 minutes before any training output appears
- Subsequent runs should start faster

## License

MIT