feat: Add torch.compile with CUDA graphs support for ~5x MD speedup #29
This PR adds `torch.compile` support with CUDA graphs for significant speedups on GPU molecular dynamics simulations. It is based on community PR #20 from Acellera, reworked with the improvements listed below.

**Changes:**
- Add `compile_mode=True` parameter to `AIMNet2Calculator` and `AIMNet2ASE`
- Add `compile_nb_mode` parameter throughout to avoid data-dependent control flow that breaks CUDA graph capture
- Add `get_model_definition_path()` for mapping model names to YAML definitions
- Add `cosine_cutoff_tensor()` for CUDA graphs compatibility
- Add `enable_compile_mode()` to `AIMNet2Base` to propagate compile settings
- Add `calc_masks_fixed_nb_mode()` for compile-time mask calculation

**Improvements over original PR #20:**
- Generalized model loading (not hardcoded to one model)
- Backward-compatible (original `cosine_cutoff` signature unchanged)
- Comprehensive test coverage
- Code style compliance (passes pre-commit)
- Based on current main branch

**Limitations:**
- Only `nb_mode=0` (single molecule, dense) supported
- Requires CUDA
- No PBC support in compile mode
- First call has compilation overhead

**Usage:**
```python
from aimnet.calculators import AIMNet2Calculator

calc = AIMNet2Calculator("aimnet2", compile_mode=True)
```
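Because the first call in compile mode carries compilation overhead, speedup measurements are only meaningful if warm-up calls are kept out of the timed region. The sketch below shows one way to do that with the standard library only. `time_calls` and `FakeCompiledStep` are hypothetical names used for illustration and are not part of this PR or of `aimnet`; the fake step merely imitates a slow first call.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], n_warmup: int = 3, n_timed: int = 10) -> List[float]:
    """Return per-call wall-clock times in seconds, excluding warm-up calls.

    Hypothetical helper. For real GPU work, `fn` should synchronize the
    device before returning (e.g. call torch.cuda.synchronize()), otherwise
    asynchronous kernel launches make the timings look too short.
    """
    if n_warmup < 0 or n_timed < 1:
        raise ValueError("n_warmup must be >= 0 and n_timed must be >= 1")

    # Warm-up: absorbs one-time costs such as compilation and graph capture.
    for _ in range(n_warmup):
        fn()

    timings = []
    for _ in range(n_timed):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


class FakeCompiledStep:
    """Stand-in for a compiled calculator call: slow once, fast afterwards."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> None:
        self.calls += 1
        if self.calls == 1:
            time.sleep(0.05)  # pretend compilation happens on the first call


if __name__ == "__main__":
    step = FakeCompiledStep()
    timings = time_calls(step, n_warmup=1, n_timed=5)
    print(f"mean time per call: {sum(timings) / len(timings):.6f} s")
```

In a real benchmark, `fn` would wrap one energy and force evaluation (or one MD step) with the compiled calculator, and the same helper would be run against the non-compiled calculator for comparison.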
Pull Request Review: torch.compile with CUDA Graphs Support

Summary
This PR adds

✅ Strengths
- Code Quality
- Test Coverage
- Performance & Documentation

🔍 Issues & Concerns
1. Critical: Potential Device Mismatch in
**Performance:**
Performance figures are based on benchmarks from community PR #20.

**Usage with ASE:**
The same `compile_mode=True` parameter is accepted by `AIMNet2ASE`.

**Test plan:**
- `pytest -m gpu tests/test_compile.py` (requires CUDA)
- `python examples/ase_md_compiled.py --compile`

Closes #20 (supersedes with improvements)