Open
Labels: enhancement (New feature or request)
Description
Is your feature request related to a problem? Please describe.
CUDA graphs: needed to reduce the impact of host-side latencies, such as Python overhead and per-kernel launch cost.
Multi-tensor apply: reduces the number of kernel launches, which saves latency, and uses memory bandwidth more efficiently.
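The batching idea behind multi-tensor apply can be sketched in pure Python (no GPU involved). Each simulated "launch" is one counter increment, and a batched launch processes (tensor, chunk) work items from several tensors at once. The function names, the chunk size, and the per-launch limits `max_tensors` and `max_chunks` are hypothetical illustrations, not the API of any particular library.

```python
from typing import Callable, List, Tuple

Tensor = List[float]  # stand-in for a GPU tensor in this toy sketch


def per_tensor_apply(op: Callable[[float], float], tensors: List[Tensor]) -> int:
    """Baseline: one simulated kernel launch per tensor. Returns the launch count."""
    launches = 0
    for t in tensors:
        launches += 1
        for i in range(len(t)):
            t[i] = op(t[i])
    return launches


def multi_tensor_apply(
    op: Callable[[float], float],
    tensors: List[Tensor],
    chunk_size: int = 4,   # hypothetical: elements handled per work item
    max_tensors: int = 4,  # hypothetical: distinct tensors allowed per launch
    max_chunks: int = 8,   # hypothetical: work items allowed per launch
) -> int:
    """Batched: pack (tensor index, chunk start) work items from many tensors
    into as few simulated launches as the per-launch limits allow."""
    launches = 0
    pending: List[Tuple[int, int]] = []

    def flush() -> None:
        # One simulated launch processes every pending work item.
        nonlocal launches
        if not pending:
            return
        launches += 1
        for ti, start in pending:
            t = tensors[ti]
            for i in range(start, min(start + chunk_size, len(t))):
                t[i] = op(t[i])
        pending.clear()

    for ti, t in enumerate(tensors):
        for start in range(0, len(t), chunk_size):
            distinct = {idx for idx, _ in pending}
            launch_full = len(pending) == max_chunks or (
                ti not in distinct and len(distinct) == max_tensors
            )
            if launch_full:
                flush()
            pending.append((ti, start))
    flush()  # launch whatever work is left over
    return launches


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0] for _ in range(10)]
    b = [[1.0, 2.0, 3.0] for _ in range(10)]
    print(per_tensor_apply(lambda x: 0.5 * x, a))    # 10
    print(multi_tensor_apply(lambda x: 0.5 * x, b))  # 3
```

With ten 3-element tensors and the default limits, the per-tensor loop needs ten launches while the batched version needs three, and both leave the tensors with identical values. In a real kernel, limits like these typically come from how much per-tensor metadata fits in the kernel arguments.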
Describe the solution you'd like
All optimizers should support the "capturable" argument, as native PyTorch optimizers do, e.g. https://docs.pytorch.org/docs/stable/generated/torch.optim.adam.Adam_class.html#adam
Constants (betas, for example), the step counter, and all other optimizer state must live on the GPU for the optimizer step to be CUDA graph capturable.
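Why the step counter and constants need to live on the GPU can be illustrated with a toy, pure-Python model of graph capture and replay (no GPU or PyTorch involved). `DeviceScalar`, `ToyGraph`, `capture_adam`, `adam_eager` and `run_graph` are hypothetical names for this sketch: a replayed graph re-reads device cells, but any value computed on the host during capture is frozen. With `capturable=True` the bias corrections are derived from the device-side step counter and the replayed result matches eager Adam; the non-capturable variant reuses the step-1 bias corrections on every replay.

```python
import math
from typing import Callable, List


class DeviceScalar:
    """Stand-in for a one-element GPU tensor: a mutable cell read at replay time."""

    def __init__(self, value: float) -> None:
        self.value = value


class ToyGraph:
    """Toy model of CUDA graph capture/replay. Ops are recorded once and replayed
    as-is; any plain Python number an op closes over is frozen at capture time,
    much like a host scalar baked into a captured kernel's arguments."""

    def __init__(self) -> None:
        self.ops: List[Callable[[], None]] = []

    def record(self, op: Callable[[], None]) -> None:
        self.ops.append(op)

    def replay(self) -> None:
        for op in self.ops:
            op()


def capture_adam(graph, p, g, m, v, step, lr, beta1, beta2, eps=1e-8, capturable=True):
    """Record one Adam step on scalar state held in DeviceScalar cells."""
    # What a non-capturable step effectively does: evaluate the bias corrections
    # on the host once, during capture, so every replay reuses these values.
    host_t = step.value + 1
    frozen_bc1 = 1 - beta1.value ** host_t
    frozen_bc2 = 1 - beta2.value ** host_t

    def op() -> None:
        step.value += 1  # the step counter lives on the "device" and advances per replay
        b1, b2 = beta1.value, beta2.value
        m.value = b1 * m.value + (1 - b1) * g.value
        v.value = b2 * v.value + (1 - b2) * g.value * g.value
        if capturable:
            bc1 = 1 - b1 ** step.value  # recomputed from the device step on every replay
            bc2 = 1 - b2 ** step.value
        else:
            bc1, bc2 = frozen_bc1, frozen_bc2  # stale after the first replay
        p.value -= lr.value * (m.value / bc1) / (math.sqrt(v.value / bc2) + eps)

    graph.record(op)


def adam_eager(p, g, steps, lr, beta1, beta2, eps=1e-8):
    """Reference: plain eager Adam on a scalar with a constant gradient."""
    m = v = 0.0
    for t in range(1, steps + 1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        p -= lr * (m / (1 - beta1 ** t)) / (math.sqrt(v / (1 - beta2 ** t)) + eps)
    return p


def run_graph(capturable: bool, replays: int) -> float:
    """Capture one step, replay it `replays` times, return the parameter value."""
    p, g = DeviceScalar(1.0), DeviceScalar(0.5)
    m, v, step = DeviceScalar(0.0), DeviceScalar(0.0), DeviceScalar(0)
    lr, beta1, beta2 = DeviceScalar(0.1), DeviceScalar(0.9), DeviceScalar(0.999)
    graph = ToyGraph()
    capture_adam(graph, p, g, m, v, step, lr, beta1, beta2, capturable=capturable)
    for _ in range(replays):
        graph.replay()
    return p.value


if __name__ == "__main__":
    ref = adam_eager(1.0, 0.5, 5, lr=0.1, beta1=0.9, beta2=0.999)
    print(math.isclose(run_graph(True, 5), ref))   # True
    print(math.isclose(run_graph(False, 5), ref))  # False
```

The same reasoning extends to hyperparameters: in this sketch the learning rate and betas are read from `DeviceScalar` cells at replay time, so updating `lr.value` after capture would take effect, whereas a host float baked in at capture would not.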