
Reduce latency #109

@skyw

Description


Is your feature request related to a problem? Please describe.
CUDA graphs: needed to reduce the impact of latencies of all kinds, in particular CPU-side kernel launch overhead.
Multi-tensor apply: reduces the number of kernel launches, which saves latency, and also uses memory bandwidth more efficiently.
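To illustrate the multi-tensor apply idea, here is a minimal pure-Python sketch, not the apex or PyTorch implementation: each tensor is split into fixed-size chunks, and chunks from many tensors are packed into a single "launch" until a per-launch tensor limit or chunk limit is reached. The names `plan_launches` and `multi_tensor_scale_` are hypothetical, and the limit values are arbitrary choices for the example.

```python
from typing import List, Set, Tuple

Chunk = Tuple[int, int, int]  # (tensor index, start offset, end offset)


def plan_launches(sizes: List[int], chunk_size: int,
                  max_tensors: int, max_chunks: int) -> List[List[Chunk]]:
    """Group chunks of many tensors into as few 'kernel launches' as the limits allow."""
    launches: List[List[Chunk]] = []
    current: List[Chunk] = []
    tensors_in_launch: Set[int] = set()
    for t, n in enumerate(sizes):
        for start in range(0, n, chunk_size):
            new_tensor = t not in tensors_in_launch
            # Flush the current launch if adding this chunk would exceed a limit.
            if (new_tensor and len(tensors_in_launch) == max_tensors) or len(current) == max_chunks:
                launches.append(current)
                current, tensors_in_launch = [], set()
            current.append((t, start, min(start + chunk_size, n)))
            tensors_in_launch.add(t)
    if current:
        launches.append(current)
    return launches


def multi_tensor_scale_(tensors: List[List[float]], scale: float, chunk_size: int = 4,
                        max_tensors: int = 3, max_chunks: int = 4) -> int:
    """Scale every tensor in place; return how many 'launches' were needed."""
    plan = plan_launches([len(x) for x in tensors], chunk_size, max_tensors, max_chunks)
    for launch in plan:               # each iteration stands in for ONE kernel launch
        for t, start, end in launch:  # on a GPU, each chunk would map to a thread block
            buf = tensors[t]
            for i in range(start, end):
                buf[i] *= scale
    return len(plan)


params = [[1.0] * 3, [1.0] * 5, [1.0] * 2, [1.0] * 7]
launches = multi_tensor_scale_(params, 2.0)
print(launches, "launches vs", len(params), "for a per-tensor loop")
```

With these hypothetical limits, the four tensors are updated in 2 launches instead of the 4 a per-tensor loop would need; the saving grows with the number of parameter tensors an optimizer has to touch.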

Describe the solution you'd like
All optimizers should support a `capturable` argument, as native PyTorch does, e.g. https://docs.pytorch.org/docs/stable/generated/torch.optim.adam.Adam_class.html#adam

Constants (e.g. betas), the step counter, and everything else must live on the GPU for the optimizer to be CUDA graph capturable.
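Why the step counter and constants must live on the device can be shown with a toy model (not real CUDA): a captured graph replays the same kernels with the same arguments, so any scalar computed on the host at capture time is frozen into the graph. In the sketch below, `ToyGraph`, the `dev` dict standing in for GPU memory, and the kernel functions are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Device = Dict[str, float]  # stands in for GPU-resident buffers


class ToyGraph:
    """Toy stand-in for a CUDA graph: kernels are recorded once, replayed verbatim."""

    def __init__(self) -> None:
        self._kernels: List[Callable[[], None]] = []

    def launch(self, kernel: Callable[..., None], *args: Any) -> None:
        # Arguments are bound now, like launch parameters baked into a captured graph.
        self._kernels.append(lambda: kernel(*args))

    def replay(self) -> None:
        for k in self._kernels:
            k()


def write_bias_correction(dev: Device, value: float) -> None:
    # Non-capturable style: the value was computed on the host before launch.
    dev["bias_correction1"] = value


def step_and_bias_correction(dev: Device) -> None:
    # Capturable style: step and beta1 are read from "device memory" at replay time.
    dev["step"] += 1.0
    dev["bias_correction1"] = 1.0 - dev["beta1"] ** dev["step"]


def capture_host_step(dev: Device, beta1: float) -> ToyGraph:
    g = ToyGraph()
    host_step = 1  # host-side counter: host code is not re-run on replay, so this is frozen
    g.launch(write_bias_correction, dev, 1.0 - beta1 ** host_step)
    return g


def capture_device_step(dev: Device) -> ToyGraph:
    g = ToyGraph()
    g.launch(step_and_bias_correction, dev)
    return g


host_dev: Device = {"bias_correction1": 0.0}
dev_dev: Device = {"step": 0.0, "beta1": 0.9, "bias_correction1": 0.0}
g_host = capture_host_step(host_dev, 0.9)
g_dev = capture_device_step(dev_dev)
for _ in range(3):
    g_host.replay()
    g_dev.replay()
print(host_dev["bias_correction1"])  # stuck at 1 - 0.9**1
print(dev_dev["bias_correction1"])   # 1 - 0.9**3, tracks the step count
```

Keeping `beta1` in the device dict as well means it could be changed between replays without recapturing, which is the same reasoning the issue applies to betas.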

cc @gdengk @FDecaYed @BoxiangW

Metadata


Assignees

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
