@Mr-Philo
Contributor

See Issue #201

This PR proposes a solution to the memory leak that occurs when using the MS-AMP custom GeMM.

Currently, the custom GeMM function uses the ctx object to save the input tensor x and the weight tensor W, both of which are needed to compute gradients in the backward pass. Assigning to ctx.input_fp8 stores the object directly as an attribute of ctx. However, input_fp8 is a ScalingTensor, and saving it this way does not take full advantage of the memory savings that FP8 tensors offer.

Instead, I suggest using ctx.save_for_backward(), which is designed for saving tensors needed in the backward pass and gives better memory management. The saved context changes from a ScalingTensor to a torch.Tensor plus a ScalingMeta. In the measurements below, this noticeably reduces memory usage.
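To illustrate the pattern described above, here is a minimal sketch of a custom autograd function that saves plain tensors with ctx.save_for_backward() and keeps only small non-tensor metadata as a ctx attribute. This is not the MS-AMP implementation: `ToyMeta` and `ToyGemm` are hypothetical names, `ToyMeta` is only a stand-in for scaling metadata and does not mirror the real ScalingMeta API, and float16 is used here as a stand-in for the FP8 payload.

```python
from dataclasses import dataclass

import torch


@dataclass
class ToyMeta:
    """Hypothetical scaling metadata (assumption; not the MS-AMP ScalingMeta)."""
    scale: float


class ToyGemm(torch.autograd.Function):
    """Computes y = x @ weight.T and saves low-precision payloads for backward."""

    @staticmethod
    def forward(ctx, x, weight, x_meta, w_meta):
        # Scale and cast to a lower-precision payload (float16 stands in for FP8).
        x_q = (x * x_meta.scale).to(torch.float16)
        w_q = (weight * w_meta.scale).to(torch.float16)
        # Tensors go through save_for_backward rather than ctx attributes.
        ctx.save_for_backward(x_q, w_q)
        # Small non-tensor metadata can stay on ctx directly.
        ctx.x_meta = x_meta
        ctx.w_meta = w_meta
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_q, w_q = ctx.saved_tensors
        # Undo the scaling to recover approximate x and weight.
        x = x_q.to(grad_out.dtype) / ctx.x_meta.scale
        w = w_q.to(grad_out.dtype) / ctx.w_meta.scale
        grad_x = grad_out @ w        # shape: (batch, in_features)
        grad_w = grad_out.t() @ x    # shape: (out_features, in_features)
        # No gradients for the two metadata arguments.
        return grad_x, grad_w, None, None
```

It would be called as `ToyGemm.apply(x, weight, ToyMeta(2.0), ToyMeta(2.0))`. Because the payload is lower precision, the gradients are in general only approximations of the full-precision ones.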

Effect on deit-base (86M) model training, batch size 256:

| Scheme | Improvement | Mem after forward | Mem after backward | Max mem | Throughput | One epoch time |
| --- | --- | --- | --- | --- | --- | --- |
| FP16 | × | 18774.96MB | 1535.79MB | 19242.61MB | ~14974.5128 (12708.2790) | 02:12 |
| FP8 O2 | × | 15696.38MB | 3964.60MB | 19298.19MB | ~13673.2756 (11722.0941) | 02:15 |
| FP8 O2 | GEMM mem optimization | 15687.63MB | 761.23MB | 16245.64MB | ~13650.5041 (11671.5361) | 02:17 |

Effect on deit 570M model training, batch size 256:

| Scheme | Improvement | Mem after forward | Mem after backward | Max mem | Throughput | One epoch time |
| --- | --- | --- | --- | --- | --- | --- |
| FP16 | × | 72189.91MB | 8977.01MB | 72946.86MB | ~3379.1181 (3372.2263) | 06:26 |
| FP8 O2 | × | 58077.87MB | 15488.33MB | 70747.64MB | ~3615.4217 (3586.6655) | 06:08 |
| FP8 O2 | GEMM mem optimization | 58079.98MB | 3562.89MB | 59507.70MB | ~3332.3433 (3288.6809) | 06:10 |
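For a quick read of the numbers above, the relative memory reduction of FP8 O2 with the GEMM memory optimization versus plain FP8 O2 can be worked out as follows. The figures are copied from the two tables; the helper name `reduction_pct` and the dictionary layout are only illustrative.

```python
def reduction_pct(before_mb: float, after_mb: float) -> float:
    """Relative reduction in percent, going from before_mb to after_mb."""
    return 100.0 * (before_mb - after_mb) / before_mb


# (FP8 O2 without optimization, FP8 O2 with GEMM mem optimization), in MB.
measurements = {
    "deit-base (86M)": {
        "mem after backward": (3964.60, 761.23),
        "max mem": (19298.19, 16245.64),
    },
    "deit 570M": {
        "mem after backward": (15488.33, 3562.89),
        "max mem": (70747.64, 59507.70),
    },
}

for model, metrics in measurements.items():
    for metric, (before, after) in metrics.items():
        print(f"{model}, {metric}: {reduction_pct(before, after):.1f}% lower")
```

By this calculation, memory after backward drops by roughly 77 to 81 percent and peak memory by roughly 16 percent in both configurations.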

@Mr-Philo
Contributor Author


@microsoft-github-policy-service agree company="Microsoft"
