Commit b14a3b6
Make FP8 weights compatible with older MCore version (NVIDIA#2342)
* Make cast_master_weights_to_fp8 compatible with older MCore version
Signed-off-by: kunlunl <kunlunl@nvidia.com>
* Rename keep_columnwise to manual_post_all_gather_processing & Optimize unit test
Signed-off-by: kunlunl <kunlunl@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove redundant _test_mini_optimizer()
Signed-off-by: kunlunl <kunlunl@nvidia.com>
---------
Signed-off-by: kunlunl <kunlunl@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
1 parent f3b97c2
3 files changed (+771, -726 lines) under:
- tests/pytorch/distributed
- transformer_engine/pytorch/tensor