perf(MoE): Use TE quant/dequant for SwiGLU fp8 input store to improve performance and stability#1753

Draft
xiaoxi-wangfj wants to merge 3 commits into NVIDIA:main from 021ai:optimize-swiglu-input-fp8-quant

Conversation

@xiaoxi-wangfj
Contributor

Description

Replace native .to(fp8) casting in SwiGLU with Transformer Engine quant/dequant interfaces for storing activation inputs in FP8.

Benefits:

  1. Higher performance – for the combined quant+dequant operation, the dynamic quantization and dequantization path in transformer_engine provides a 1.48x to 1.66x speedup over the native method.
  2. Better numerical stability – the dynamic quantization and dequantization path in transformer_engine handles extreme values more gracefully. A native .to(fp8) cast can underflow very small values to 0 or overflow large values to inf, while TE's dynamic scaling reduces both issues.
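The stability claim in point 2 can be illustrated with a small, self-contained sketch. The rounding function below is a simplified software simulation of the FP8 E4M3 format, not TE's actual implementation, and all function names here are hypothetical; it shows how per-tensor dynamic scaling (scale = fp8_max / amax) rescues small values that a direct cast would flush to zero:

```python
import math

E4M3_MAX = 448.0            # largest normal E4M3 value
E4M3_MIN_NORMAL = 2.0 ** -6
E4M3_MIN_SUBNORMAL = 2.0 ** -9

def fp8_e4m3_round(x: float) -> float:
    """Crude round-to-nearest simulation of FP8 E4M3 (saturating, simplified)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if a > E4M3_MAX:
        return sign * E4M3_MAX                     # saturate instead of overflowing
    if a < E4M3_MIN_NORMAL:
        # subnormal range: fixed step of 2^-9; anything below half a step -> 0
        return sign * round(a / E4M3_MIN_SUBNORMAL) * E4M3_MIN_SUBNORMAL
    m, e = math.frexp(a)                           # a = m * 2**e, m in [0.5, 1)
    m_q = round(m * 16) / 16                       # keep 1+3 significant mantissa bits
    return sign * math.ldexp(m_q, e)

def store_native(x):
    """Analogue of a direct .to(fp8) cast: quantize with no rescaling."""
    return [fp8_e4m3_round(v) for v in x], 1.0

def store_dynamic(x):
    """Dynamic per-tensor scaling: map amax onto the FP8 range before rounding."""
    amax = max(abs(v) for v in x) or 1.0
    scale = E4M3_MAX / amax
    return [fp8_e4m3_round(v * scale) for v in x], scale

def dequant(q, scale):
    return [v / scale for v in q]
```

For a tensor of small activations such as `[1e-4, 2e-4]`, `store_native` flushes both values to 0.0 (they sit below the smallest E4M3 subnormal), while `store_dynamic` followed by `dequant` recovers them to within floating-point noise. TE's real quantization interfaces add amax history, fused kernels, and proper E4M3 semantics on top of this basic idea.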

@copy-pr-bot

copy-pr-bot bot commented Aug 19, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Member

@ksivaman ksivaman left a comment


LGTM, except a small nit: avoid importing the internal API, since the exact structure of the files might change or move around in TE.

@xiaoxi-wangfj xiaoxi-wangfj requested review from a team as code owners December 31, 2025 01:45
… performance and stability

Co-authored-by: xiaoxi-wangfj <690912414@qq.com>
Co-authored-by: pumpkinsm <123sssmmm@gmail.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@xiaoxi-wangfj xiaoxi-wangfj force-pushed the optimize-swiglu-input-fp8-quant branch from ccdbe18 to cd7f9fe Compare December 31, 2025 02:07
@xiaoxi-wangfj
Contributor Author

@ksivaman Thanks for the review! I’ve addressed all the comments and updated the PR. Could you please re-review when you get a chance?

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Jan 11, 2026
@Phlip79
Member

Phlip79 commented Mar 4, 2026

We are changing our review process and marking all open, unlabeled PRs as draft. This change will go into effect once #3659 is merged.

Moving forward, all PRs will be required to start as draft PRs. If you wish to get your PR merged, mark your PR as “Ready for review”. Read more about the new process at submit.md.

@Phlip79 Phlip79 marked this pull request as draft March 4, 2026 23:05
@Phlip79 Phlip79 removed the needs-follow-up Issue needs follow-up label Mar 4, 2026
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Mar 5, 2026
