Conversation

anujj (Contributor) commented on Dec 19, 2025

Add QMoE and BF16 support for TRT-RTX execution provider

  • Enable blockwise quantization for the TRT-RTX/NvTensorRtRtx EPs
  • Add a gpt_oss_swiglu_fusion option for separate gate/up weights
  • Add an int4_qdq_block_size option to control the MatMul quantization block size (see the sketch after this list)
  • Add BF16 precision support for TRT-RTX
  • Keep padding in QMoE weights for proper alignment
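
As context for the int4_qdq_block_size option above, here is a minimal sketch of what symmetric blockwise int4 quantization with a configurable block size computes. This is an illustration of the concept only, not the PR's implementation; the function names and the choice of symmetric quantization are assumptions.

```python
# Illustrative sketch only: symmetric blockwise int4 quantization with a
# configurable block size (the quantity an option like int4_qdq_block_size
# would control). Function names here are hypothetical.
import numpy as np

def quantize_int4_blockwise(weight: np.ndarray, block_size: int = 32):
    """Quantize a (K, N) float weight to int4 with one scale per block.

    Each column is split into K // block_size blocks of block_size rows;
    every block gets its own scale, so smaller blocks track the local
    weight range more closely.
    """
    k, n = weight.shape
    assert k % block_size == 0, "K must be a multiple of the block size"
    blocks = weight.reshape(k // block_size, block_size, n)
    # Symmetric scale: the largest magnitude in each block maps to 7.
    scales = np.abs(blocks).max(axis=1) / 7.0
    scales = np.where(scales == 0.0, 1.0, scales)  # avoid divide-by-zero for all-zero blocks
    q = np.clip(np.round(blocks / scales[:, None, :]), -8, 7).astype(np.int8)
    return q.reshape(k, n), scales  # scales has shape (K // block_size, N)

def dequantize_int4_blockwise(q: np.ndarray, scales: np.ndarray,
                              block_size: int = 32) -> np.ndarray:
    """Inverse mapping: expand the per-block scales and multiply."""
    k, n = q.shape
    blocks = q.reshape(k // block_size, block_size, n).astype(np.float32)
    return (blocks * scales[:, None, :]).reshape(k, n)

# Round-trip a random weight: the reconstruction error shrinks as the
# block size shrinks, at the cost of storing more scales.
w = np.random.randn(128, 64).astype(np.float32)
q, s = quantize_int4_blockwise(w, block_size=32)
print("max abs error:", np.abs(w - dequantize_int4_blockwise(q, s)).max())
```

That trade-off (more scales per weight versus lower quantization error) is why the block size is worth exposing as a tunable option rather than hard-coding it.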

anujj marked this pull request as a draft on December 19, 2025, 13:35
anujj (Contributor, Author) commented on Jan 6, 2026

@kunal-vaishnavi @baijumeswani for review

anujj marked this pull request as ready for review on January 6, 2026, 08:37