
@n1ck-guo
Contributor
@n1ck-guo n1ck-guo commented Jan 8, 2026

This pull request refactors the _quant_data method in auto_round/export/export_to_gguf/convert.py to improve support for MOE models, streamline attribute handling, and clean up the quantization logic. The changes mainly focus on making the code more robust for different model architectures and removing legacy or redundant quantization branches.

Support for MOE models and quantization logic cleanup:

  • Improved handling for MoE models by updating the attribute check to recognize modules with "exps" in their names and 3D tensor shapes, making the code more flexible for non-linear expert layers.
  • Refactored the quantization logic to remove legacy branches and commented-out code, simplifying the decision flow for quantization type selection; known FP16 issues are now documented rather than handled by unused branches.
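The MoE attribute check described above can be sketched roughly as follows. This is a minimal illustration, not the actual code from convert.py: the helper name and the exact shape convention are assumptions based on how llama.cpp-style GGUF exports stack per-expert weights.

```python
def is_moe_expert_weight(name: str, shape: tuple) -> bool:
    """Hypothetical check mirroring the PR's described logic.

    In GGUF exports, MoE expert layers are typically stacked into a single
    3D tensor (num_experts, out_features, in_features) and carry "exps" in
    their tensor name, e.g. "blk.0.ffn_gate_exps.weight". Ordinary linear
    layers remain 2D and lack the "exps" marker.
    """
    return "exps" in name and len(shape) == 3


# A stacked expert tensor matches; a plain linear weight does not.
print(is_moe_expert_weight("blk.0.ffn_gate_exps.weight", (8, 4096, 14336)))
print(is_moe_expert_weight("blk.0.attn_q.weight", (4096, 4096)))
```

Checking both the name and the tensor rank (rather than assuming every quantizable module is a 2D linear layer) is what lets the exporter handle non-linear expert layers without a separate code path per architecture.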

General code cleanup:

  • Removed an unnecessary suffix check from the beginning of the function, streamlining the code for extracting layer names.

@n1ck-guo n1ck-guo requested review from wenhuach21 and xin3he January 8, 2026 06:54
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo n1ck-guo changed the title add support for moe model with non-linear exports layer for gguf GGUF format add support for MoE models with non-linear expert layers. Jan 8, 2026


3 participants