Open
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)
Description
Checks
- This template is only for usage issues encountered.
- I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
- I have searched for existing issues, including closed ones, and couldn't find a solution.
- I am using English to submit this issue to facilitate community communication.
Environment Details
Ubuntu 22.04.5 LTS
Python 3.10.15
torch 2.5.0a0+872d972e41.nv24.8
onnxruntime-gpu 1.23.0
onnx 1.19.0
Steps to Reproduce
1. Create a new Conda environment.
2. Install the F5-TTS project into it.
3. Export the transformer blocks of F5-TTS to ONNX format (see the export sketch after this list).
4. Run onnxruntime.quantization.quant_pre_process to infer shapes and produce a pre-processed model (pre_onnx) as the input for quantization.
5. Perform static quantization with the following settings (see the quantization sketch after this list):
   - Quantized ops: MatMul, Conv
   - per_channel=True
   - extra_options={"ActivationSymmetric": True, "WeightSymmetric": True}
   - Calibration set: Aishell dataset (speaker S0002); all other parameters left at their defaults.
6. Compare inference speed before and after quantization (see the timing sketch after this list): the quantized ONNX model runs slower than the original FP32 model.
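
For reference, a minimal sketch of the export step. The block below uses torch.nn.TransformerEncoderLayer only as a stand-in for the actual F5-TTS transformer block; the input names, shapes, and file names are placeholders, not the exact F5-TTS code:

```python
import torch

# Stand-in for an F5-TTS transformer block (placeholder module, not the real one).
block = torch.nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True).eval()

# Placeholder dummy input; the real block takes the F5-TTS hidden-state/conditioning tensors.
dummy_input = torch.randn(1, 256, 1024)

torch.onnx.export(
    block,
    (dummy_input,),
    "f5tts_block.onnx",
    input_names=["x"],
    output_names=["y"],
    dynamic_axes={"x": {0: "batch", 1: "seq"}, "y": {0: "batch", 1: "seq"}},
    opset_version=17,
)
```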
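
The pre-processing and static quantization calls, roughly as I used them. File paths are illustrative, the calibration reader is a minimal example (in practice it yields features extracted from the Aishell S0002 clips), and the import path of quant_pre_process may differ slightly across onnxruntime versions:

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static
from onnxruntime.quantization.shape_inference import quant_pre_process

# Step 4: shape inference / pre-processing, producing the input model for quantization.
quant_pre_process("f5tts_block.onnx", "f5tts_block_pre.onnx")

# Minimal calibration reader: yields one feed dict (input name -> numpy array) per sample.
class AishellReader(CalibrationDataReader):
    def __init__(self, samples):
        self._iter = iter(samples)

    def get_next(self):
        return next(self._iter, None)

# Placeholder calibration data; in practice these are tensors derived from speaker S0002.
calib = [{"x": np.random.randn(1, 256, 1024).astype(np.float32)} for _ in range(16)]

# Step 5: static quantization of MatMul/Conv, per-channel, symmetric activations and weights.
quantize_static(
    "f5tts_block_pre.onnx",
    "f5tts_block_int8.onnx",
    calibration_data_reader=AishellReader(calib),
    op_types_to_quantize=["MatMul", "Conv"],
    per_channel=True,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
    extra_options={"ActivationSymmetric": True, "WeightSymmetric": True},
)
```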
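
And this is roughly how I timed the two models (same provider list and the same feed for both; the input name and shape are placeholders matching the sketches above):

```python
import time

import numpy as np
import onnxruntime as ort

def bench(path, feed, warmup=5, iters=50):
    # Average latency per run after a short warm-up.
    sess = ort.InferenceSession(path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
    for _ in range(warmup):
        sess.run(None, feed)
    start = time.perf_counter()
    for _ in range(iters):
        sess.run(None, feed)
    return (time.perf_counter() - start) / iters

feed = {"x": np.random.randn(1, 256, 1024).astype(np.float32)}
fp32_ms = bench("f5tts_block_pre.onnx", feed) * 1e3
int8_ms = bench("f5tts_block_int8.onnx", feed) * 1e3
print(f"FP32: {fp32_ms:.2f} ms/iter, INT8: {int8_ms:.2f} ms/iter")
```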
✔️ Expected Behavior
The statically quantized INT8 model should run at least as fast as, ideally faster than, the original FP32 model.
❌ Actual Behavior
The quantized ONNX model runs slower than the original FP32 model.