
After performing static and dynamic quantization to int8, the inference speed became slower rather than faster #1203

@DuckGGt

Description


Checks

  • This template is only for usage issues encountered.
  • I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
  • I have searched for existing issues, including closed ones, and couldn't find a solution.
  • I am using English to submit this issue to facilitate community communication.

Environment Details

Ubuntu 22.04.5 LTS
Python 3.10.15
torch 2.5.0a0+872d972e41.nv24.8
onnxruntime-gpu 1.23.0
onnx 1.19.0

Steps to Reproduce

1. Create a new Conda environment.
2. Import the F5-TTS project.
3. Export the transformer blocks from F5-TTS to ONNX format.
4. Use onnxruntime's quant_pre_process to infer input shapes and obtain pre_onnx as the input model for quantization.
5. Perform static quantization with the following settings (a minimal sketch of the equivalent calls follows this list):
  • Quantized ops: MatMul, Conv
  • per_channel=True
  • extra_options={"ActivationSymmetric": True, "WeightSymmetric": True}
  • Use the Aishell dataset (speaker S0002) as the calibration set, keeping all other parameters at their defaults.
6. Compare the inference speed before and after quantization: the quantized ONNX model runs slower than the original FP32 model.
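
For reference, here is a minimal sketch of how steps 4–6 can be wired together with onnxruntime's quantization API. The model filename transformer_blocks.onnx, the input name x and its shape, and the AishellReader stub are illustrative placeholders, not the actual F5-TTS export details; the real calibration reader would feed features computed from Aishell speaker S0002.

```python
# Sketch of steps 4-6; placeholder model path, input name/shape, and
# calibration data (the real F5-TTS export differs).
import time

import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantType,
    quantize_static,
)
from onnxruntime.quantization.shape_inference import quant_pre_process

# Step 4: pre-process the exported model (shape inference + optimization).
quant_pre_process("transformer_blocks.onnx", "pre.onnx")


class AishellReader(CalibrationDataReader):
    """Yields calibration batches; random data stands in for features
    derived from Aishell speaker S0002."""

    def __init__(self, num_samples=32):
        self.batches = iter(
            [{"x": np.random.randn(1, 256, 512).astype(np.float32)}
             for _ in range(num_samples)]
        )

    def get_next(self):
        # Returning None signals the end of calibration data.
        return next(self.batches, None)


# Step 5: static quantization with the settings from the report.
quantize_static(
    "pre.onnx",
    "quant.onnx",
    calibration_data_reader=AishellReader(),
    op_types_to_quantize=["MatMul", "Conv"],
    per_channel=True,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
    extra_options={"ActivationSymmetric": True, "WeightSymmetric": True},
)

# Step 6: rough latency comparison on the CUDA execution provider.
feed = {"x": np.random.randn(1, 256, 512).astype(np.float32)}
for path in ("pre.onnx", "quant.onnx"):
    sess = ort.InferenceSession(path, providers=["CUDAExecutionProvider"])
    sess.run(None, feed)  # warm-up run before timing
    t0 = time.perf_counter()
    for _ in range(20):
        sess.run(None, feed)
    print(path, (time.perf_counter() - t0) / 20, "s/iter")
```

The warm-up call before the timing loop is there so that session creation and first-run kernel initialization do not get counted against either model.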

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

Labels: enhancement (New feature or request), help wanted (Extra attention is needed)