On an RTX 5090, it takes a full 13 minutes to generate a 25-second audio clip with Step-Audio-EditX TTS.
Is this normal? It feels insanely slow compared to what I expected from a 5090.
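For reference: 25 seconds of audio → 13 minutes (780 seconds) of generation time, so a real-time factor (RTF) of about 31x, i.e. roughly 31 seconds of compute per second of audio.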
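In case it helps anyone check the number, here is a minimal sketch of how I get the RTF figure. The function name `real_time_factor` is just something I made up for illustration; it is not part of Step-Audio-EditX.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of audio produced (hypothetical helper)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Numbers from this post: 13 minutes of generation for a 25-second clip.
rtf = real_time_factor(generation_seconds=13 * 60, audio_seconds=25)
print(f"RTF = {rtf:.1f}x")  # 780 / 25 = 31.2
```

An RTF below 1 would mean faster than real time, so anything around 31 is far on the slow side.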