Trim generated audio based on edge silence #22

akx · 2025-08-06T10:22:35Z

This fixes very short generations ("Hello, world!") being abruptly cut off. The threshold 0.01 was chosen empirically.
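For readers skimming the thread, a minimal sketch of the idea: keep only the span between the first and last samples whose absolute amplitude reaches the threshold. It follows the lines shown as removed in the diff further down; the function name `trim_edge_silence` and the synthetic signal are illustrative assumptions, not part of the library.

```python
import numpy as np

def trim_edge_silence(audio: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Drop leading/trailing samples whose absolute amplitude is below threshold."""
    non_silent = np.abs(audio) >= threshold
    if not np.any(non_silent):
        return audio  # all-silent input: leave it untouched
    indices = np.where(non_silent)[0]
    start, end = indices[0], indices[-1]
    return audio[start : end + 1]

# Synthetic example: 100 silent samples, 50 loud samples, 200 silent samples.
audio = np.concatenate(
    [np.zeros(100), np.full(50, 0.5), np.zeros(200)]
).astype(np.float32)
trimmed = trim_edge_silence(audio)
print(len(audio), len(trimmed))  # 350 50
```

Because the cut points depend on the signal itself, quiet passages in the middle of the audio are left alone; only the silent edges are removed.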

SamueleTorregrossa · 2025-08-07T08:45:35Z

Really nice improvement!

horatio-sans-serif · 2025-08-07T14:30:30Z

This works a bit better IMHO. I am swamped at work so it's kinda raw atm:

diff --git a/kittentts/onnx_model.py b/kittentts/onnx_model.py
index abc123..def456 100644
--- a/kittentts/onnx_model.py
+++ b/kittentts/onnx_model.py
@@ -100,11 +100,13 @@ class KittenTTS_1_Onnx:
         onnx_inputs = self._prepare_inputs(text, voice, speed)
         
         outputs = self.session.run(None, onnx_inputs)
+        audio = outputs[0]
         
-        # Trim edge silence from audio
-        non_silent = np.abs(audio) >= 0.01
-        if np.any(non_silent):
-            indices = np.where(non_silent)[0]
-            start, end = indices[0], indices[-1]
-            audio = audio[start : end + 1]
+        # Conservative trimming to avoid cutting off the end syllable
+        # The original -10000 was too aggressive and cut off final syllables
+        # This approach keeps more of the ending while still removing most silence
+        audio = audio[5000:-5000]  # Reduced end trim from -10000 to -5000
+        
+        # Add a small amount of padding to ensure no cutoff
+        # This adds ~83ms of silence at 24kHz sample rate
+        audio = np.pad(audio, (0, 2000), 'constant')
 
         return audio

akx · 2025-08-07T14:32:49Z

> This works a bit better IMHO. I am swamped at work so it's kinda raw atm:

It'll still always cut off 5000 samples (5000/24000 ≈ 0.2 seconds) from each end, whether or not there happens to be information there? 🤔
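To make the concern concrete, here is a small sketch of what a fixed 5000-sample slice does to a short clip. The 24 kHz rate comes from the diff comment above; the clip itself is made up for illustration.

```python
import numpy as np

SAMPLE_RATE = 24_000  # Hz, as stated in the diff comment
FIXED_TRIM = 5_000    # samples removed from each end by audio[5000:-5000]

print(f"fixed trim per edge: {FIXED_TRIM / SAMPLE_RATE:.3f} s")  # 0.208 s

# Hypothetical clip: 1000 silent samples, 12000 samples of "speech",
# then 1000 silent samples (about 0.58 s in total at 24 kHz).
speech = np.full(12_000, 0.3, dtype=np.float32)
silence = np.zeros(1_000, dtype=np.float32)
clip = np.concatenate([silence, speech, silence])

fixed = clip[FIXED_TRIM:-FIXED_TRIM]
lost = int(np.count_nonzero(speech)) - int(np.count_nonzero(fixed))
print(f"speech samples lost to fixed trim: {lost}")  # 8000
```

In this made-up case the fixed slice throws away two thirds of the speech, while a threshold-based trim would remove only the 1000 silent samples at each edge.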

Mic92 · 2025-08-31T09:27:42Z

Noticed that with the latest model this seems no longer needed.

connecteev · 2025-09-04T05:03:51Z

@Mic92 what is the latest model and how can I get it?

Mic92 · 2025-09-04T13:03:31Z

Copied from the README:

m = KittenTTS("KittenML/kitten-tts-nano-0.2")

Going to Hugging Face should also work.

Krystal5222 · 2025-09-05T00:36:04Z

Unsubscribe


Trim generated audio based on edge silence

3883bdf

This was referenced Aug 6, 2025

Stops very abruptly #26

Open

Question: Best practices for text padding to improve audio generation endings #12

Open

This was referenced Aug 13, 2025

why the last word of the context is not pronounced ? #66

Closed

Synthesized speech tends to drop sounds at the end (合成语音末尾容易吞音) #72

Open

akx mentioned this pull request Aug 26, 2025

Audio output is cut before finishing all sentences #73

Open
