@akx

akx commented Aug 6, 2025

This fixes very short generations ("Hello, world!") being abruptly cut off. The threshold 0.01 was chosen empirically.
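
For readers following along, here is a minimal sketch of the threshold-based edge trimming described above, assuming the audio is a mono float NumPy array. The function name `trim_edge_silence` and the synthetic clip are illustrative only and are not part of the library.

```python
import numpy as np


def trim_edge_silence(audio: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Drop leading/trailing samples whose absolute amplitude is below threshold."""
    non_silent = np.abs(audio) >= threshold
    if not np.any(non_silent):
        # Nothing exceeds the threshold; return the input unchanged.
        return audio
    indices = np.where(non_silent)[0]
    start, end = indices[0], indices[-1]
    # Keep everything between the first and last non-silent sample, inclusive.
    return audio[start : end + 1]


# Synthetic example: 100 silent samples, 50 loud samples, 100 silent samples.
clip = np.concatenate([np.zeros(100), np.full(50, 0.5), np.zeros(100)])
trimmed = trim_edge_silence(clip)
```

Only the edges are trimmed; quiet samples between the first and last non-silent sample are kept, so pauses inside the utterance survive.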

@SamueleTorregrossa

Really nice improvement!

@horatio-sans-serif

This works a bit better IMHO. I am swamped at work so it's kinda raw atm:

diff --git a/kittentts/onnx_model.py b/kittentts/onnx_model.py
index abc123..def456 100644
--- a/kittentts/onnx_model.py
+++ b/kittentts/onnx_model.py
@@ -100,12 +100,15 @@ class KittenTTS_1_Onnx:
         onnx_inputs = self._prepare_inputs(text, voice, speed)
         
         outputs = self.session.run(None, onnx_inputs)
+        audio = outputs[0]
         
-        # Trim edge silence from audio
-        non_silent = np.abs(audio) >= 0.01
-        if np.any(non_silent):
-            indices = np.where(non_silent)[0]
-            start, end = indices[0], indices[-1]
-            audio = audio[start : end + 1]
+        # Conservative trimming to avoid cutting off the end syllable
+        # The original -10000 was too aggressive and cut off final syllables
+        # This approach keeps more of the ending while still removing most silence
+        audio = audio[5000:-5000]  # Reduced end trim from -10000 to -5000
+        
+        # Add a small amount of padding to ensure no cutoff
+        # This adds ~83ms of silence at 24kHz sample rate
+        audio = np.pad(audio, (0, 2000), 'constant')
 
         return audio

@akx
Author

akx commented Aug 7, 2025

> This works a bit better IMHO. I am swamped at work so it's kinda raw atm:

It'll still always cut off 5000 samples (5000/24000 ≈ 0.2 seconds) from each end, regardless of whether there happens to be information there? 🤔
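
To make the concern concrete, here is a small sketch on synthetic data. The clip, the constant names, and the 0.5-second duration are made up for illustration; only the 24 kHz sample rate, the `[5000:-5000]` slice, and the 0.01 threshold come from this thread.

```python
import numpy as np

SAMPLE_RATE = 24_000  # Hz, as stated in the thread
FIXED_TRIM = 5_000    # samples removed from each end by audio[5000:-5000]

# Duration removed from each end by the fixed slice (about 0.208 s).
trim_seconds = FIXED_TRIM / SAMPLE_RATE

# Hypothetical clip: 0.5 s of "speech" with no surrounding silence at all.
clip = np.full(SAMPLE_RATE // 2, 0.5)

# Fixed slicing removes samples from both ends regardless of content.
fixed = clip[FIXED_TRIM:-FIXED_TRIM]

# Threshold-based trimming only removes samples below the amplitude threshold.
non_silent = np.where(np.abs(clip) >= 0.01)[0]
thresholded = clip[non_silent[0] : non_silent[-1] + 1]

# Samples of real signal discarded by the fixed slice.
lost = len(clip) - len(fixed)
```

On this clip the fixed slice discards 10,000 of the 12,000 samples, while the threshold-based trim keeps all of them because every sample is above the threshold.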

@Mic92

Mic92 commented Aug 31, 2025

Noticed that with the latest model this seems no longer needed.

@connecteev

@Mic92 what is the latest model and how can I get it?

@Mic92

Mic92 commented Sep 4, 2025

Copied from the README:

m = KittenTTS("KittenML/kitten-tts-nano-0.2")

Getting it directly from Hugging Face should also work.

@Krystal5222

Krystal5222 commented Sep 5, 2025 via email
