ALSA 'underrun occurred' Causing Audio Overlap in Piper TTS on WSL2 #340

@RoeeWork

Description

Hey, I'm trying to use this repo to stream a Hebrew response from an LLM to a TTS engine in real time. Since the machine I'm working on is too slow to run most LLMs, I'm connected via SSH to my more powerful gaming PC, where I run everything under WSL2. I also use VoiceMeeter's VBAN functionality to stream audio from the gaming PC to my workstation.

As the LLM generates its output, I send chunks of 4-5 words at a time to RealtimeTTS. However, when the program starts to play audio there are multiple "underrun occurred" errors like this:

ALSA lib pcm.c:8568:(snd_pcm_recover) underrun occurred

An example:

input:hello there

[respond] starting...
[respond] speech: hello there

[respond] generating response...
['שלום!']
שלום!
מה
אוכל
לעשות
['מה', 'אוכל', 'לעשות', 'בשבילך?']
בשבילך?
⚡ synthesizing → 'שלום! מה אוכל לעשות בשבילך ?'

[respond] done!
SYNTHESIS FINISHED
ALSA lib pcm.c:8568:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:8568:(snd_pcm_recover) underrun occurred
input:

The correct audio is played once for each error like this, so if two underruns occur, I hear the same audio twice. With stream.play(log_synthesized_text=True), synthesis appears to happen as expected, yet the audio itself plays multiple times.
I'm not sure what's causing this. Could it be a side effect of the way audio is handled in WSL2?
Thanks!

My code:

from LLM_utils import respond
from phonikud_tts import Phonikud, phonemize
from RealtimeTTS import PiperVoice, PiperEngine, TextToAudioStream
import time

phonikud = Phonikud('TTS/phonikud-1.0.int8.onnx')
voice = PiperVoice(
    model_file="TTS/tts-model.onnx",
    config_file="TTS/tts-model.config.json"
)

engine = PiperEngine(
    piper_path="/home/roee/docs/victor/venv/bin/piper",
    voice=voice
)

stream = TextToAudioStream(engine)

def generator(text):
    word_list = []
    answer = respond(text)
    for word in answer:
        word_list.append(word)
        if len(word_list) >= 5 or word.endswith((".", "?", "!")):
            yield " ".join(word_list) + " "
            print(word_list)
            word_list = []
    if word_list:
        yield " ".join(word_list) + " "

try:
    while True:
        text = input("input:")

        stream.feed(generator(text))
        stream.play(log_synthesized_text=True)
except KeyboardInterrupt:
    print("fine then...")
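
To rule out the text side as the source of the repeated audio, the chunking logic from generator() can be run on its own, without the LLM or the TTS engine. This is only a minimal sketch: chunk_words and its max_words parameter are hypothetical names introduced here, and the word list is copied from the example log above.

```python
from typing import Iterable, Iterator, List


def chunk_words(words: Iterable[str], max_words: int = 5) -> Iterator[str]:
    """Group words into chunks of at most max_words words.

    A chunk is flushed early when a word ends with sentence-ending
    punctuation, mirroring the condition used in generator().
    """
    pending: List[str] = []
    for word in words:
        pending.append(word)
        if len(pending) >= max_words or word.endswith((".", "?", "!")):
            yield " ".join(pending) + " "
            pending = []
    # Flush whatever is left once the word stream ends.
    if pending:
        yield " ".join(pending) + " "


# Words taken from the example log in this issue.
words = ["שלום!", "מה", "אוכל", "לעשות", "בשבילך?"]
chunks = list(chunk_words(words))
print(chunks)

# Every chunk should appear exactly once; a duplicate here would point
# at the text pipeline rather than at audio playback.
print(len(chunks) == len(set(chunks)))
```

If each chunk shows up exactly once here, the duplication is more likely to happen at playback time than in the text fed to the stream.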

respond():

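import re

# SystemMessage, HumanMessage, MODEL and DEFINING_PROMPT are assumed to be
# imported or defined elsewhere in this module.

# Yields the LLM's response word by word (for the TTS).
def respond(question):
    buffer = ""
    print("\n[respond] starting...")
    print(f"[respond] speech: {question}")
    messages = [
        SystemMessage(DEFINING_PROMPT),
        HumanMessage(question),
    ]
    print("\n[respond] generating response...")

    for chunk in MODEL.stream(messages):
        # Skip empty chunks before converting to str;
        # str(None) would otherwise yield the literal text "None".
        if chunk.content is None:
            continue

        token = str(chunk.content)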

        buffer += token

        matches = list(re.finditer(r"\S+[ \n.,!?]", buffer))

        last_index = 0
        for match in matches:
            word = match.group().strip()
            yield word
            print(word)
            last_index = match.end()

        buffer = buffer[last_index:]

    if buffer.strip():
        yield buffer.strip()
        print(buffer.strip())

    print("\n[respond] done!")
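
The word-splitting part of respond() can also be checked in isolation. The sketch below reuses the same regular expression on a made-up token stream; split_words and fake_tokens are hypothetical names, and the tokens are invented for illustration rather than taken from a real model.

```python
import re
from typing import Iterable, Iterator

# Same pattern as in respond(): a run of non-whitespace characters
# followed by a space, a newline, or one of . , ! ?
WORD_PATTERN = re.compile(r"\S+[ \n.,!?]")


def split_words(tokens: Iterable[str]) -> Iterator[str]:
    """Re-assemble arbitrary token fragments into whole words."""
    buffer = ""
    for token in tokens:
        buffer += token
        last_index = 0
        for match in WORD_PATTERN.finditer(buffer):
            yield match.group().strip()
            last_index = match.end()
        # Keep the unfinished tail for the next token.
        buffer = buffer[last_index:]
    # Emit whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Hypothetical token stream that cuts words in the middle.
fake_tokens = ["Hel", "lo the", "re! How", " are you?"]
print(list(split_words(fake_tokens)))
# -> ['Hello', 'there!', 'How', 'are', 'you?']
```

Each word comes out once, so this step on its own should not produce repeated text.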
