
Fixed it:
The issue was a function that stores memory in the vector DB; it was blocking the generator that streams the LLM response to the TTS.

Solution: used threading to run the save function in the background.

Result:
Measured from VAD stop to first audio output is now 2 seconds.
That includes transcription, building context by querying the vector DB, the LLM call, and the start of synthesis.
Not sure I can squeeze more performance out of the current hardware.
Now it's time to talk to the bot and teach it things.

import logging
import threading

# Run storing in background so it doesn't block the yield
def store_async(user_text: str, full_text: str):
    logging.info("Storing conversation...")
    _store_conversation(user_text, full_text)
    logging.info("Conversation stored.")

# Launched from the streaming generator once the full response is known:
threading.Thread(target=store_async, args=(user_text, full_text), daemon=True).start()
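For context, here is a minimal, self-contained sketch of the pattern (the `respond` generator and the slow `_store_conversation` stand-in are illustrative, not from the original code): the generator hands off storage to a daemon thread and yields response chunks immediately, so the TTS never waits on the vector-DB write.

```python
import logging
import threading
import time

logging.basicConfig(level=logging.INFO)

def _store_conversation(user_text: str, full_text: str) -> None:
    # Stand-in for the real vector-DB write; simulate a slow operation.
    time.sleep(0.5)

def store_async(user_text: str, full_text: str) -> None:
    logging.info("Storing conversation...")
    _store_conversation(user_text, full_text)
    logging.info("Conversation stored.")

def respond(user_text: str):
    """Stream response chunks; kick off storage without blocking the yield."""
    chunks = ["Hello", ", ", "world!"]  # stand-in for the LLM token stream
    full_text = "".join(chunks)
    # daemon=True so the background thread never keeps the process alive
    threading.Thread(target=store_async, args=(user_text, full_text),
                     daemon=True).start()
    for chunk in chunks:
        yield chunk  # TTS can start synthesizing immediately

start = time.monotonic()
first_chunk = next(respond("hi"))
latency = time.monotonic() - start
print(first_chunk)    # → Hello (arrives without waiting for the 0.5 s store)
print(latency < 0.5)  # → True
```

The key point is that the `Thread(...).start()` call returns immediately; the first `yield` is reached long before the simulated database write completes.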

Answer selected by franklin050187