This project aims to enable interactive voice conversations with Pete using integrated Automatic Speech Recognition (ASR), a Large Language Model (LLM) for dialogue, and Text-to-Speech (TTS). The goal is to orchestrate a seamless conversation loop where users can speak to Pete, have their speech transcribed, receive intelligent responses, and hear Pete reply in natural-sounding speech.
Component Status Updates:
- Text-to-Speech (TTS):
  - The TTS module is working for the most part; it is currently falling back to espeak because the forebrain is either down or not running the correct version of our custom coqui streamer.
  - That is good enough for now, but getting the coqui extension running is a priority.
  - Goal: achieve voice synthesis with just the motherbrain, with graceful degradation if extensions are unavailable (see the fallback sketch after this section).
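A minimal sketch of that fallback behavior in Python; `coqui_stream_tts` is a hypothetical stand-in for our custom coqui streamer on the forebrain (its API and endpoint are assumptions), and the fallback simply shells out to the espeak CLI:

```python
import subprocess

def coqui_stream_tts(text: str) -> None:
    # Hypothetical client for our custom coqui streamer on the forebrain;
    # raises when the service is down or running the wrong version.
    raise ConnectionError("coqui streamer unavailable")

def speak(text: str) -> None:
    """Synthesize speech, degrading gracefully to espeak if the streamer is unavailable."""
    try:
        coqui_stream_tts(text)  # preferred: streamed coqui voice
    except Exception:
        # Graceful degradation: local espeak on the motherbrain
        subprocess.run(["espeak", text], check=False)
```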
- Chat Module:
  - Stubbed but untested; based on a previous working version.
  - Handles conversation tracking and currently uses ollama.
  - Future direction: integrate our own LLM model runner, with fallback to ollama for graceful degradation (a sketch of the current ollama path follows this section).
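A minimal sketch of the conversation-tracking path, assuming the `ollama` Python client and a placeholder model name (the actual model, and the future model-runner fallback, are not pinned down yet):

```python
import ollama  # assumes the `ollama` Python client; the HTTP API would also work

class Chat:
    """Tracks conversation history and generates replies via ollama."""

    def __init__(self, model: str = "llama3"):  # model name is a placeholder assumption
        self.model = model
        self.history = []

    def reply(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        response = ollama.chat(model=self.model, messages=self.history)
        answer = response["message"]["content"]
        self.history.append({"role": "assistant", "content": answer})
        return answer
```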
- Automatic Speech Recognition (ASR):
  - Mostly implemented, with a graceful degradation path to local faster-whisper.
  - Relies on our own ASR-over-websocket service, which is based on an older prototype and is currently untested and not yet integrated (a fallback sketch follows this section).
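A minimal sketch of that degradation path, assuming a hypothetical `transcribe_remote` client for the ASR-over-websocket service and the faster-whisper library locally (model size and device are assumptions):

```python
from faster_whisper import WhisperModel

def transcribe_remote(audio_path: str) -> str:
    # Hypothetical client for our ASR-over-websocket service; the endpoint and
    # protocol are assumptions until the older prototype is integrated.
    raise ConnectionError("ASR websocket service unavailable")

def transcribe(audio_path: str) -> str:
    """Transcribe speech, degrading to local faster-whisper if the service is down."""
    try:
        return transcribe_remote(audio_path)
    except Exception:
        model = WhisperModel("base", device="cpu")  # model size/device are assumptions
        segments, _info = model.transcribe(audio_path)
        return " ".join(segment.text.strip() for segment in segments)
```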
Key Objectives (unchanged):
- Integrate ASR to capture and transcribe user speech
- Connect transcribed text to LLM for generating responses
- Integrate TTS to convert LLM responses to speech
- Orchestrate the full conversation workflow (a combined sketch follows this list)
- Test and iterate on voice interaction quality
- Document the design and technology stack
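To make the workflow concrete, here is a rough sketch that composes the component sketches above (`transcribe`, `Chat`, `speak`); `record_until_silence` is a hypothetical audio-capture helper, not part of any existing module:

```python
def record_until_silence() -> str:
    # Hypothetical microphone-capture helper; returns the path of a recorded WAV file.
    raise NotImplementedError("audio capture is not sketched here")

def conversation_loop() -> None:
    """Ties the component sketches together: ASR -> LLM -> TTS."""
    chat = Chat()
    while True:
        audio_path = record_until_silence()
        text = transcribe(audio_path)  # ASR, or local faster-whisper fallback
        if not text.strip():
            continue                   # nothing intelligible; keep listening
        reply = chat.reply(text)       # LLM dialogue via ollama
        speak(reply)                   # TTS, or espeak fallback
```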
Sub-issues will track each technology component and implementation step.