Naomi latency #443
Oh, by the way, I am not currently using Discord because of the arbitration clause in their user agreement. I sent the email required to opt out of that clause but have not heard back, so I am not going to accept the updated user agreement. If anyone has thoughts on a different venue for discussions, I'm certainly interested; in the meantime, I'll be lurking on GitHub.
Also, for an example of a full voice assistant stack with the sort of low latency I am looking to achieve, check out the GLaDOS project at https://github.com/dnhkng/GLaDOS. This project only allows you to converse with an LLM and does not support the sort of robust plugin system that Naomi does, but it does indicate to me that my latency goals are achievable.
Since November, I have gone down the whole AI architecture rabbit hole for a project I wanted to create back in ~2013, when I had no idea how or what I was doing. That same project is what led me to find Jasper in 2018, which then led to us making Naomi. I have finally circled back to the original project and have made leaps and bounds with it, and it's nice because it is the first project in years I have actually enjoyed programming and spending every day working on.

That being said, as I have gone through the whole process, I keep finding things that would do wonders for Naomi, but as you concluded, they would require a complete rebuild from scratch. I actually know Python now, which is a plus. So I have been stuck in this position where I want Naomi to continue and grow, but with what is available today, especially the rise and rapid advancement of LLMs in just the past year, it will be very hard to see any real growth without a major overhaul, given the competition.

I was leaning towards a FastAPI async approach, streaming everything through the pipeline so that TTFA (time to first audio) is as low as possible (see the rough sketch at the end of this comment). We are limited by the compute of a Pi, but it would still be a major improvement over what we have currently. This would also allow for better integration of plugins as well as various GUI options.

There is discussion board functionality here on GitHub that we can turn on, and I can create a bot that allows cross-talk between the two, so communication can still flow regardless of platform.
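Very roughly, the shape I have in mind looks something like this. The route name, the stub functions, and the chunking strategy are purely illustrative placeholders, not existing Naomi code:

```python
# Hypothetical sketch of an async streaming response with FastAPI; the route,
# the stub functions, and the chunking strategy are illustrative only and are
# not part of Naomi's current codebase.
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


def parse_intent(utterance: str) -> str:
    # Stand-in for the real text-to-intent plugin.
    return "clock.time" if "time" in utterance else "unknown"


def handle_intent(intent: str) -> str:
    # Stand-in for a SpeechHandler plugin producing the reply text.
    if intent == "clock.time":
        return "It is twelve o'clock. Anything else?"
    return "Sorry, I did not catch that."


async def synthesize_stream(text: str):
    """Yield audio chunk-by-chunk so playback can start before TTS finishes."""
    for sentence in text.split(". "):
        await asyncio.sleep(0)           # hand control back to the event loop
        yield sentence.encode() + b"\n"  # placeholder for real TTS audio bytes


@app.post("/respond")
async def respond(utterance: str):
    reply = handle_intent(parse_intent(utterance))
    # StreamingResponse sends each yielded chunk as soon as it is produced,
    # which is what keeps time-to-first-audio (TTFA) low.
    return StreamingResponse(synthesize_stream(reply), media_type="application/octet-stream")
```

The point is just that the response starts going out as soon as the first chunk is yielded, instead of waiting for the whole synthesis to finish.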
One thing I have always appreciated about Naomi is its plugin system. Unfortunately, the implementation of the plugin system has caused Naomi's biggest issue, which is latency. The audio recording has to conclude before the keyword spotter can run. The keyword spotter has to finish running before the STT engine can run. STT has to finish before the intent can be parsed. The intent has to be handled and the full response generated before TTS can start, and TTS must complete before the audio can be sent to the speakers.
I'm planning to start a new build with a new architecture that allows streaming between these stages. First, I will still be doing a basic volume-based VAD, followed by some sort of voice filter, before recording chunks. Once a chunk is identified as containing a voice, I will send it to the speech-to-text engine immediately, using streaming to start getting both speaker recognition and speech-to-text results right away (a rough sketch of the chunk-level flow follows below). Initially, I will be using the Sherpa-K2 project to build a proof of concept. The text-to-speech end will also be streamed as much as possible.
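As a rough sketch of that chunk-level flow, assuming a 16 kHz stream, an arbitrary RMS threshold, and a `StreamingRecognizer` stub standing in for the real Sherpa/k2 online recognizer:

```python
# Minimal sketch of a volume-based VAD gating audio chunks into a streaming
# STT engine. The sample rate, threshold, and StreamingRecognizer stub are
# assumptions; a real build would swap in the Sherpa/k2 online recognizer.
import numpy as np


class StreamingRecognizer:
    """Stand-in for a streaming STT engine that accepts audio incrementally."""

    def __init__(self):
        self.samples_fed = 0

    def accept_waveform(self, chunk: np.ndarray) -> str:
        self.samples_fed += len(chunk)
        return f"<partial result after {self.samples_fed} samples>"


def is_voiced(chunk: np.ndarray, threshold: float = 0.01) -> bool:
    """Crude volume-based VAD: root-mean-square energy above a threshold."""
    return float(np.sqrt(np.mean(chunk ** 2))) > threshold


def stream_audio(chunks, recognizer: StreamingRecognizer):
    """Forward each voiced chunk to the recognizer as soon as it arrives."""
    for chunk in chunks:
        if is_voiced(chunk):
            partial = recognizer.accept_waveform(chunk)
            print(partial)  # partial transcripts arrive before the utterance ends


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake 100 ms chunks at 16 kHz: quiet noise followed by louder "voiced" audio.
    quiet = [rng.normal(0, 0.001, 1600) for _ in range(3)]
    loud = [rng.normal(0, 0.1, 1600) for _ in range(3)]
    stream_audio(quiet + loud, StreamingRecognizer())
```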
For the text-to-intent part, I plan to still use FAISS TTI (https://github.com/aaronchantrill/FAISS_TTI.git). For the SpeechHandler, I'll try to stick as close as possible to the current SpeechHandler plugins. I will be attempting to figure out how to integrate the concepts of "expect" and "confirm" with an LLM-first/optional approach. Having experimented a great deal with local LLMs, I do think they can be used effectively as a voice assistant front end.
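For anyone unfamiliar with the FAISS approach, here is a toy illustration of the idea: embed a few example phrases per intent, index them, and pick the nearest intent for a new utterance. The hashing "embedder" below is only a stand-in for whatever embedding model FAISS_TTI actually uses:

```python
# Toy illustration of FAISS-based text-to-intent matching. The hashing
# embedder is a placeholder, not the actual FAISS_TTI implementation.
import faiss
import numpy as np

DIM = 64


def embed(text: str) -> np.ndarray:
    """Toy embedding: hashed bag of words, L2-normalised (consistent within one run)."""
    vec = np.zeros(DIM, dtype=np.float32)
    for word in text.lower().split():
        vec[hash(word) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


examples = [
    ("what time is it", "clock.time"),
    ("set a timer for ten minutes", "clock.timer"),
    ("what is the weather like", "weather.forecast"),
]

index = faiss.IndexFlatIP(DIM)  # inner product = cosine similarity on unit vectors
index.add(np.stack([embed(text) for text, _ in examples]))

query = embed("tell me the time please").reshape(1, -1)
scores, ids = index.search(query, 1)
print(examples[ids[0][0]][1], scores[0][0])  # nearest intent and its similarity score
```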
I'll start by experimenting with these concepts in a Jupyter Notebook. Once I am satisfied with the new API, I will convert the project into a proper Python application. At the same time, I will add the new HTML front end.
Since this is a major change, I'll be working in a new branch.
Any thoughts or suggestions welcome.