Naomi latency #443
Oh, by the way, I am not currently using Discord because of the arbitration clause in their user agreement. I sent the email required to opt out of that clause but have not heard back, so I am not going to accept the updated user agreement. If anyone has thoughts on a different venue for discussions, I'm certainly interested; in the meantime, I'll be lurking on GitHub.
Also, for an example of a full voice assistant stack with the sort of low latency I am looking to achieve, check out the GLaDOS project at https://github.com/dnhkng/GLaDOS. This project only allows you to converse with an LLM and does not support the sort of robust plugin system that Naomi does, but it does indicate to me that my latency goals are achievable.
Since November, I have gone down the whole AI architecture rabbit hole for a project I wanted to create back in ~2013, when I had no idea how or what I was doing. That same project is what led me to find Jasper in 2018, which then led to us making Naomi. I have finally circled back to the original project and have made leaps and bounds with it, and it's nice because it is the first project in years I have actually enjoyed programming and spending every day working on.

That being said, as I have gone through the whole process, I keep finding things that would do wonders for Naomi, but as you concluded, they would require a complete rebuild from scratch. I actually know Python now, which is a plus. So I have been stuck in this position where I want Naomi to continue and grow, but with what is available today, especially the rise and rapid advancement of LLMs in just the past year, it will be very hard to see any real growth without a major overhaul, given the competition.

I was leaning towards a FastAPI async approach, streaming everything through the pipeline so that TTFA (time to first audio) is as low as possible (see the rough sketch at the end of this comment). We are limited by the compute of a Pi, but it would still be a major improvement over what we have currently. This would also allow for better integration of plugins as well as various GUI options.

There is discussion board functionality here on GitHub that we can turn on, and I can create a bot that allows cross-talk between the two, so communication can still flow regardless of platform.
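Very roughly, the shape I have in mind looks something like this. The route name, the stub functions, and the chunking strategy are purely illustrative placeholders, not existing Naomi code:

```python
# Hypothetical sketch of an async streaming response with FastAPI; the route,
# the stub functions, and the chunking strategy are illustrative only and are
# not part of Naomi's current codebase.
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


def parse_intent(utterance: str) -> str:
    # Stand-in for the real text-to-intent plugin.
    return "clock.time" if "time" in utterance else "unknown"


def handle_intent(intent: str) -> str:
    # Stand-in for a SpeechHandler plugin producing the reply text.
    if intent == "clock.time":
        return "It is twelve o'clock. Anything else?"
    return "Sorry, I did not catch that."


async def synthesize_stream(text: str):
    """Yield audio chunk-by-chunk so playback can start before TTS finishes."""
    for sentence in text.split(". "):
        await asyncio.sleep(0)           # hand control back to the event loop
        yield sentence.encode() + b"\n"  # placeholder for real TTS audio bytes


@app.post("/respond")
async def respond(utterance: str):
    reply = handle_intent(parse_intent(utterance))
    # StreamingResponse sends each yielded chunk as soon as it is produced,
    # which is what keeps time-to-first-audio (TTFA) low.
    return StreamingResponse(synthesize_stream(reply), media_type="application/octet-stream")
```

The point is just that the response starts going out as soon as the first chunk is yielded, instead of waiting for the whole synthesis to finish.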
One thing I have always appreciated about Naomi is its plugin system. Unfortunately, the implementation of the plugin system has caused Naomi's biggest issue, which is latency. The audio recording has to conclude before the keyword spotter can run. The keyword spotter has to finish running before the STT engine can run. STT has to finish before the intent can be parsed. The intent has to be handled and the full response generated before TTS can start, and TTS must complete before the audio can be sent to the speakers.
I'm planning to start a new build with a new architecture that allows streaming between these stages. First, I will still be doing a basic volume-based VAD, followed by some sort of voice filter, before recording chunks. Once a chunk is identified as containing a voice, I will send it to the speech-to-text engine immediately, using streaming to start getting both speaker recognition and speech-to-text results right away (a rough sketch of the chunk-level flow follows below). Initially, I will be using the Sherpa-K2 project to build a proof of concept. The text-to-speech end will also be streamed as much as possible.
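As a rough sketch of that chunk-level flow, assuming a 16 kHz stream, an arbitrary RMS threshold, and a `StreamingRecognizer` stub standing in for the real Sherpa/k2 online recognizer:

```python
# Minimal sketch of a volume-based VAD gating audio chunks into a streaming
# STT engine. The sample rate, threshold, and StreamingRecognizer stub are
# assumptions; a real build would swap in the Sherpa/k2 online recognizer.
import numpy as np


class StreamingRecognizer:
    """Stand-in for a streaming STT engine that accepts audio incrementally."""

    def __init__(self):
        self.samples_fed = 0

    def accept_waveform(self, chunk: np.ndarray) -> str:
        self.samples_fed += len(chunk)
        return f"<partial result after {self.samples_fed} samples>"


def is_voiced(chunk: np.ndarray, threshold: float = 0.01) -> bool:
    """Crude volume-based VAD: root-mean-square energy above a threshold."""
    return float(np.sqrt(np.mean(chunk ** 2))) > threshold


def stream_audio(chunks, recognizer: StreamingRecognizer):
    """Forward each voiced chunk to the recognizer as soon as it arrives."""
    for chunk in chunks:
        if is_voiced(chunk):
            partial = recognizer.accept_waveform(chunk)
            print(partial)  # partial transcripts arrive before the utterance ends


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake 100 ms chunks at 16 kHz: quiet noise followed by louder "voiced" audio.
    quiet = [rng.normal(0, 0.001, 1600) for _ in range(3)]
    loud = [rng.normal(0, 0.1, 1600) for _ in range(3)]
    stream_audio(quiet + loud, StreamingRecognizer())
```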
For the text-to-intent part, I plan to still use FAISS TTI (https://github.com/aaronchantrill/FAISS_TTI.git). For the SpeechHandler, I'll try to stick as close as possible to the current SpeechHandler plugins. I will be attempting to figure out how to integrate the concepts of "expect" and "confirm" with an LLM-first/optional approach. Having experimented a great deal with local LLMs, I do think they can be used effectively as a voice assistant front end.
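For anyone unfamiliar with the FAISS approach, here is a toy illustration of the idea: embed a few example phrases per intent, index them, and pick the nearest intent for a new utterance. The hashing "embedder" below is only a stand-in for whatever embedding model FAISS_TTI actually uses:

```python
# Toy illustration of FAISS-based text-to-intent matching. The hashing
# embedder is a placeholder, not the actual FAISS_TTI implementation.
import faiss
import numpy as np

DIM = 64


def embed(text: str) -> np.ndarray:
    """Toy embedding: hashed bag of words, L2-normalised (consistent within one run)."""
    vec = np.zeros(DIM, dtype=np.float32)
    for word in text.lower().split():
        vec[hash(word) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


examples = [
    ("what time is it", "clock.time"),
    ("set a timer for ten minutes", "clock.timer"),
    ("what is the weather like", "weather.forecast"),
]

index = faiss.IndexFlatIP(DIM)  # inner product = cosine similarity on unit vectors
index.add(np.stack([embed(text) for text, _ in examples]))

query = embed("tell me the time please").reshape(1, -1)
scores, ids = index.search(query, 1)
print(examples[ids[0][0]][1], scores[0][0])  # nearest intent and its similarity score
```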
I'll start by experimenting with these concepts in a Jupyter Notebook. Once I am satisfied with the new API, I will convert the project into a proper Python application. At the same time, I will add the new HTML front end.
Since this is a major change, I'll be working in a new branch.
Any thoughts or suggestions welcome.