From 0eab04ac8bfad8be7e61cd850c6a2b7466eff4bf Mon Sep 17 00:00:00 2001
From: crrow
Date: Thu, 9 Apr 2026 16:28:22 +0800
Subject: [PATCH] feat(serve): mlx-live-style voice pipeline with server-side VAD/ASR/LLM/TTS (#134)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace browser-side orchestration with a server-side full pipeline,
matching mlx-live's architecture. Single WebSocket, raw PCM streaming.

Server pipeline (/ws/voice):
- VAD: RMS energy threshold, speech start/end detection
- ASR: proxy to configurable Whisper-compatible endpoint
- LLM: streaming proxy to OpenAI-compatible endpoint with SSE parsing
- TTS: Kokoro via existing TtsBackend, output as float32 PCM frames
- Interruption: VAD detects speech during TTS, cancels remaining output
- Chat history: 10-turn rolling window per connection

Frontend (demo.html rewrite):
- Siri-like orb animation (idle/recording/generating states)
- AudioWorklet for 16kHz PCM recording (pcm_worklet.js)
- Float32 PCM playback via AudioContext scheduling
- Settings modal with localStorage persistence
- No Web Speech API, no browser-side LLM fetch, no CORS issues

New endpoints:
- WS /ws/voice — full voice conversation pipeline
- GET /static/pcm-recorder-worklet.js — AudioWorklet module

Closes #134

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 src/serve/demo.html      | 1390 ++++++++++++--------------------------
 src/serve/handlers.rs    |   28 +-
 src/serve/mod.rs         |   14 +-
 src/serve/pcm_worklet.js |   49 ++
 src/serve/tests.rs       |   45 +-
 src/serve/voice.rs       |  779 +++++++++++++++++++++
 6 files changed, 1329 insertions(+), 976 deletions(-)
 create mode 100644 src/serve/pcm_worklet.js
 create mode 100644 src/serve/voice.rs

diff --git a/src/serve/demo.html b/src/serve/demo.html
index d896601..4a41f52 100644
--- a/src/serve/demo.html
+++ b/src/serve/demo.html
@@ -3,1087 +3,561 @@
-kotoba TTS — voice chat demo
+kotoba — voice
-
-kotoba — voice chat
-Real-time multilingual voice conversation · STT → LLM → streaming TTS
- - -
-
- Settings -
- - - - - - - - - - - - - - - - - -
-
+
+
+kotoba voice
+
- - +
+
+
+tap to start
+
+
- -
-
+
+ +
+
+
- -
-
-
- idle + +