feat(serve): mlx-live-style voice pipeline (#134)#135
Merged
Conversation
…LM/TTS (#134) Replace browser-side orchestration with a server-side full pipeline, matching mlx-live's architecture. Single WebSocket, raw PCM streaming. Server pipeline (/ws/voice): - VAD: RMS energy threshold, speech start/end detection - ASR: proxy to configurable Whisper-compatible endpoint - LLM: streaming proxy to OpenAI-compatible endpoint with SSE parsing - TTS: Kokoro via existing TtsBackend, output as float32 PCM frames - Interruption: VAD detects speech during TTS, cancels remaining output - Chat history: 10-turn rolling window per connection Frontend (demo.html rewrite): - Siri-like orb animation (idle/recording/generating states) - AudioWorklet for 16kHz PCM recording (pcm_worklet.js) - Float32 PCM playback via AudioContext scheduling - Settings modal with localStorage persistence - No Web Speech API, no browser-side LLM fetch, no CORS issues New endpoints: - WS /ws/voice — full voice conversation pipeline - GET /static/pcm-recorder-worklet.js — AudioWorklet module Closes #134 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Complete rewrite of the voice chat demo to match mlx-live's architecture. Everything runs server-side through a single WebSocket.
Architecture
```
Browser: AudioWorklet → 16kHz PCM int16 → WS binary
Server (/ws/voice):
VAD (RMS energy) → speech end → ASR (Whisper API) → LLM (OpenAI stream)
→ sentence buffer → TTS (Kokoro) → float32 PCM → WS binary
Browser: float32 PCM → AudioContext scheduled playback
```
What changed
Why
The previous demo (browser Web Speech API + fetch LLM + /ws/tts) was:
Server-side pipeline eliminates all three problems.
Test plan
Closes #134