feat(web): real-time voice chat with Live Mode #1345

@crrow

Description

Add Gemini Live-style voice conversation to the desktop app:

  • Web Speech API for real-time STT (handled by the browser, with interim results streamed while the user speaks and no round-trip through our backend)
  • LiveKit Agents UI components for the voice interface (waveform, controls)
  • Server-side TTS (OpenAI), streamed to the client as audio_delta events
  • Barge-in support (interrupt Rara mid-speech)
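A minimal client-side sketch of the last two bullets, assuming each streamed event carries a `responseId` and a base64-encoded `delta` chunk. Only the `audio_delta` event name comes from this issue; the field names, the `VoicePlayback` class and its `cancel` callback are hypothetical. Chunks are queued for playback, and when the user starts talking over Rara the queue is cleared, the response is cancelled, and late chunks for it are dropped:

```typescript
// Assumed wire format: only the `audio_delta` name comes from the issue;
// `responseId` and the base64 `delta` field are hypothetical.
interface AudioDeltaEvent {
  type: "audio_delta";
  responseId: string;
  delta: string; // base64-encoded audio chunk
}

// Decode base64 into raw bytes (atob exists in browsers and Node.js 16+).
function decodeBase64(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Hypothetical playback queue for streamed TTS audio with barge-in.
class VoicePlayback {
  private chunks: Uint8Array[] = [];
  private activeResponse: string | null = null;
  private readonly interrupted = new Set<string>();
  private readonly cancel: (responseId: string) => void;

  // `cancel` should stop local playback and ask the server to abort generation.
  constructor(cancel: (responseId: string) => void) {
    this.cancel = cancel;
  }

  onAudioDelta(event: AudioDeltaEvent): void {
    // Drop chunks that arrive after the user interrupted this response.
    if (this.interrupted.has(event.responseId)) return;
    this.activeResponse = event.responseId;
    this.chunks.push(decodeBase64(event.delta));
  }

  // Call when the player finishes the last queued chunk of a response.
  onPlaybackDone(responseId: string): void {
    if (this.activeResponse === responseId) this.activeResponse = null;
  }

  // Call when speech recognition reports the user talking over Rara.
  onUserSpeechStart(): void {
    if (this.activeResponse === null) return; // nothing playing: not a barge-in
    this.interrupted.add(this.activeResponse);
    this.cancel(this.activeResponse);
    this.activeResponse = null;
    this.chunks = []; // discard audio that has not been played yet
  }

  // Hand queued chunks to the audio player (e.g. an AudioContext pipeline).
  drain(): Uint8Array[] {
    const out = this.chunks;
    this.chunks = [];
    return out;
  }
}
```

Keying on a per-response id is what keeps barge-in clean here: audio already in flight for the interrupted response is discarded instead of playing after the user has started speaking.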

Frontend-first: Steps 1-3 of the plan work without backend changes (voice input → text → chat).
Steps 4-7 add the TTS backend for spoken responses.
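A sketch of the frontend-only path (voice input → text → chat), assuming the app already has a text-chat send function. `sendChatMessage`, `showInterim`, `readTranscript` and `startVoiceInput` are hypothetical names. It uses the standard `SpeechRecognition` interface (often only exposed as `webkitSpeechRecognition`, e.g. in Chrome), shows interim results as a live caption, and submits each final result as a chat message:

```typescript
// Structural types mirroring the Web Speech API event (results[i].isFinal,
// results[i][0].transcript), declared here so the sketch needs no DOM typings.
interface RecognitionAlternative { transcript: string }
interface RecognitionResult { isFinal: boolean; length: number; [index: number]: RecognitionAlternative }
interface RecognitionResultList { length: number; [index: number]: RecognitionResult }
interface RecognitionEvent { resultIndex: number; results: RecognitionResultList }

type RecognitionCtor = new () => {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((event: RecognitionEvent) => void) | null;
  start(): void;
  stop(): void;
};

// Split the results that changed in this event into final and interim text.
function readTranscript(event: RecognitionEvent): { final: string; interim: string } {
  let finalText = "";
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (!result) continue;
    const text = result[0]?.transcript ?? ""; // best alternative
    if (result.isFinal) finalText += text;
    else interimText += text;
  }
  return { final: finalText.trim(), interim: interimText.trim() };
}

// Hypothetical wiring: `sendChatMessage` stands in for the existing text-chat
// call and `showInterim` for the live-caption UI.
function startVoiceInput(
  sendChatMessage: (text: string) => void,
  showInterim: (text: string) => void,
  lang = "en-US",
): { stop: () => void } | null {
  const g = globalThis as unknown as {
    SpeechRecognition?: RecognitionCtor;
    webkitSpeechRecognition?: RecognitionCtor;
  };
  const Ctor = g.SpeechRecognition ?? g.webkitSpeechRecognition;
  if (!Ctor) return null; // unsupported browser: fall back to typed input
  const recognition = new Ctor();
  recognition.continuous = true; // keep listening across pauses
  recognition.interimResults = true; // deliver partial hypotheses while speaking
  recognition.lang = lang;
  recognition.onresult = (event) => {
    const { final, interim } = readTranscript(event);
    showInterim(interim);
    if (final) sendChatMessage(final);
  };
  recognition.start();
  return { stop: () => recognition.stop() };
}
```

Starting the loop at `resultIndex` (the lowest result that changed in this event) means final results delivered by earlier events are not sent to the chat twice.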

Design doc: docs/plans/2026-04-13-realtime-voice-chat.md

Component

web (frontend, UI)

Alternatives considered

  • LiveKit full stack (Server + Agent): too heavy; requires running a LiveKit server and agent process we don't otherwise need
  • Browser speechSynthesis for TTS: robotic voice, not production quality
  • Server-side Whisper STT: 1-2 s latency, versus streaming interim results from the Web Speech API

Metadata

    Labels

    agent:claude (Operations performed by Claude), enhancement (New feature or request), ui (User interface changes)
