Add local-first TTS/STT providers, event-driven architecture, and uv/justfile tooling #1

Open
lancekrogers wants to merge 13 commits into ethanplusai:main from Obedience-Corp:main

lancekrogers commented Mar 16, 2026

Summary

Major overhaul that makes Samantha fully usable without any API keys or cloud services. Adds a pluggable provider system for both TTS and STT, replaces the tightly-coupled conversation loop with an event-driven architecture, and modernizes the build/dev tooling.

Local-First Audio Providers

  • Kokoro TTS (default) — high-quality local ONNX text-to-speech, no API key needed
  • Whisper STT (default) — local speech-to-text via faster-whisper, no internet required
  • Edge TTS — free cloud alternative (Microsoft), no API key
  • Existing Fish Audio TTS preserved as an optional paid provider for custom voice clones
  • Google STT preserved as an optional cloud fallback
  • All providers follow a common base class (TTSProvider / STTProvider) for easy extensibility
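A minimal sketch of what a common provider base class could look like. Only the `TTSProvider` name comes from this PR; the method names (`synthesize`, `is_available`), the `EchoTTS` stand-in, and the `PROVIDERS` registry are assumptions made for illustration and may not match the actual code in samantha/tts/.

```python
from abc import ABC, abstractmethod


class TTSProvider(ABC):
    """Shared interface for TTS backends (method names are assumed)."""

    name: str = "base"

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return audio bytes for the given text."""

    def is_available(self) -> bool:
        """Report whether the backend's dependencies are usable."""
        return True


class EchoTTS(TTSProvider):
    """Hypothetical stand-in backend; it just encodes the text."""

    name = "echo"

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


# Hypothetical registry mapping a config value to a provider class.
PROVIDERS: dict[str, type[TTSProvider]] = {EchoTTS.name: EchoTTS}


def get_tts_provider(name: str) -> TTSProvider:
    """Instantiate the provider registered under `name`."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown TTS provider: {name!r}") from None
```

Under this assumed design, adding a backend means subclassing `TTSProvider` and registering it; the orchestration code only ever sees the base interface. An `STTProvider` could mirror the same shape with a `transcribe` method.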

Event-Driven Architecture

  • New EventBus (samantha/events.py) decouples the conversation engine from the UI
  • New ConversationEngine (samantha/engine.py) manages the listen → think → speak loop independently of display concerns
  • UI subscribes to engine events (listening, thinking, speaking, error, etc.) instead of being called directly
  • Adds real-time status callbacks for STT listening/transcribing phases
  • Live microphone level visualization during listening state
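The decoupling described above can be sketched roughly as follows. The class names `EventBus` and `ConversationEngine` come from this PR, but the `subscribe`/`emit`/`run_turn` methods, the constructor arguments, and the payload shape are assumptions, not the actual API in samantha/events.py or samantha/engine.py.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Tiny synchronous publish/subscribe hub (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, **payload: Any) -> None:
        # Copy the list so handlers may subscribe during dispatch.
        for handler in list(self._handlers.get(event, [])):
            handler(payload)


class ConversationEngine:
    """Runs one listen -> think -> speak turn and reports via events."""

    def __init__(
        self,
        bus: EventBus,
        listen: Callable[[], str],
        think: Callable[[str], str],
        speak: Callable[[str], None],
    ) -> None:
        self.bus = bus
        self.listen = listen
        self.think = think
        self.speak = speak

    def run_turn(self) -> str:
        try:
            self.bus.emit("listening")
            heard = self.listen()
            self.bus.emit("thinking", text=heard)
            reply = self.think(heard)
            self.bus.emit("speaking", text=reply)
            self.speak(reply)
            return reply
        except Exception as exc:
            # The UI learns about failures through the bus, not a direct call.
            self.bus.emit("error", message=str(exc))
            raise
```

In this sketch the engine never imports UI code; a display layer subscribes to the event names it cares about, which is what allows the UI to be swapped or reloaded while the engine keeps its state.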

New CLI Commands

  • samantha test — end-to-end mic + speaker test with per-provider diagnostics
  • samantha voices — list available TTS voices with locale/gender filtering
  • samantha providers — show installed/active TTS and STT providers with install hints
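One way a `samantha providers` style check could work is to probe for each backend's import module and print an install hint when it is missing. This is a sketch under assumptions: the `provider_status` helper, the table layout, the module-to-provider mapping, and the hint wording are hypothetical, and the PR's real implementation may differ.

```python
from importlib.util import find_spec

# Hypothetical table: provider name -> (import module, optional-dependency extra).
# The module names and extras chosen here are assumptions for illustration.
DEFAULT_TABLE = {
    "whisper": ("faster_whisper", "whisper"),
    "edge": ("edge_tts", "edge"),
}


def provider_status(table: dict[str, tuple[str, str]]) -> list[dict[str, object]]:
    """Return one row per provider with an installed flag and an install hint."""
    rows = []
    for name, (module, extra) in sorted(table.items()):
        installed = find_spec(module) is not None
        hint = "" if installed else f"install the [{extra}] extra"
        rows.append({"name": name, "installed": installed, "hint": hint})
    return rows


def format_rows(rows: list[dict[str, object]]) -> str:
    """Render the rows as plain text, one provider per line."""
    lines = []
    for row in rows:
        mark = "ok" if row["installed"] else "missing"
        suffix = f" ({row['hint']})" if row["hint"] else ""
        lines.append(f"{row['name']}: {mark}{suffix}")
    return "\n".join(lines)
```

Calling `print(format_rows(provider_status(DEFAULT_TABLE)))` would list each provider as `ok` or `missing`; checking with `find_spec` avoids importing heavy model libraries just to report status.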

Build & Tooling

  • Migrated from setuptools to hatchling build backend
  • Added uv as the package manager with optional dependency groups ([whisper], [fish], [edge], [local], [cloud], [all])
  • Added python-dotenv for .env file support
  • Full justfile system with modular recipes: voice.just, dev.just, install.just
  • Recipes for switching providers, going fully local, testing audio, and more

Other Improvements

  • Animated microphone waveform indicator for listening state
  • Conversation history and config management unchanged (backward compatible)
  • Updated CLAUDE.md to reflect new defaults and zero-API-key setup

Changed Files

  • samantha/tts/ — New TTS provider package (kokoro, edge, fish + base class)
  • samantha/stt/ — New STT provider package (whisper, google + base class)
  • samantha/engine.py — New conversation engine (decoupled from UI)
  • samantha/events.py — New event bus for engine↔UI communication
  • samantha/voice.py — Refactored to provider-agnostic orchestration
  • samantha/cli.py — New subcommands (test, voices, providers)
  • samantha/config.py — New provider settings and defaults
  • samantha/ui.py — Event-driven status display with waveform animation
  • pyproject.toml — hatchling backend, optional deps, dotenv
  • justfile + .justfiles/ — Full modular just recipe system
  • uv.lock — Lockfile for reproducible installs

Test Plan

  • samantha --text works with default Kokoro TTS (no API keys)
  • samantha works with default Whisper STT + Kokoro TTS (fully local)
  • samantha test correctly reports mic and speaker status
  • samantha providers shows installed vs missing providers
  • samantha voices lists available voices for the active TTS provider
  • Switching providers via samantha config tts_provider edge works
  • just recipes work (just talk, just text, just voice go-local)
  • Fish Audio TTS still works when API key is provided
  • Google STT still works as a fallback

Replace static status dot with pulsing waveform bars animation
that shows real-time visual feedback during voice capture phases.

Extract conversation logic into engine.py with event bus (events.py),
allowing UI hot-reload without losing conversation state.
…ation

Default STT provider is now Whisper instead of Google, completing
the move to fully local providers (Kokoro TTS was already default).
No API keys needed out of the box.

Also adds real-time mic level tracking with smooth waveform animation
that responds to actual voice input, replacing the static looping frames.
The mic animation now shows idle breathing dots when waiting and a
live level-driven waveform when speech is detected.
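The level-driven animation described in this commit message could be approximated as below. This is an illustrative sketch, not the code from samantha/ui.py: it assumes 16-bit little-endian mono PCM input, and the function names, the smoothing factor, the idle threshold, and the glyphs are all made up for the example.

```python
import math
import struct

BARS = "▁▂▃▄▅▆▇█"


def rms_level(pcm: bytes) -> float:
    """Normalized RMS (0.0 to 1.0) of 16-bit little-endian mono PCM."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    return min(1.0, math.sqrt(mean_square) / 32768.0)


def smooth(previous: float, current: float, alpha: float = 0.3) -> float:
    """Exponential smoothing so the display does not flicker."""
    return previous + alpha * (current - previous)


def render(level: float, idle_threshold: float = 0.02, width: int = 5) -> str:
    """Show idle dots below the threshold, otherwise level-scaled bars."""
    if level < idle_threshold:
        return "·" * width
    index = min(len(BARS) - 1, int(level * len(BARS)))
    return BARS[index] * width
```

A capture loop would call `rms_level` on each audio chunk, feed the result through `smooth`, and pass the smoothed value to `render` on every UI refresh.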
Add configurable TTS/STT provider system with Kokoro-82M local TTS
lancekrogers changed the title from "Update with local model support, uv package manager and dotenv" to "Add local-first TTS/STT providers, event-driven architecture, and uv/justfile tooling" on Mar 16, 2026