A sophisticated voice-controlled interface for Claude with natural text-to-speech, advanced voice activity detection, and profile management capabilities.
- Whisper Speech Recognition: Accurate transcription using OpenAI Whisper (tiny to large models)
- Dual TTS Engines:
  - Coqui TTS: Natural British voices with sentence streaming
  - Piper TTS: Fast, lightweight speech synthesis
- Voice Activity Detection: Silero VAD for accurate speech detection
- Wake Word Detection: Configurable wake word with fuzzy matching
- TTS Interruption: Press ESC to immediately stop Claude's speech output
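The README does not say which fuzzy-matching algorithm the wake word detector uses. As one possibility, the sketch below compares every word window of the transcript against the wake word using the standard-library `difflib.SequenceMatcher` ratio. `matches_wake_word` and the 0.8 threshold are hypothetical choices, not the project's code.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so 'Hey, Claude!' compares like 'hey claude'."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


def matches_wake_word(transcript: str, wake_word: str = "hey claude",
                      threshold: float = 0.8) -> bool:
    """Return True if any word window of the transcript is close to the wake word.

    Hypothetical helper: windows have as many words as the wake word and are
    scored with difflib's character-level similarity ratio (0.0 to 1.0).
    """
    words = normalize(transcript).split()
    target = normalize(wake_word)
    n = len(target.split())
    if n == 0 or not words:
        return False
    # If the transcript is shorter than the wake word, compare it as a whole.
    for i in range(max(1, len(words) - n + 1)):
        window = " ".join(words[i:i + n])
        if difflib.SequenceMatcher(None, window, target).ratio() >= threshold:
            return True
    return False
```

Because the comparison is character-based, a near-miss transcription such as "hey cloud" still clears the 0.8 threshold while unrelated speech does not. Raising the threshold trades missed activations for fewer false triggers.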
- Wake Mode: Activated by wake word ("Hey Claude" by default)
- Chat Mode: Continuous conversation with 2-minute inactivity timeout
- Ask Mode: Single question/answer interaction
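Chat mode's 2-minute inactivity timeout amounts to a restartable countdown. The sketch below shows one way to express it; `InactivityTimer` is a hypothetical helper rather than the project's implementation, and the injectable clock is an assumption added so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable


class InactivityTimer:
    """Hypothetical helper that tracks idle time in chat mode."""

    def __init__(self, timeout_s: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so tests can use a fake clock
        self._last_activity = clock()

    def touch(self) -> None:
        """Call whenever the user speaks; restarts the countdown."""
        self._last_activity = self._clock()

    def remaining(self) -> float:
        """Seconds until the session times out, never negative."""
        return max(0.0, self.timeout_s - (self._clock() - self._last_activity))

    def expired(self) -> bool:
        """True once timeout_s seconds have passed without activity."""
        return self.remaining() == 0.0
```

In a chat loop, one would call `touch()` after each transcribed utterance and end the session once `expired()` returns True. `time.monotonic` is the default because it is unaffected by system clock adjustments.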
- Create and manage multiple conversation contexts
- Each profile maintains its own CLAUDE.md instructions and session state
- Voice commands for profile management
- Persistent sessions with automatic UUID-based resumption
- Python 3.10+
- PipeWire or PulseAudio
- Claude CLI installed and configured
```bash
# Clone repository
git clone <repository-url>
cd claude-voice

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Optional: Install VAD for better speech detection
pip install silero-vad torch torchaudio

# Optional: Install development dependencies
make install-dev
```

```bash
# Use the launcher script (recommended)
./claude-voice wake  # Wake word mode
./claude-voice chat  # Conversation mode
./claude-voice ask   # Single question mode

# Or run directly
python claude_voice.py wake
```

```
# Mode selection (positional argument)
{chat,ask,wake}                         # Interaction mode (default: wake)

# Audio configuration
--model {tiny,base,small,medium,large}  # Whisper model size (default: base)
--sample-rate {16000,48000}             # Audio sample rate in Hz (default: 16000)
--silence-threshold INT                 # Amplitude threshold for silence detection (default: 1000)
--calibrate                             # Calibrate noise floor before starting

# Voice configuration
--wake-word "your phrase"               # Custom wake word (default: "hey claude")
--tts-engine {auto,coqui,piper}         # TTS engine selection (default: auto)
--voice NAME                            # TTS voice selection:
                                        #   british_male, british_female (Coqui)
                                        #   alan, cori (Piper)
                                        #   p258, p287 (Coqui raw models)
--speech-rate FLOAT                     # Speech rate: 0.5=fast, 1.5=slow (default: 1.1)

# Debugging
--verbose                               # Enable verbose logging for debugging
```

Voice commands available in all modes:
- "Create profile": Interactive profile creation
- "Load profile [name]": Switch to a specific profile
- "List profiles": Show available profiles
- "Reset context": Clear current profile
- "Cancel": Stop current operation
- "Goodbye" or "Exit": End session
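Recognizing the commands above can be as simple as normalizing the transcript and matching it against a small pattern table. The README does not describe how the project actually matches commands, so `parse_command` and its pattern table are hypothetical; the phrases themselves come from the list above.

```python
import re
from typing import Optional, Tuple

# Hypothetical command table: regex -> command name.
_COMMANDS = [
    (re.compile(r"^create profile$"), "create_profile"),
    (re.compile(r"^load profile (?P<name>.+)$"), "load_profile"),
    (re.compile(r"^list profiles$"), "list_profiles"),
    (re.compile(r"^reset context$"), "reset_context"),
    (re.compile(r"^cancel$"), "cancel"),
    (re.compile(r"^(goodbye|exit)$"), "exit"),
]


def parse_command(transcript: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (command, argument) if the transcript is a voice command, else None."""
    text = re.sub(r"[^\w\s]", "", transcript.lower())  # drop punctuation
    text = " ".join(text.split())                       # collapse whitespace
    for pattern, command in _COMMANDS:
        match = pattern.match(text)
        if match:
            # Only "load profile" has a named group; others yield None.
            return command, match.groupdict().get("name")
    return None  # not a command: the text would be sent to Claude instead
```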
Profiles allow you to maintain separate conversation contexts for different use cases.
```
.context/
├── profile_name/
│   └── CLAUDE.md        # Profile-specific instructions
├── another_profile/
│   └── CLAUDE.md
└── .claude/
    └── settings.json    # Security settings
```
- Say "create profile" when prompted
- Provide a name for the profile
- Describe what the profile should help with
- The assistant will create a tailored CLAUDE.md
- Profiles maintain session continuity using Claude's `--resume` flag
- Each profile gets its own UUID for session management
- Context automatically switches when loading profiles
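A sketch of how a per-profile UUID might be created once and then reused across sessions. It assumes, hypothetically, that the ID is stored in a `session_id` file inside the profile folder from the `.context/` layout; the project may store it elsewhere. The returned ID is the value that would be handed to the Claude CLI's `--resume` flag.

```python
import uuid
from pathlib import Path


def profile_session_id(context_dir: Path, profile: str) -> str:
    """Return a stable session UUID for a profile, creating it on first use.

    Hypothetical helper: the ID lives in <context_dir>/<profile>/session_id,
    next to the profile's CLAUDE.md.
    """
    profile_dir = context_dir / profile
    profile_dir.mkdir(parents=True, exist_ok=True)
    id_file = profile_dir / "session_id"
    if id_file.exists():
        return id_file.read_text().strip()  # resume the existing session
    session_id = str(uuid.uuid4())           # first use: new random UUID
    id_file.write_text(session_id + "\n")
    return session_id
```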
- Audio:
  - Sample Rate: Configurable (16 kHz default, 48 kHz supported)
  - Format: 16-bit PCM, mono
  - Chunk Size: 512 samples for VAD, 1024 for amplitude detection
- Voice Activity Detection (Silero VAD):
  - Start Threshold: 0.85 speech probability (high confidence required to start recording)
  - Continue Threshold: 0.5 (lower, so recording continues through brief dips)
  - Silence Duration: 2 seconds of silence ends the recording
  - Pre-buffer: 10 chunks (~320 ms at 16 kHz) to capture the speech onset
- Text-to-Speech:
  - Sentence Streaming: Sentences are processed in parallel for a faster response
  - Smart Splitting: Breaks on punctuation and avoids speaking short fragments
  - Voice Options:
    - Coqui: p258 (male), p287 (female) British voices
    - Piper: alan (male), cori (female) voices
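The two VAD thresholds form a hysteresis: speech must be detected with high confidence to begin recording, but only needs to stay moderately likely to keep it going. The sketch below applies the values listed above, assuming per-chunk speech probabilities are already available (for example from Silero VAD, which is not called here). `segment_utterance` is hypothetical and the project's recording loop may differ. With 512-sample chunks at 16 kHz each chunk lasts 32 ms, so the 10-chunk pre-buffer holds about 320 ms.

```python
import math
from collections import deque
from typing import Deque, Iterable, List, Tuple

SAMPLE_RATE = 16_000
CHUNK_SAMPLES = 512        # 32 ms per chunk at 16 kHz
START_THRESHOLD = 0.85     # probability needed to start recording
CONTINUE_THRESHOLD = 0.5   # probability needed to keep recording
SILENCE_S = 2.0            # trailing silence that ends recording
PRE_BUFFER_CHUNKS = 10     # ~320 ms kept from before the onset


def segment_utterance(chunks: Iterable[Tuple[bytes, float]]) -> List[bytes]:
    """Return the audio chunks of the first utterance in a stream.

    Each item is (pcm_chunk, speech_probability).
    """
    # 2.0 s * 16000 / 512 = 62.5 -> 63 chunks (~2.02 s)
    max_silent_chunks = math.ceil(SILENCE_S * SAMPLE_RATE / CHUNK_SAMPLES)
    pre_buffer: Deque[bytes] = deque(maxlen=PRE_BUFFER_CHUNKS)
    recording: List[bytes] = []
    silent_chunks = 0
    in_speech = False

    for chunk, prob in chunks:
        if not in_speech:
            if prob >= START_THRESHOLD:
                # Onset: prepend buffered audio so the first syllable isn't clipped
                in_speech = True
                recording.extend(pre_buffer)
                recording.append(chunk)
            else:
                pre_buffer.append(chunk)  # oldest chunk drops out automatically
            continue
        recording.append(chunk)
        if prob >= CONTINUE_THRESHOLD:
            silent_chunks = 0             # still speaking
        else:
            silent_chunks += 1
            if silent_chunks >= max_silent_chunks:
                break                     # 2 s of silence: utterance finished
    return recording
```

A probability of 0.6 therefore cannot start a recording but does keep one alive, which prevents short pauses between words from cutting the utterance off.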
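Sentence streaming relies on cutting the response into pieces that can be synthesized one after another while earlier pieces play. The sketch below splits on sentence-ending punctuation and merges fragments shorter than a minimum length. `split_for_tts` and the 20-character minimum are assumptions; the README does not state the project's actual splitting rule.

```python
import re
from typing import List

MIN_CHARS = 20  # hypothetical: shorter pieces are merged with their neighbors

# Split after '.', '!' or '?' when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_for_tts(text: str, min_chars: int = MIN_CHARS) -> List[str]:
    """Split text into speakable chunks, avoiding very short fragments."""
    chunks: List[str] = []
    pending = ""
    for piece in _SENTENCE_END.split(text.strip()):
        pending = f"{pending} {piece}".strip() if pending else piece
        if len(pending) >= min_chars:
            chunks.append(pending)
            pending = ""
    if pending:
        # Attach a short trailing fragment to the previous chunk if there is one
        if chunks:
            chunks[-1] = f"{chunks[-1]} {pending}"
        else:
            chunks.append(pending)
    return chunks
```

Abbreviations such as "e.g." would still trigger an early split, so a real splitter would need extra rules for them.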
```bash
make test          # Run all tests
make test-verbose  # Run with verbose output
```

```bash
make lint    # Run ruff linting with fixes
make format  # Format with black
make check   # Check without modifying
make clean   # Clean cache files
```

```
claude-voice/
├── claude_voice.py        # Main entry point
├── voice_assistant/       # Package modules
│   ├── audio/             # Audio recording/playback
│   ├── config/            # Configuration management
│   ├── core/              # Core interfaces
│   ├── profiles/          # Profile management
│   ├── transcription/     # Whisper integration
│   └── tts/               # TTS engines
├── tests/                 # Test suite
├── .context/              # User profiles (gitignored)
└── Makefile               # Development commands
```
- Check the default audio device: `pactl info | grep "Default Source"`
- Ensure microphone permissions are granted
- Try calibration mode: `--calibrate`
- Use a larger Whisper model: `--model medium`
- Ensure clear audio input without background noise
- Check microphone positioning
- Coqui TTS requires more resources but sounds more natural
- Piper TTS is faster but sounds more robotic
- Try switching engines: `--tts-engine piper`
- Profile names are case-insensitive, and punctuation is removed from them
- Check the `.context/` directory for profile folders
- Ensure the Claude CLI is properly configured
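The case-insensitivity and punctuation rule for profile names can be expressed as a small normalization step followed by a folder lookup. Both functions below are hypothetical, and joining words with underscores is an assumption based on the `profile_name/` folder naming in the `.context/` layout.

```python
import re
from pathlib import Path
from typing import Optional


def normalize_profile_name(spoken: str) -> str:
    """Lowercase, drop punctuation, and join words with underscores."""
    text = re.sub(r"[^\w\s]", "", spoken.lower())  # remove punctuation
    return "_".join(text.split())                  # spaces -> underscores


def find_profile(context_dir: Path, spoken: str) -> Optional[Path]:
    """Return the profile folder if it exists and contains a CLAUDE.md."""
    candidate = context_dir / normalize_profile_name(spoken)
    return candidate if (candidate / "CLAUDE.md").is_file() else None
```

If `find_profile` returns None for a name you expect to work, listing `.context/` shows which normalized folder names actually exist.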
The assistant runs in a sandboxed environment:
- Claude operations are restricted to the `.context/` directory
- No system command execution
- No network access from the sandbox
- Settings are enforced via `.context/.claude/settings.json`
- Follow existing code patterns
- Run tests before submitting: `make test`
- Use linting tools: `make lint`
- Update documentation when adding features
[Your License Here]
- OpenAI Whisper for speech recognition
- Silero team for VAD model
- Coqui TTS for natural voices
- Piper TTS for fast synthesis
- Claude by Anthropic