Multi-room voice interaction with your custom AI agent, with hotword and speaker detection.

🎙️ VoiceLink AI

Local voice assistant service using low-resource satellite devices (Raspberry Pi or similar) for multi-room voice control. Audio processing runs on a local server with GPU acceleration while lightweight clients handle audio capture/playback. 🚀

Transcribed speech is sent to a webhook endpoint (n8n, Home Assistant, etc.) for command processing and optional TTS responses (audio playback). All audio data is encrypted using AES-256 for secure transmission over local networks. 🔐

Why VoiceLink AI?

  • Local Processing - All audio processing and transcription happen on your local network: fast, with no cloud dependencies.
  • Custom Wake Words - Use Porcupine to create personalized wake words for your assistant
  • Easy Custom Integration - Works seamlessly with n8n, Home Assistant, and other webhook-compatible platforms

✨ Features

  • 🎯 Wake Word Detection - Porcupine custom wake word
  • 🗣️ Speech-to-Text - Faster Whisper with GPU support
  • 🏠 Multi-Room Support - Multiple simultaneous client devices
  • 📍 Location-Aware - Device ID/name included in webhook payload
  • 🔒 AES-256 Encryption - Secure audio transmission
  • 🎵 Voice Activity Detection - Silero VAD for smart silence detection
  • 🙍 Speaker Identification - Recognize different users (coming soon)
  • 🔊 Audio Feedback - Success/error sounds and TTS responses routed to originating device
  • 🔄 Auto-Reconnection - Clients reconnect automatically on network issues

🏗️ Architecture

graph LR
    A[🎙️ Client Devices<br/>Raspberry Pi / PC] -->|🔒 Audio Stream| B[🖥️ Server<br/>STT Processing]
    B -->|Transcribed Text| C[🔗 Webhook<br/>n8n / HA]
    C -->|TTS Response| B
    B -->|🔒 Audio Response| A
    
    style A fill:#667eea,stroke:#333,stroke-width:2px,color:#fff
    style B fill:#f093fb,stroke:#333,stroke-width:3px,color:#fff
    style C fill:#4facfe,stroke:#333,stroke-width:2px,color:#fff

🏠 Home Assistant and n8n Integration

My personal setup uses Home Assistant and n8n for processing voice commands and controlling smart home devices. Here's a brief overview of my setup:

  • An n8n webhook receives transcribed text from the VoiceLink AI server (this project) running on a Windows PC. My personalized AI agent's response is sent to the ElevenLabs TTS API for audio generation, and the webhook responds with the audio (base64 in the data field). ElevenLabs is not free, but the quality is excellent; you can substitute any TTS service that returns audio data.
  • n8n agent uses Home Assistant MCP tool (Home Assistant's built-in MCP server enabled) to access HA services and devices.
  • My 'satellite' client devices are Raspberry Pi 3s running the client service.
  • The satellite Pis also run MPD (Music Player Daemon) as an easy way for my AI agent to make TTS announcements (via the MCP server). I haven't been able to get Music Assistant to work with MPD yet, although the Pis are visible as players in Music Assistant.

🚀 Quick Start

💻 Server (Windows)

cd server
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your configuration
python server.py

🥧 Client (Raspberry Pi)

cd client
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your configuration
python client.py

🎯 Porcupine Wake Word

This project uses Porcupine by Picovoice for wake word detection. Porcupine provides accurate, low-latency wake word recognition that runs efficiently on both server and edge devices. 🎪

🔑 Getting a Porcupine Access Key

Picovoice Porcupine is an excellent choice for wake word detection due to its accuracy, low resource usage, and ease of customization. Up to 3 devices and custom wake words are FREE (this project only needs 1 device - the server).

  1. Create Account - Sign up at Picovoice Console 👤
  2. Get Access Key - Copy your free access key from the dashboard 🎫
  3. Add to .env - Set PORCUPINE_ACCESS_KEY in your server's .env file ⚙️

🎨 Custom Wake Words

You can create custom wake words for your assistant:

Option 1: Pre-Built Wake Words (Easiest)

  • Use built-in wake words like "Hey Siri", "Alexa", "Computer", etc.
  • Available in the Porcupine library without customization

Option 2: Custom Wake Words (Recommended)

  1. Go to Picovoice Console 🌐
  2. Navigate to Porcupine → Wake Words 📋
  3. Click Train New Wake Word 🎓
  4. Enter your custom phrase (e.g., "Hey Dude", "Jarvis", "Computer") 💬
  5. Download the .ppn model file 📥
  6. Place it in server/resources/ 📁
  7. Update your server code to reference the new model file 🔧

Tips for Custom Wake Words: 💡

  • Choose 2-3 syllable phrases for best accuracy ✅
  • Avoid common words said in regular conversation 🚫
  • Test with different accents and speaking styles 🌍
  • The free tier includes custom wake word training 🆓

The included Hey-Dude_en_windows_v3_0_0.ppn model is a custom wake word example.
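As a rough sketch of how a server might load a custom keyword file with the `pvporcupine` Python package (the access key, model path, and frame handling here are illustrative placeholders, not this project's actual code):

```python
def create_detector(access_key: str, ppn_path: str):
    """Load a custom wake-word model from a .ppn file.

    access_key and ppn_path are placeholders, e.g.
    "server/resources/Hey-Dude_en_windows_v3_0_0.ppn".
    """
    import pvporcupine  # third-party: pip install pvporcupine
    return pvporcupine.create(access_key=access_key, keyword_paths=[ppn_path])


def is_detection(result_index: int) -> bool:
    # porcupine.process(pcm_frame) returns the index of the detected
    # keyword, or -1 when nothing matched this frame
    return result_index >= 0
```

Each call to `porcupine.process()` consumes one frame of `porcupine.frame_length` 16-bit samples at `porcupine.sample_rate` (16 kHz).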

⚙️ Configuration

See .env.example files in server/ and client/ directories. Copy to .env and configure:

  • 🖥️ Server: Set WEBHOOK_URL, PORCUPINE_ACCESS_KEY, and ENCRYPTION_KEY
  • 📱 Client: Set server connection details, audio devices, and matching ENCRYPTION_KEY
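As a sketch, a server .env using the variables named above might look like this (all values are placeholders; see .env.example for the full list of settings):

```shell
# server/.env (placeholder values, replace with your own)
WEBHOOK_URL=http://192.168.1.10:5678/webhook/voicelink
PORCUPINE_ACCESS_KEY=your_picovoice_access_key
ENCRYPTION_KEY=paste_generated_base64_key_here
```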

🔐 Security & Encryption

This encryption implementation uses AES-256-CBC with a random IV for each message, ensuring strong confidentiality for audio data. 🛡️ The shared secret key must be securely managed by the user. This implementation does not cover key exchange or advanced security features like authentication or certificates. This is a basic encryption layer to protect audio data in transit on local networks.

⚠️ It is recommended to run this system on trusted networks only -- not over the public internet. Encryption helps protect against casual eavesdropping but does not provide full end-to-end security necessary for untrusted environments. This is a design decision to balance security with ease of use for local home deployments.
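As a minimal sketch of the scheme described above (assuming a 32-byte key and a random IV prepended to each message; this is illustrative, uses the third-party cryptography package, and is not the project's actual implementation):

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def encrypt_audio(key: bytes, plaintext: bytes) -> bytes:
    iv = os.urandom(16)  # fresh random IV for every message
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    # prepend the IV so the receiver can decrypt
    return iv + enc.update(padded) + enc.finalize()


def decrypt_audio(key: bytes, message: bytes) -> bytes:
    iv, ciphertext = message[:16], message[16:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = dec.update(ciphertext) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

Because the IV is random per message, encrypting the same audio twice yields different ciphertexts, but CBC alone provides confidentiality only, not authentication, which matches the trusted-network caveat above.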

⚡ Quick Setup (1 Minute)

1. Generate Key 🔑

python support\encryption.py

Copy the output key (looks like: Zy8xK3R5bGVzaGVldCBhbmQgdGhlIGZpbGU=)
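Assuming the key is base64-encoded 32 random bytes (consistent with the sample output above), you could also generate one directly:

```python
import base64
import os

# 32 random bytes gives a 256-bit AES key; print it base64-encoded for .env
print(base64.b64encode(os.urandom(32)).decode())
```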

2. Add to Both .env Files 📝

Server .env:

ENCRYPTION_KEY=paste_your_key_here

Client .env:

ENCRYPTION_KEY=paste_the_same_key_here

⚠️ Important: Both server and client MUST use the exact same key!

3. Restart Services 🔄

Verify it's working: look for this message on startup:

Encryption enabled - audio data will be encrypted

📦 What's Encrypted

  • ✅ Audio stream data (client ↔ server)
  • ✅ TTS responses and feedback sounds
  • ❌ JSON control messages (unencrypted for debugging)

🔧 Troubleshooting

"Invalid padding" error → Keys don't match. Ensure both server and client have the exact same ENCRYPTION_KEY.

"No ENCRYPTION_KEY" warning → Add key to .env file and restart.

No encryption key? System will run with unencrypted audio and log a warning.

🔗 Webhook Integration

The server sends transcribed speech to a webhook endpoint (n8n, Home Assistant, etc.) and expects optional audio responses for TTS playback. 📡

Request (Server → Webhook)

{
  "speaker": "John",
  "text": "turn on the kitchen lights",
  "timestamp": "2025-11-15T10:30:45.123456",
  "device_id": "kitchen_pi",
  "device_name": "Kitchen Pi"
}

Response (Webhook → Server)

{
  "data": "base64_encoded_wav_audio_for_tts"
}

The data field is optional. If provided, the audio will be played back on the originating client device.
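The payload and response above can be handled with a few lines of standard-library Python; this is a hedged sketch of the shapes involved (the function names are illustrative, not part of the project's API):

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional


def build_webhook_payload(speaker: str, text: str,
                          device_id: str, device_name: str) -> str:
    # Mirrors the request fields shown above
    return json.dumps({
        "speaker": speaker,
        "text": text,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "device_id": device_id,
        "device_name": device_name,
    })


def extract_tts_audio(response_body: str) -> Optional[bytes]:
    # The "data" field is optional base64-encoded WAV audio;
    # return None when the webhook sends no audio back
    data = json.loads(response_body).get("data")
    return base64.b64decode(data) if data else None
```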

🌊 n8n Webhook Example

  1. 📥 Webhook Node - Receive POST requests
  2. ⚙️ Process Text - Parse command, trigger actions (Home Assistant nodes, HTTP requests, etc.)
  3. 🗣️ Text-to-Speech - Optional: generate response audio (e.g., ElevenLabs, Google TTS)
  4. 📤 Respond to Webhook - Return JSON with base64 audio in data field

📋 Requirements

  • 💻 Server: Windows 10/11, Python 3.8+, CUDA-compatible GPU (optional)
  • 🥧 Client: Raspberry Pi 3+ or any Linux/Windows device with microphone
  • 🌐 Network: WebSocket connectivity between clients and server

🪟 Windows Service (NSSM)

The server can be run as a Windows service using NSSM (Non-Sucking Service Manager). 🛠️ This allows the server to start automatically on boot and run in the background (no login required). 🔄

nssm install VoiceAssistant "C:\path\to\server\.venv\Scripts\python.exe" "C:\path\to\server\server.py"
