A privacy-focused, local-first voice assistant for Linux that runs entirely on your machine. Jarvis can understand speech, process commands using AI, perform system operations, and respond with natural speech.
- 🧠 Local AI: Powered by Ollama with support for various LLM models (Llama 3.1, Mistral, etc.)
- 🎤 Speech-to-Text: Uses OpenAI Whisper (via faster-whisper) for accurate voice recognition
- 🔊 Text-to-Speech: Uses Piper TTS for natural voice synthesis
- 🎯 Voice Activity Detection: Intelligent listening with automatic silence detection
- 🛠️ System Operations:
  - File management (create, read, delete files and directories)
  - Web page fetching and information retrieval
  - Application launching
- 🔒 Security: Sandboxed execution with whitelisted commands and directory restrictions
- 💬 Conversation Memory: Maintains context across multiple interactions
Audio Input → STT (Whisper) → LLM (Ollama) → Action Executor → TTS (Piper) → Audio Output
↓
File Ops | Web Fetch | App Launch
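The flow above can be sketched as a single interaction loop. The function names here are hypothetical stand-ins for the real modules (stt_module, llm_module, action_executor, tts_module), not the project's actual API:

```python
# Illustrative sketch of one interaction turn; the callables are hypothetical
# stand-ins for stt_module, llm_module, action_executor and tts_module.

def run_turn(audio, transcribe, think, act, speak):
    """Audio in -> spoken reply out, mirroring the pipeline diagram."""
    text = transcribe(audio)            # STT (Whisper)
    reply, tool_calls = think(text)     # LLM (Ollama) may request actions
    for call in tool_calls:             # Action Executor (files/web/apps)
        act(call)
    return speak(reply)                 # TTS (Piper)
```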
- OS: Linux (tested on x86_64 and arm64)
- Python: 3.10 or higher
- RAM: 8GB minimum, 16GB recommended
- Storage: ~10GB for models
- Optional: NVIDIA GPU with CUDA for faster inference
cd /home/jules/Dev/other/Jarvis

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install portaudio19-dev python3-pyaudio ffmpeg
# Fedora
sudo dnf install portaudio-devel python3-pyaudio ffmpeg
# Arch
sudo pacman -S portaudio python-pyaudio ffmpeg

# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt

# Automated installation
python setup_piper.py
# Or manual installation:
# Download from: https://github.com/rhasspy/piper/releases
# Extract and place binary in ./piper/piper

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model (choose one)
ollama pull llama3.1:8b # Recommended for 16GB RAM
ollama pull mistral:7b # Alternative
ollama pull llama3.1:8b-q4_0 # Quantized version for 8GB RAM

# Copy example config
cp .env.example .env
# Edit configuration
nano .env

Key settings to configure:
- `WHISPER_MODEL`: `tiny`, `base`, `small`, `medium`, or `large` (`base` recommended)
- `OLLAMA_MODEL`: Model name you pulled (e.g., `llama3.1:8b`)
- `ALLOWED_DIRECTORIES`: Directories where file operations are permitted
- `COMMAND_WHITELIST`: Whitelisted commands for system operations
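A minimal `.env` might look like this (the variable names come from the settings above; the values and paths are illustrative — adjust them to your setup):

```ini
WHISPER_MODEL=base
OLLAMA_MODEL=llama3.1:8b
ALLOWED_DIRECTORIES=/home/you/Documents,/home/you/tmp
COMMAND_WHITELIST=firefox,xdg-open
```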
python main.py

Speak naturally after the "Listening..." prompt. The assistant will:
- Record your voice until silence is detected
- Transcribe using Whisper
- Process with the LLM
- Execute any requested actions
- Respond with synthesized speech
For a rich terminal interface showing chat history and actions:
python main.py --tui

The TUI displays:
- Header: Current status and language
- Conversation Panel: Complete chat history with timestamps
- Actions Panel: Real-time log of system operations and tools
- Help Panel: Quick reference for commands
For testing without audio I/O:
python main.py --text

English:
- "Create a file called notes.txt in my home directory"
- "What's on the website example.com?"
- "List the files in my Documents folder"
- "Open Firefox"
- "Delete the file test.txt from tmp"
- "What's the weather?" (with web search)
French:
- "Crée un fichier appelé notes.txt dans mon répertoire personnel"
- "Qu'est-ce qu'il y a sur le site exemple.com?"
- "Liste les fichiers dans mon dossier Documents"
- "Ouvre Firefox"
- "Supprime le fichier test.txt de tmp"
Switch to French:
- "Switch to French"
- "Parle français"
- "En français"
Switch to English:
- "Switch to English"
- "Parle anglais"
- "In English"
You can also set the default language in .env with WHISPER_LANGUAGE=fr or WHISPER_LANGUAGE=en.
Say or type: "exit", "quit", "goodbye", "stop" (English) or "au revoir", "arrête" (French)
Edit .env to customize:
- `SAMPLE_RATE`: Audio sample rate (default: 16000 Hz)
- `CHANNELS`: Audio channels (default: 1 for mono)
- `WHISPER_MODEL`: Model size (`tiny`, `base`, `small`, `medium`, `large`)
  - `tiny`: Fastest, least accurate (~75 MB)
  - `base`: Good balance (~142 MB) - Recommended
  - `small`: Better accuracy (~466 MB)
  - `medium`: High accuracy (~1.5 GB)
  - `large`: Best accuracy (~2.9 GB)
- `WHISPER_DEVICE`: `cpu` or `cuda` (for NVIDIA GPUs)
- `WHISPER_COMPUTE_TYPE`: `int8` (CPU) or `float16` (GPU)
- `WHISPER_LANGUAGE`: Language code (`en` for English, `fr` for French, or `auto` for auto-detection)
- `OLLAMA_HOST`: Ollama server URL (default: http://localhost:11434)
- `OLLAMA_MODEL`: Model name (e.g., `llama3.1:8b`)
- `LLM_TEMPERATURE`: Response creativity (0.0-1.0, default: 0.7)
- `LLM_MAX_TOKENS`: Maximum response length (default: 1000)
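These settings map onto Ollama's HTTP chat API. The sketch below shows how a request could be assembled; the helper functions are illustrative, though `/api/chat`, `model`, `messages`, `stream`, and the `temperature`/`num_predict` options are standard Ollama fields:

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # matches the OLLAMA_HOST default

def build_chat_request(model, messages, temperature=0.7, max_tokens=1000):
    """Assemble the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        "options": {"temperature": temperature, "num_predict": max_tokens},
    }

def chat(model, messages):
    """Send one chat turn; requires a running `ollama serve`."""
    body = json.dumps(build_chat_request(model, messages)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/chat", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```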
- `PIPER_MODEL`: Voice model (default: `en_US-lessac-medium`)
- `PIPER_SPEAKER_ID`: Voice variant (0-based index)
- `ALLOWED_DIRECTORIES`: Comma-separated paths where file operations are allowed
- `COMMAND_WHITELIST`: Comma-separated list of allowed commands
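One way the `ALLOWED_DIRECTORIES` check can be implemented — a sketch under the assumption that paths are resolved before comparison, not necessarily how action_executor.py does it:

```python
from pathlib import Path

def is_path_allowed(candidate: str, allowed_dirs: list[str]) -> bool:
    """True if candidate resolves to a location inside an allowed directory."""
    target = Path(candidate).resolve()  # collapses symlinks and ../ tricks
    for root in allowed_dirs:
        resolved_root = Path(root).resolve()
        if target == resolved_root or resolved_root in target.parents:
            return True
    return False
```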
Jarvis/
├── main.py # Main orchestration loop
├── config.py # Configuration management
├── audio_handler.py # Audio I/O and VAD
├── stt_module.py # Speech-to-text (Whisper)
├── llm_module.py # LLM integration (Ollama)
├── action_executor.py # System operations executor
├── tts_module.py # Text-to-speech (Piper)
├── setup_piper.py # Piper installation script
├── requirements.txt # Python dependencies
├── .env.example # Example configuration
├── .env # Your configuration (create this)
└── piper/ # Piper binary and models (created by setup)
No microphone input:
# List audio devices
python -c "import sounddevice as sd; print(sd.query_devices())"
# Test recording
python -c "import sounddevice as sd; import numpy as np; print('Recording...'); audio = sd.rec(int(3 * 16000), samplerate=16000, channels=1); sd.wait(); print('Done')"

Permission denied:
# Add user to audio group
sudo usermod -a -G audio $USER
# Log out and back in

Model download fails:
# Manually download models
python -c "from faster_whisper import WhisperModel; model = WhisperModel('base')"

Out of memory:
- Use a smaller model (`tiny` or `base`)
- Set `WHISPER_COMPUTE_TYPE=int8`
Connection refused:
# Start Ollama service
ollama serve
# Or check if running
ps aux | grep ollama

Model not found:
# List installed models
ollama list
# Pull required model
ollama pull llama3.1:8b

Binary not found:
# Re-run setup
python setup_piper.py
# Or set explicit path in tts_module.py

Voice sounds robotic:
- Try a different model (e.g., `en_US-amy-medium`)
- Download from: https://huggingface.co/rhasspy/piper-voices
Low-resource systems (8GB RAM):

WHISPER_MODEL=tiny
OLLAMA_MODEL=llama3.1:8b-q4_0
WHISPER_COMPUTE_TYPE=int8

Balanced (16GB RAM):

WHISPER_MODEL=small
OLLAMA_MODEL=llama3.1:8b
WHISPER_COMPUTE_TYPE=int8

NVIDIA GPU:

WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16

Pull GPU-optimized Ollama models and ensure CUDA is installed.
Jarvis includes several security measures:
- Directory Whitelisting: File operations only in `ALLOWED_DIRECTORIES`
- Command Whitelisting: Only whitelisted commands can be executed
- Action Confirmation: All file operations, web requests, and app launches require user approval
- Smart Whitelist: Approve once, auto-approve future identical actions
- File operations: Whitelisted by directory
- Web requests: Whitelisted by domain
- Applications: Whitelisted by exact command
- No Shell Injection: Uses subprocess with explicit arguments (no `shell=True`)
- Path Validation: Resolves and validates all paths before operations
- Timeout Protection: All operations have timeouts
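The no-shell-injection and timeout points combine naturally in Python's subprocess API. A sketch (the wrapper function is illustrative, not the project's actual executor):

```python
import subprocess
import sys

def run_whitelisted(argv: list[str], timeout: float = 10.0) -> str:
    """Run a command as an argument list — no shell, so no injection —
    and raise TimeoutExpired if it exceeds the timeout."""
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout

# Because argv is a list, a string like 'payload; rm -rf ~' stays a
# single literal argument instead of being interpreted by a shell.
```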
Whitelist Storage: Approved actions are stored in command_whitelist.json for persistence.
Confirmation Options:
- `y` - Execute this action once
- `a` - Execute and add to whitelist for future auto-approval
- `n` - Cancel the action
Important: Review and customize security settings in .env before use.
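The persistence described above can be as simple as a JSON list on disk. A sketch, assuming a flat list of action keys — the file format and key scheme are illustrative, not taken from the project:

```python
import json
from pathlib import Path

def load_whitelist(path: Path) -> set[str]:
    """Read previously approved actions (the 'a' choice) from disk."""
    return set(json.loads(path.read_text())) if path.exists() else set()

def approve(action_key: str, path: Path) -> None:
    """Persist an approved action so identical future actions auto-run."""
    whitelist = load_whitelist(path)
    whitelist.add(action_key)
    path.write_text(json.dumps(sorted(whitelist)))
```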
Edit llm_module.py to add tool definitions:
TOOLS = [
    # ... existing tools ...
    {
        "type": "function",
        "function": {
            "name": "your_tool_name",
            "description": "What your tool does",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {
                        "type": "string",
                        "description": "Parameter description"
                    }
                },
                "required": ["param1"]
            }
        }
    }
]

Then implement in action_executor.py:
def your_tool_name(self, param1: str) -> str:
    """Your tool implementation."""
    # ... your code ...
    return "Result"

Contributions are welcome! Areas for improvement:
- Wake word detection (e.g., "Hey Jarvis")
- Support for additional languages (beyond English and French)
- Plugin architecture
- Web UI
- Home automation integration
- Voice cloning for personalized TTS
This project is open source and available under the MIT License.
- OpenAI Whisper - Speech recognition
- faster-whisper - Optimized Whisper
- Ollama - Local LLM inference
- Piper - Fast neural TTS
- webrtcvad - Voice activity detection
For issues, questions, or suggestions, please open an issue on the repository.
Note: This is a local-first assistant. All processing happens on your machine - no data is sent to external servers.