A privacy-focused, local-first voice assistant for Linux that runs entirely on your machine. Jarvis can understand speech, process commands using AI, perform system operations, and respond with natural speech.
- 🧠 Local AI: Powered by Ollama with support for various LLM models (Llama 3.1, Mistral, etc.)
- 🎤 Speech-to-Text: Uses OpenAI Whisper (via faster-whisper) for accurate voice recognition
- 🔊 Text-to-Speech: Uses Piper TTS for natural voice synthesis
- 🎯 Voice Activity Detection: Intelligent listening with automatic silence detection
- 🛠️ System Operations:
  - File management (create, read, delete files and directories)
  - Web page fetching and information retrieval
  - Application launching
- 🔒 Security: Sandboxed execution with whitelisted commands and directory restrictions
- 💬 Conversation Memory: Maintains context across multiple interactions
Audio Input → STT (Whisper) → LLM (Ollama) → Action Executor → TTS (Piper) → Audio Output
↓
File Ops | Web Fetch | App Launch
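The flow above can be sketched as a single interaction loop. The function names here are hypothetical stand-ins for the real modules (stt_module, llm_module, action_executor, tts_module), not the project's actual API:

```python
# Illustrative sketch of one interaction turn; the callables are hypothetical
# stand-ins for stt_module, llm_module, action_executor and tts_module.

def run_turn(audio, transcribe, think, act, speak):
    """Audio in -> spoken reply out, mirroring the pipeline diagram."""
    text = transcribe(audio)            # STT (Whisper)
    reply, tool_calls = think(text)     # LLM (Ollama) may request actions
    for call in tool_calls:             # Action Executor (files/web/apps)
        act(call)
    return speak(reply)                 # TTS (Piper)
```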
- OS: Linux (tested on x86_64 and arm64)
- Python: 3.10 or higher
- RAM: 8GB minimum, 16GB recommended
- Storage: ~10GB for models
- Optional: NVIDIA GPU with CUDA for faster inference
cd /home/jules/Dev/other/Jarvis

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install portaudio19-dev python3-pyaudio ffmpeg
# Fedora
sudo dnf install portaudio-devel python3-pyaudio ffmpeg
# Arch
sudo pacman -S portaudio python-pyaudio ffmpeg

# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt

# Automated installation
python setup_piper.py
# Or manual installation:
# Download from: https://github.com/rhasspy/piper/releases
# Extract and place binary in ./piper/piper

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model (choose one)
ollama pull llama3.1:8b # Recommended for 16GB RAM
ollama pull mistral:7b # Alternative
ollama pull llama3.1:8b-q4_0 # Quantized version for 8GB RAM

# Copy example config
cp .env.example .env
# Edit configuration
nano .env

Key settings to configure:
- `WHISPER_MODEL`: `tiny`, `base`, `small`, `medium`, or `large` (`base` recommended)
- `OLLAMA_MODEL`: Model name you pulled (e.g., `llama3.1:8b`)
- `ALLOWED_DIRECTORIES`: Directories where file operations are permitted
- `COMMAND_WHITELIST`: Whitelisted commands for system operations
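A minimal `.env` might look like this (the variable names come from the settings above; the values and paths are illustrative — adjust them to your setup):

```ini
WHISPER_MODEL=base
OLLAMA_MODEL=llama3.1:8b
ALLOWED_DIRECTORIES=/home/you/Documents,/home/you/tmp
COMMAND_WHITELIST=firefox,xdg-open
```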
python main.py

Speak naturally after the "Listening..." prompt. The assistant will:
- Record your voice until silence is detected
- Transcribe using Whisper
- Process with the LLM
- Execute any requested actions
- Respond with synthesized speech
For a rich terminal interface showing chat history and actions:
python main.py --tui

The TUI displays:
- Header: Current status and language
- Conversation Panel: Complete chat history with timestamps
- Actions Panel: Real-time log of system operations and tools
- Help Panel: Quick reference for commands
For testing without audio I/O:
python main.py --text

English:
- "Create a file called notes.txt in my home directory"
- "What's on the website example.com?"
- "List the files in my Documents folder"
- "Open Firefox"
- "Delete the file test.txt from tmp"
- "What's the weather?" (with web search)
French:
- "Crée un fichier appelé notes.txt dans mon répertoire personnel"
- "Qu'est-ce qu'il y a sur le site exemple.com?"
- "Liste les fichiers dans mon dossier Documents"
- "Ouvre Firefox"
- "Supprime le fichier test.txt de tmp"
Switch to French:
- "Switch to French"
- "Parle français"
- "En français"
Switch to English:
- "Switch to English"
- "Parle anglais"
- "In English"
You can also set the default language in .env with WHISPER_LANGUAGE=fr or WHISPER_LANGUAGE=en.
Say or type: "exit", "quit", "goodbye", "stop" (English) or "au revoir", "arrête" (French)
Edit .env to customize:
- `SAMPLE_RATE`: Audio sample rate (default: 16000 Hz)
- `CHANNELS`: Audio channels (default: 1 for mono)
- `WHISPER_MODEL`: Model size (`tiny`, `base`, `small`, `medium`, `large`)
  - `tiny`: Fastest, least accurate (~75 MB)
  - `base`: Good balance (~142 MB) - Recommended
  - `small`: Better accuracy (~466 MB)
  - `medium`: High accuracy (~1.5 GB)
  - `large`: Best accuracy (~2.9 GB)
- `WHISPER_DEVICE`: `cpu` or `cuda` (for NVIDIA GPUs)
- `WHISPER_COMPUTE_TYPE`: `int8` (CPU) or `float16` (GPU)
- `WHISPER_LANGUAGE`: Language code (`en` for English, `fr` for French, or `auto` for auto-detection)
- `OLLAMA_HOST`: Ollama server URL (default: http://localhost:11434)
- `OLLAMA_MODEL`: Model name (e.g., `llama3.1:8b`)
- `LLM_TEMPERATURE`: Response creativity (0.0-1.0, default: 0.7)
- `LLM_MAX_TOKENS`: Maximum response length (default: 1000)
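These settings map onto Ollama's HTTP chat API. The sketch below shows how a request could be assembled; the helper functions are illustrative, though `/api/chat`, `model`, `messages`, `stream`, and the `temperature`/`num_predict` options are standard Ollama fields:

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # matches the OLLAMA_HOST default

def build_chat_request(model, messages, temperature=0.7, max_tokens=1000):
    """Assemble the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        "options": {"temperature": temperature, "num_predict": max_tokens},
    }

def chat(model, messages):
    """Send one chat turn; requires a running `ollama serve`."""
    body = json.dumps(build_chat_request(model, messages)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/chat", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```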
- `PIPER_MODEL`: Voice model (default: `en_US-lessac-medium`)
- `PIPER_SPEAKER_ID`: Voice variant (0-based index)
- `ALLOWED_DIRECTORIES`: Comma-separated paths where file operations are allowed
- `COMMAND_WHITELIST`: Comma-separated list of allowed commands
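One way the `ALLOWED_DIRECTORIES` check can be implemented — a sketch under the assumption that paths are resolved before comparison, not necessarily how action_executor.py does it:

```python
from pathlib import Path

def is_path_allowed(candidate: str, allowed_dirs: list[str]) -> bool:
    """True if candidate resolves to a location inside an allowed directory."""
    target = Path(candidate).resolve()  # collapses symlinks and ../ tricks
    for root in allowed_dirs:
        resolved_root = Path(root).resolve()
        if target == resolved_root or resolved_root in target.parents:
            return True
    return False
```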
Jarvis/
├── main.py # Main orchestration loop
├── config.py # Configuration management
├── audio_handler.py # Audio I/O and VAD
├── stt_module.py # Speech-to-text (Whisper)
├── llm_module.py # LLM integration (Ollama)
├── action_executor.py # System operations executor
├── tts_module.py # Text-to-speech (Piper)
├── setup_piper.py # Piper installation script
├── requirements.txt # Python dependencies
├── .env.example # Example configuration
├── .env # Your configuration (create this)
└── piper/ # Piper binary and models (created by setup)
No microphone input:
# List audio devices
python -c "import sounddevice as sd; print(sd.query_devices())"
# Test recording
python -c "import sounddevice as sd; import numpy as np; print('Recording...'); audio = sd.rec(int(3 * 16000), samplerate=16000, channels=1); sd.wait(); print('Done')"

Permission denied:
# Add user to audio group
sudo usermod -a -G audio $USER
# Log out and back in

Model download fails:
# Manually download models
python -c "from faster_whisper import WhisperModel; model = WhisperModel('base')"

Out of memory:
- Use a smaller model (`tiny` or `base`)
- Set `WHISPER_COMPUTE_TYPE=int8`
Connection refused:
# Start Ollama service
ollama serve
# Or check if running
ps aux | grep ollama

Model not found:
# List installed models
ollama list
# Pull required model
ollama pull llama3.1:8b

Binary not found:
# Re-run setup
python setup_piper.py
# Or set explicit path in tts_module.py

Voice sounds robotic:
- Try a different model (e.g., `en_US-amy-medium`)
- Download from: https://huggingface.co/rhasspy/piper-voices
Low-resource systems (8GB RAM):

WHISPER_MODEL=tiny
OLLAMA_MODEL=llama3.1:8b-q4_0
WHISPER_COMPUTE_TYPE=int8

Balanced (16GB RAM):

WHISPER_MODEL=small
OLLAMA_MODEL=llama3.1:8b
WHISPER_COMPUTE_TYPE=int8

NVIDIA GPU:

WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16

Pull GPU-optimized Ollama models and ensure CUDA is installed.
Jarvis includes several security measures:
- Directory Whitelisting: File operations only in `ALLOWED_DIRECTORIES`
- Command Whitelisting: Only whitelisted commands can be executed
- Action Confirmation: All file operations, web requests, and app launches require user approval
- Smart Whitelist: Approve once, auto-approve future identical actions
- File operations: Whitelisted by directory
- Web requests: Whitelisted by domain
- Applications: Whitelisted by exact command
- No Shell Injection: Uses subprocess with explicit arguments (no `shell=True`)
- Path Validation: Resolves and validates all paths before operations
- Timeout Protection: All operations have timeouts
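The no-shell-injection and timeout points combine naturally in Python's subprocess API. A sketch (the wrapper function is illustrative, not the project's actual executor):

```python
import subprocess
import sys

def run_whitelisted(argv: list[str], timeout: float = 10.0) -> str:
    """Run a command as an argument list — no shell, so no injection —
    and raise TimeoutExpired if it exceeds the timeout."""
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout

# Because argv is a list, a string like 'payload; rm -rf ~' stays a
# single literal argument instead of being interpreted by a shell.
```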
Whitelist Storage: Approved actions are stored in command_whitelist.json for persistence.
Confirmation Options:
- `y` - Execute this action once
- `a` - Execute and add to whitelist for future auto-approval
- `n` - Cancel the action
Important: Review and customize security settings in .env before use.
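The persistence described above can be as simple as a JSON list on disk. A sketch, assuming a flat list of action keys — the file format and key scheme are illustrative, not taken from the project:

```python
import json
from pathlib import Path

def load_whitelist(path: Path) -> set[str]:
    """Read previously approved actions (the 'a' choice) from disk."""
    return set(json.loads(path.read_text())) if path.exists() else set()

def approve(action_key: str, path: Path) -> None:
    """Persist an approved action so identical future actions auto-run."""
    whitelist = load_whitelist(path)
    whitelist.add(action_key)
    path.write_text(json.dumps(sorted(whitelist)))
```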
Edit llm_module.py to add tool definitions:
TOOLS = [
    # ... existing tools ...
    {
        "type": "function",
        "function": {
            "name": "your_tool_name",
            "description": "What your tool does",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {
                        "type": "string",
                        "description": "Parameter description"
                    }
                },
                "required": ["param1"]
            }
        }
    }
]

Then implement in action_executor.py:
def your_tool_name(self, param1: str) -> str:
    """Your tool implementation."""
    # ... your code ...
    return "Result"

Contributions are welcome! Areas for improvement:
- Wake word detection (e.g., "Hey Jarvis")
- Support for additional languages (beyond English and French)
- Plugin architecture
- Web UI
- Home automation integration
- Voice cloning for personalized TTS
This project is open source and available under the MIT License.
- OpenAI Whisper - Speech recognition
- faster-whisper - Optimized Whisper
- Ollama - Local LLM inference
- Piper - Fast neural TTS
- webrtcvad - Voice activity detection
For issues, questions, or suggestions, please open an issue on the repository.
Note: This is a local-first assistant. All processing happens on your machine - no data is sent to external servers.