Skip to content

Couvbat/Jarvis

Repository files navigation

Jarvis - Local Voice Assistant

A privacy-focused, local-first voice assistant for Linux that runs entirely on your machine. Jarvis can understand speech, process commands using AI, perform system operations, and respond with natural speech.

Features

  • 🧠 Local AI: Powered by Ollama with support for various LLM models (Llama 3.1, Mistral, etc.)
  • 🎤 Speech-to-Text: Uses OpenAI Whisper (via faster-whisper) for accurate voice recognition
  • 🔊 Text-to-Speech: Uses Piper TTS for natural voice synthesis
  • 🎯 Voice Activity Detection: Intelligent listening with automatic silence detection
  • 🛠️ System Operations:
    • File management (create, read, delete files and directories)
    • Web page fetching and information retrieval
    • Application launching
  • 🔒 Security: Sandboxed execution with whitelisted commands and directory restrictions
  • 💬 Conversation Memory: Maintains context across multiple interactions

Architecture

Audio Input → STT (Whisper) → LLM (Ollama) → Action Executor → TTS (Piper) → Audio Output
                                     ↓
                            File Ops | Web Fetch | App Launch

Requirements

  • OS: Linux (tested on x86_64 and arm64)
  • Python: 3.10 or higher
  • RAM: 8GB minimum, 16GB recommended
  • Storage: ~10GB for models
  • Optional: NVIDIA GPU with CUDA for faster inference

Installation

1. Clone the Repository

cd /home/jules/Dev/other/Jarvis

2. Install System Dependencies

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install portaudio19-dev python3-pyaudio ffmpeg

# Fedora
sudo dnf install portaudio-devel python3-pyaudio ffmpeg

# Arch
sudo pacman -S portaudio python-pyaudio ffmpeg

3. Set Up Python Environment

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt

4. Install Piper TTS

# Automated installation
python setup_piper.py

# Or manual installation:
# Download from: https://github.com/rhasspy/piper/releases
# Extract and place binary in ./piper/piper

5. Install and Configure Ollama

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model (choose one)
ollama pull llama3.1:8b      # Recommended for 16GB RAM
ollama pull mistral:7b       # Alternative
ollama pull llama3.1:8b-q4_0 # Quantized version for 8GB RAM

6. Configure Environment

# Copy example config
cp .env.example .env

# Edit configuration
nano .env

Key settings to configure:

  • WHISPER_MODEL: tiny, base, small, medium, or large (base recommended)
  • OLLAMA_MODEL: Model name you pulled (e.g., llama3.1:8b)
  • ALLOWED_DIRECTORIES: Directories where file operations are permitted
  • COMMAND_WHITELIST: Whitelisted commands for system operations

Usage

Voice Mode (Default)

python main.py

Speak naturally after the "Listening..." prompt. The assistant will:

  1. Record your voice until silence is detected
  2. Transcribe using Whisper
  3. Process with the LLM
  4. Execute any requested actions
  5. Respond with synthesized speech

Voice Mode with Terminal UI

For a rich terminal interface showing chat history and actions:

python main.py --tui

The TUI displays:

  • Header: Current status and language
  • Conversation Panel: Complete chat history with timestamps
  • Actions Panel: Real-time log of system operations and tools
  • Help Panel: Quick reference for commands

Text Mode

For testing without audio I/O:

python main.py --text

Example Commands

English:

  • "Create a file called notes.txt in my home directory"
  • "What's on the website example.com?"
  • "List the files in my Documents folder"
  • "Open Firefox"
  • "Delete the file test.txt from tmp"
  • "What's the weather?" (with web search)

French:

  • "Crée un fichier appelé notes.txt dans mon répertoire personnel"
  • "Qu'est-ce qu'il y a sur le site exemple.com?"
  • "Liste les fichiers dans mon dossier Documents"
  • "Ouvre Firefox"
  • "Supprime le fichier test.txt de tmp"

Language Switching

Switch to French:

  • "Switch to French"
  • "Parle français"
  • "En français"

Switch to English:

  • "Switch to English"
  • "Parle anglais"
  • "In English"

You can also set the default language in .env with WHISPER_LANGUAGE=fr or WHISPER_LANGUAGE=en.

Exit Commands

Say or type: "exit", "quit", "goodbye", "stop" (English) or "au revoir", "arrête" (French)

Configuration

Edit .env to customize:

Audio Settings

  • SAMPLE_RATE: Audio sample rate (default: 16000 Hz)
  • CHANNELS: Audio channels (default: 1 for mono)

STT Settings

  • WHISPER_MODEL: Model size (tiny, base, small, medium, large)
    • tiny: Fastest, least accurate (~75 MB)
    • base: Good balance (~142 MB) - Recommended
    • small: Better accuracy (~466 MB)
    • medium: High accuracy (~1.5 GB)
    • large: Best accuracy (~2.9 GB)
  • WHISPER_DEVICE: cpu or cuda (for NVIDIA GPUs)
  • WHISPER_COMPUTE_TYPE: int8 (CPU) or float16 (GPU)
  • WHISPER_LANGUAGE: Language code (en for English, fr for French, or auto for auto-detection)

LLM Settings

  • OLLAMA_HOST: Ollama server URL (default: http://localhost:11434)
  • OLLAMA_MODEL: Model name (e.g., llama3.1:8b)
  • LLM_TEMPERATURE: Response creativity (0.0-1.0, default: 0.7)
  • LLM_MAX_TOKENS: Maximum response length (default: 1000)

TTS Settings

  • PIPER_MODEL: Voice model (default: en_US-lessac-medium)
  • PIPER_SPEAKER_ID: Voice variant (0-based index)

Security Settings

  • ALLOWED_DIRECTORIES: Comma-separated paths where file operations are allowed
  • COMMAND_WHITELIST: Comma-separated list of allowed commands

Project Structure

Jarvis/
├── main.py                 # Main orchestration loop
├── config.py              # Configuration management
├── audio_handler.py       # Audio I/O and VAD
├── stt_module.py          # Speech-to-text (Whisper)
├── llm_module.py          # LLM integration (Ollama)
├── action_executor.py     # System operations executor
├── tts_module.py          # Text-to-speech (Piper)
├── setup_piper.py         # Piper installation script
├── requirements.txt       # Python dependencies
├── .env.example          # Example configuration
├── .env                  # Your configuration (create this)
└── piper/                # Piper binary and models (created by setup)

Troubleshooting

Audio Issues

No microphone input:

# List audio devices
python -c "import sounddevice as sd; print(sd.query_devices())"

# Test recording
python -c "import sounddevice as sd; import numpy as np; print('Recording...'); audio = sd.rec(int(3 * 16000), samplerate=16000, channels=1); sd.wait(); print('Done')"

Permission denied:

# Add user to audio group
sudo usermod -a -G audio $USER
# Log out and back in

Whisper Issues

Model download fails:

# Manually download models
python -c "from faster_whisper import WhisperModel; model = WhisperModel('base')"

Out of memory:

  • Use a smaller model (tiny or base)
  • Set WHISPER_COMPUTE_TYPE=int8

Ollama Issues

Connection refused:

# Start Ollama service
ollama serve

# Or check if running
ps aux | grep ollama

Model not found:

# List installed models
ollama list

# Pull required model
ollama pull llama3.1:8b

Piper Issues

Binary not found:

# Re-run setup
python setup_piper.py

# Or set explicit path in tts_module.py

Voice sounds robotic:

Performance Optimization

For Limited Hardware (8GB RAM)

WHISPER_MODEL=tiny
OLLAMA_MODEL=llama3.1:8b-q4_0
WHISPER_COMPUTE_TYPE=int8

For Better Quality (16GB+ RAM)

WHISPER_MODEL=small
OLLAMA_MODEL=llama3.1:8b
WHISPER_COMPUTE_TYPE=int8

With NVIDIA GPU

WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16

Pull GPU-optimized Ollama models and ensure CUDA is installed.

Security Considerations

Jarvis includes several security measures:

  1. Directory Whitelisting: File operations only in ALLOWED_DIRECTORIES
  2. Command Whitelisting: Only whitelisted commands can be executed
  3. Action Confirmation: All file operations, web requests, and app launches require user approval
  4. Smart Whitelist: Approve once, auto-approve future identical actions
    • File operations: Whitelisted by directory
    • Web requests: Whitelisted by domain
    • Applications: Whitelisted by exact command
  5. No Shell Injection: Uses subprocess with explicit arguments (no shell=True)
  6. Path Validation: Resolves and validates all paths before operations
  7. Timeout Protection: All operations have timeouts

Whitelist Storage: Approved actions are stored in command_whitelist.json for persistence.

Confirmation Options:

  • y - Execute this action once
  • a - Execute and add to whitelist for future auto-approval
  • n - Cancel the action

Important: Review and customize security settings in .env before use.

Extending Jarvis

Adding New Tools

Edit llm_module.py to add tool definitions:

TOOLS = [
    # ... existing tools ...
    {
        "type": "function",
        "function": {
            "name": "your_tool_name",
            "description": "What your tool does",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {
                        "type": "string",
                        "description": "Parameter description"
                    }
                },
                "required": ["param1"]
            }
        }
    }
]

Then implement in action_executor.py:

def your_tool_name(self, param1: str) -> str:
    """Your tool implementation."""
    # ... your code ...
    return "Result"

Contributing

Contributions are welcome! Areas for improvement:

  • Wake word detection (e.g., "Hey Jarvis")
  • Multi-language support
  • Plugin architecture
  • Web UI
  • Home automation integration
  • Voice cloning for personalized TTS

License

This project is open source and available under the MIT License.

Acknowledgments

Support

For issues, questions, or suggestions, please open an issue on the repository.


Note: This is a local-first assistant. All processing happens on your machine - no data is sent to external servers.

About

A Local first, Privacy focused, Voice-to-Voice Assistant

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors