28 changes: 28 additions & 0 deletions .env.example
@@ -140,6 +140,34 @@ QUICK_ACTIONS_TIMEOUT=120
# Git operations timeout in seconds
GIT_OPERATIONS_TIMEOUT=30

# === VOICE TRANSCRIPTION ===
# Enable voice message transcription
ENABLE_VOICE_MESSAGES=true

# Voice transcription provider: mistral, openai, or local
# - mistral: Uses Mistral Voxtral (requires MISTRAL_API_KEY)
# - openai: Uses OpenAI Whisper API (requires OPENAI_API_KEY)
# - local: Uses whisper.cpp binary (requires ffmpeg + whisper.cpp installed)
VOICE_PROVIDER=mistral

# API keys (only needed for cloud providers)
MISTRAL_API_KEY=
OPENAI_API_KEY=

# Override transcription model (optional)
# Defaults: voxtral-mini-latest (mistral), whisper-1 (openai), base (local)
VOICE_TRANSCRIPTION_MODEL=

# Maximum voice message size in MB
VOICE_MAX_FILE_SIZE_MB=20

# Local whisper.cpp settings (only used when VOICE_PROVIDER=local)
# Path to whisper.cpp binary (auto-detected from PATH if unset)
WHISPER_CPP_BINARY_PATH=
# Path to GGML model file, or model name like "base", "small", "medium"
# Named models resolve to ~/.cache/whisper-cpp/ggml-{name}.bin
WHISPER_CPP_MODEL_PATH=base

# === PROJECT THREAD MODE ===
# Enable strict routing by Telegram project topics
ENABLE_PROJECT_THREADS=false
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -102,7 +102,7 @@ Multi-project topics: `ENABLE_PROJECT_THREADS` (default false), `PROJECT_THREADS`

Output verbosity: `VERBOSE_LEVEL` (default 1, range 0-2). Controls how much of Claude's background activity is shown to the user in real-time. 0 = quiet (only final response, typing indicator still active), 1 = normal (tool names + reasoning snippets shown during execution), 2 = detailed (tool names with input summaries + longer reasoning text). Users can override per-session via `/verbose 0|1|2`. A persistent typing indicator is refreshed every ~2 seconds at all levels.

-Voice transcription: `ENABLE_VOICE_MESSAGES` (default true), `VOICE_PROVIDER` (`mistral`|`openai`, default `mistral`), `MISTRAL_API_KEY`, `OPENAI_API_KEY`, `VOICE_TRANSCRIPTION_MODEL`. Provider implementation is in `src/bot/features/voice_handler.py`.
+Voice transcription: `ENABLE_VOICE_MESSAGES` (default true), `VOICE_PROVIDER` (`mistral`|`openai`|`local`, default `mistral`), `MISTRAL_API_KEY`, `OPENAI_API_KEY`, `VOICE_TRANSCRIPTION_MODEL`. For local provider: `WHISPER_CPP_BINARY_PATH`, `WHISPER_CPP_MODEL_PATH` (requires ffmpeg + whisper.cpp installed). Provider implementation is in `src/bot/features/voice_handler.py`.

Feature flags in `src/config/features.py` control: MCP, git integration, file uploads, quick actions, session export, image uploads, voice messages, conversation mode, agentic mode, API server, scheduler.

2 changes: 1 addition & 1 deletion README.md
@@ -194,7 +194,7 @@ Enable with `ENABLE_API_SERVER=true` and `ENABLE_SCHEDULER=true`. See [docs/setu
- Directory sandboxing with path traversal prevention
- File upload handling with archive extraction
- Image/screenshot upload with analysis
-- Voice message transcription (Mistral Voxtral / OpenAI Whisper)
+- Voice message transcription (Mistral Voxtral / OpenAI Whisper / [local whisper.cpp](docs/local-whisper-cpp.md))
- Git integration with safe repository operations
- Quick actions system with context-aware buttons
- Session export in Markdown, HTML, and JSON formats
6 changes: 5 additions & 1 deletion docs/configuration.md
@@ -135,11 +135,15 @@ ENABLE_QUICK_ACTIONS=true

# Enable voice message transcription
ENABLE_VOICE_MESSAGES=true
-VOICE_PROVIDER=mistral # 'mistral' (default) or 'openai'
+VOICE_PROVIDER=mistral # 'mistral', 'openai', or 'local'
MISTRAL_API_KEY= # Required when VOICE_PROVIDER=mistral
OPENAI_API_KEY= # Required when VOICE_PROVIDER=openai
VOICE_TRANSCRIPTION_MODEL= # Default: voxtral-mini-latest (Mistral) or whisper-1 (OpenAI)
VOICE_MAX_FILE_SIZE_MB=20 # Max Telegram voice file size to download (1-200MB)

# Local whisper.cpp settings (only used when VOICE_PROVIDER=local)
WHISPER_CPP_BINARY_PATH= # Path to whisper.cpp binary (auto-detected from PATH if unset)
WHISPER_CPP_MODEL_PATH=base # Path to GGML model file or model name (base, small, medium, large)
```

#### Agentic Platform
170 changes: 170 additions & 0 deletions docs/local-whisper-cpp.md
@@ -0,0 +1,170 @@
# Local Voice Transcription with whisper.cpp

This guide explains how to build and configure [whisper.cpp](https://github.com/ggerganov/whisper.cpp) for **offline** voice message transcription — no API keys or cloud services required.

## Overview

When `VOICE_PROVIDER=local`, the bot transcribes Telegram voice messages entirely on your machine using:

| Component | Purpose |
|---|---|
| **ffmpeg** | Converts Telegram OGG/Opus audio to 16 kHz mono WAV |
| **whisper.cpp** | Runs OpenAI's Whisper model locally via optimised C/C++ |
| **GGML model** | Quantised model weights (downloaded once) |
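The three components chain together roughly like this. A minimal sketch assuming a `whisper-cli`-style binary as built later in this guide; it is illustrative only, not the bot's actual `voice_handler.py` code:

```python
import subprocess
import tempfile
from pathlib import Path

def ffmpeg_cmd(src: str, dst: str) -> list[str]:
    # -ar 16000 / -ac 1: 16 kHz mono WAV, the input format whisper.cpp expects
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

def whisper_cmd(binary: str, model: str, wav: str) -> list[str]:
    # --no-timestamps keeps stdout as plain transcript text
    return [binary, "-m", model, "-f", wav, "--no-timestamps"]

def transcribe(ogg_path: str, model_path: str, binary: str = "whisper-cpp") -> str:
    """Convert a Telegram OGG/Opus file to WAV, then run whisper.cpp on it."""
    with tempfile.TemporaryDirectory() as tmp:
        wav = str(Path(tmp) / "audio.wav")
        subprocess.run(ffmpeg_cmd(ogg_path, wav), check=True, capture_output=True)
        out = subprocess.run(
            whisper_cmd(binary, model_path, wav),
            check=True, capture_output=True, text=True,
        )
        return out.stdout.strip()
```

Both steps are plain subprocess calls, so any ffmpeg and whisper.cpp build that accepts these flags will work.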

## Prerequisites

- A C/C++ toolchain (`gcc`/`clang`, `cmake`, `make`)
- `ffmpeg` installed and on PATH
- Disk space for a model: ~142 MB for `base`, ~1.5 GB for `medium` (see the model table below)

## 1. Install ffmpeg

### Ubuntu / Debian

```bash
sudo apt update && sudo apt install -y ffmpeg
```

### macOS (Homebrew)

```bash
brew install ffmpeg
```

### Alpine

```bash
apk add ffmpeg
```

Verify:

```bash
ffmpeg -version
```

## 2. Build whisper.cpp from source

```bash
# Clone the repository
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

# Build with CMake (recommended)
cmake -B build
cmake --build build --config Release

# The binary is at build/bin/whisper-cli (or build/bin/main on older versions)
ls build/bin/whisper-cli
```

> **Tip:** For GPU acceleration add `-DWHISPER_CUBLAS=ON` (NVIDIA) or `-DWHISPER_METAL=ON` (Apple Silicon) to the cmake configure step; on recent whisper.cpp releases the NVIDIA flag is `-DGGML_CUDA=ON`, and Metal is enabled by default on Apple Silicon.

### Install system-wide (optional)

```bash
sudo cp build/bin/whisper-cli /usr/local/bin/whisper-cpp
```

Or add the build directory to your `PATH`:

```bash
export PATH="$PWD/build/bin:$PATH"
```

## 3. Download a GGML model

Models are hosted on Hugging Face. Pick one based on your hardware:

| Model | Size | RAM (approx.) | Quality |
|---|---|---|---|
| `tiny` | ~75 MB | ~400 MB | Fast but lower accuracy |
| `base` | ~142 MB | ~500 MB | Good balance (default) |
| `small` | ~466 MB | ~1 GB | Better accuracy |
| `medium` | ~1.5 GB | ~2.5 GB | High accuracy |
| `large-v3` | ~3 GB | ~5 GB | Best accuracy, slow on CPU |

```bash
# Create the model cache directory
mkdir -p ~/.cache/whisper-cpp

# Download the base model (recommended starting point)
curl -L -o ~/.cache/whisper-cpp/ggml-base.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

# Or download small for better accuracy
curl -L -o ~/.cache/whisper-cpp/ggml-small.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
```

## 4. Configure the bot

Add the following to your `.env`:

```bash
# Enable voice transcription with local provider
ENABLE_VOICE_MESSAGES=true
VOICE_PROVIDER=local

# Path to the whisper.cpp binary (omit if already on PATH as "whisper-cpp")
WHISPER_CPP_BINARY_PATH=/usr/local/bin/whisper-cpp

# Model: a name like "base", "small", "medium" or a full file path
# Named models resolve to ~/.cache/whisper-cpp/ggml-{name}.bin
WHISPER_CPP_MODEL_PATH=base
```
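The name-to-path rule described above can be sketched as follows. This is an illustrative assumption based on the documented behaviour; the authoritative logic lives in `src/bot/features/voice_handler.py`:

```python
from pathlib import Path

# Assumed default cache location for named models
CACHE_DIR = Path.home() / ".cache" / "whisper-cpp"

def resolve_model(value: str) -> Path:
    """Resolve WHISPER_CPP_MODEL_PATH to a GGML file path.

    A bare name like "base" maps into the cache directory;
    anything that looks like a file path is used as-is.
    """
    if "/" in value or value.endswith(".bin"):
        return Path(value).expanduser()
    return CACHE_DIR / f"ggml-{value}.bin"
```

So `WHISPER_CPP_MODEL_PATH=base` and `WHISPER_CPP_MODEL_PATH=~/.cache/whisper-cpp/ggml-base.bin` would point at the same file.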

### Minimal configuration

If `whisper-cpp` is on your PATH and you downloaded the `base` model to the default location, you only need:

```bash
VOICE_PROVIDER=local
```

## 5. Verify the setup

```bash
# Test ffmpeg conversion
ffmpeg -f lavfi -i "sine=frequency=440:duration=2" -ar 16000 -ac 1 /tmp/test.wav -y

# Test whisper.cpp
whisper-cpp -m ~/.cache/whisper-cpp/ggml-base.bin -f /tmp/test.wav --no-timestamps
```

You should see a transcription attempt (it will be empty or nonsensical for a sine wave, but the binary should run without errors).

## Troubleshooting

### `whisper.cpp binary not found on PATH`

The bot could not locate the binary. Either:
- Install it system-wide: `sudo cp build/bin/whisper-cli /usr/local/bin/whisper-cpp`
- Or set the full path: `WHISPER_CPP_BINARY_PATH=/path/to/whisper-cli`

### `whisper.cpp model not found`

The model file does not exist at the expected path. Download it:

```bash
mkdir -p ~/.cache/whisper-cpp
curl -L -o ~/.cache/whisper-cpp/ggml-base.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
```

### `ffmpeg is required but was not found`

Install ffmpeg for your platform (see step 1 above).

### Poor transcription quality

- Try a larger model (`small` or `medium` instead of `base`)
- Ensure audio is not too short (< 1 second) or too noisy
- whisper.cpp uses `--language auto` by default; this works well for most languages

### High CPU usage / slow transcription

- Use a smaller model (`tiny` or `base`)
- Enable GPU acceleration when building whisper.cpp (CUDA / Metal)
- Consider using the `mistral` or `openai` cloud providers for faster results on low-powered machines
15 changes: 13 additions & 2 deletions docs/setup.md
@@ -197,12 +197,23 @@ VOICE_PROVIDER=openai
OPENAI_API_KEY=your-openai-api-key
```

-If you installed via pip/uv, make sure voice extras are installed:
**Local whisper.cpp (offline, no API key needed):**
```bash
VOICE_PROVIDER=local
# Optional — auto-detected from PATH if unset
WHISPER_CPP_BINARY_PATH=/usr/local/bin/whisper-cpp
# Model name ("base", "small", "medium") or full path to .bin file
WHISPER_CPP_MODEL_PATH=base
```

Requires `ffmpeg` and a locally built `whisper.cpp` binary. See the full [local whisper.cpp setup guide](local-whisper-cpp.md) for build instructions and model downloads.

+If you installed via pip/uv, make sure voice extras are installed (cloud providers only):
```bash
pip install "claude-code-telegram[voice]"
```

-Optionally override the transcription model with `VOICE_TRANSCRIPTION_MODEL` (defaults to `voxtral-mini-latest` for Mistral, `whisper-1` for OpenAI).
+Optionally override the transcription model with `VOICE_TRANSCRIPTION_MODEL` (defaults to `voxtral-mini-latest` for Mistral, `whisper-1` for OpenAI, `base` for local).

### Notification Recipients

8 changes: 6 additions & 2 deletions src/bot/features/registry.py
@@ -78,10 +78,14 @@ def _initialize_features(self):
except Exception as e:
logger.error("Failed to initialize image handler", error=str(e))

-        # Voice transcription - requires provider-specific API key
+        # Voice transcription - requires provider-specific API key (or local)
         voice_key_available = (
+            self.config.voice_provider == "local"
+        ) or (
             self.config.voice_provider == "openai" and self.config.openai_api_key
-        ) or (self.config.voice_provider == "mistral" and self.config.mistral_api_key)
+        ) or (
+            self.config.voice_provider == "mistral" and self.config.mistral_api_key
+        )
if self.config.enable_voice_messages and voice_key_available:
try:
self.features["voice_handler"] = VoiceHandler(config=self.config)
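Read as a table, the chained condition above maps each provider to the credential it requires. A hypothetical table-driven equivalent (`REQUIRED_KEY` and `voice_key_available` are illustrative names, not helpers from this repo):

```python
from types import SimpleNamespace

# Which config attribute each provider needs; None means no key required.
REQUIRED_KEY = {
    "local": None,
    "openai": "openai_api_key",
    "mistral": "mistral_api_key",
}

def voice_key_available(config) -> bool:
    if config.voice_provider not in REQUIRED_KEY:
        return False  # unknown provider
    attr = REQUIRED_KEY[config.voice_provider]
    return attr is None or bool(getattr(config, attr, None))

# local needs no key; mistral with an empty key is unavailable
print(voice_key_available(SimpleNamespace(voice_provider="local")))    # True
print(voice_key_available(SimpleNamespace(voice_provider="mistral",
                                          mistral_api_key="")))        # False
```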