feat: add local whisper.cpp voice transcription provider #157

Open
thereisnotime wants to merge 1 commit into RichardAtCT:main from thereisnotime:feat/local-whisper-cpp-provider

Conversation

@thereisnotime

Summary

  • Adds a third voice transcription provider (VOICE_PROVIDER=local) that uses whisper.cpp and ffmpeg for fully offline, API-key-free voice message transcription
  • New settings: WHISPER_CPP_BINARY_PATH and WHISPER_CPP_MODEL_PATH for configuring the local binary and model
  • Dedicated setup guide at docs/local-whisper-cpp.md with build-from-source instructions, model download links, and troubleshooting tips

Changes

  • src/bot/features/voice_handler.py — new _transcribe_local() pipeline: OGG→WAV (ffmpeg) → whisper.cpp binary
  • src/config/settings.py: whisper_cpp_binary_path, whisper_cpp_model_path fields + resolver properties
  • src/config/features.py — local provider skips API key check
  • src/bot/features/registry.py — updated key-availability logic
  • src/bot/handlers/message.py / src/bot/orchestrator.py — provider-aware error messages
  • docs/local-whisper-cpp.md — full build & setup guide
  • .env.example, CLAUDE.md, README.md, docs/configuration.md — documentation updates
  • Tests — full coverage for local provider (ffmpeg, binary, model, empty output, non-zero exit)
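As a rough illustration of the OGG→WAV → whisper.cpp pipeline listed above, the two subprocess invocations might be assembled as follows. This is a sketch, not the PR's code: the helper names `build_ffmpeg_cmd` and `build_whisper_cmd` are hypothetical, and the choice of 16 kHz mono 16-bit PCM reflects the input format whisper.cpp documents for its examples rather than anything stated in this PR.

```python
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(ogg_path: Path, wav_path: Path) -> List[str]:
    """Build an ffmpeg command converting a voice note to 16 kHz mono WAV.

    Hypothetical helper; the PR's _convert_ogg_to_wav may use other options.
    """
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", str(ogg_path),  # input file (Telegram voice notes are OGG/Opus)
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to a single channel
        "-c:a", "pcm_s16le",  # encode as 16-bit little-endian PCM
        str(wav_path),
    ]


def build_whisper_cmd(binary: str, model_path: Path, wav_path: Path) -> List[str]:
    """Build a whisper.cpp command that writes a plain transcript to stdout.

    Hypothetical helper; `binary` would come from WHISPER_CPP_BINARY_PATH.
    """
    return [
        binary,
        "-m", str(model_path),  # ggml model file
        "-f", str(wav_path),    # WAV file to transcribe
        "--no-timestamps",      # omit timestamp prefixes from the output
    ]
```

Keeping command construction separate from execution makes the argument lists easy to unit test without ffmpeg or whisper.cpp installed.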

Test plan

  • Run existing test suite (pytest) — all tests should pass
  • Verify VOICE_PROVIDER=local with whisper.cpp installed transcribes a real voice message
  • Verify clear error messages when ffmpeg / whisper.cpp binary / model file is missing
  • Verify VOICE_PROVIDER=mistral and VOICE_PROVIDER=openai still work unchanged

🤖 Generated with Claude Code

thereisnotime force-pushed the feat/local-whisper-cpp-provider branch from 9524828 to affa44f on March 20, 2026 00:38
@thereisnotime
Author

Hey @RichardAtCT 👋 — would appreciate a review when you get a chance! This adds a local whisper.cpp voice transcription provider (no API keys needed).

@FridayOpenClawBot

PR Review
Reviewed head: affa44f2a351a86e7bb4e3834cc8b6504b6299e0

Summary

  • Adds a third voice transcription provider (VOICE_PROVIDER=local) backed by whisper.cpp + ffmpeg — fully offline, no API key required
  • New settings WHISPER_CPP_BINARY_PATH / WHISPER_CPP_MODEL_PATH with sensible defaults and named-model resolution to ~/.cache/whisper-cpp/ggml-{name}.bin
  • Full unit test coverage for all error paths (ffmpeg missing, binary missing, model missing, empty output, non-zero exit)
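The named-model resolution mentioned above could look roughly like the sketch below. The function name `resolve_model_path` and the rule used to tell a bare model name from a file path are assumptions made for illustration; only the `~/.cache/whisper-cpp/ggml-{name}.bin` target comes from the review.

```python
from pathlib import Path

# Cache directory named in the review summary.
DEFAULT_MODEL_DIR = Path.home() / ".cache" / "whisper-cpp"


def resolve_model_path(value: str, model_dir: Path = DEFAULT_MODEL_DIR) -> Path:
    """Map a WHISPER_CPP_MODEL_PATH value to a model file path.

    Assumption: anything containing a path separator or ending in ".bin"
    is treated as an explicit path; anything else is a model name such as
    "base" or "small".
    """
    looks_like_path = "/" in value or "\\" in value or value.endswith(".bin")
    if looks_like_path:
        return Path(value).expanduser()
    return model_dir / f"ggml-{value}.bin"
```

Under these assumptions, a bare name like `base` resolves to `ggml-base.bin` inside the cache directory, while an explicit path is returned unchanged apart from `~` expansion.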

What looks good

  • Clean provider abstraction — _transcribe_local is well-isolated and the existing Mistral/OpenAI paths are untouched
  • Tempfile cleanup in a finally block is correct; no risk of leaking WAV files even on failure
  • Error messages are actionable (include install commands and env var names) — good UX for a self-hosted setup
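The tempfile cleanup pattern praised above can be sketched as follows. The wrapper `with_temp_wav` is a hypothetical name used only to show the `try`/`finally` shape; the PR's actual function layout may differ.

```python
import os
import tempfile
from typing import Callable


def with_temp_wav(work: Callable[[str], str]) -> str:
    """Run `work` against a temporary WAV path and always delete the file.

    Hypothetical wrapper: `work` stands in for the convert-then-transcribe
    steps and receives the path of the temporary file.
    """
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # only the path is needed; the callee writes the file
    try:
        return work(wav_path)
    finally:
        # Runs on success and on any exception, so no WAV file is leaked.
        if os.path.exists(wav_path):
            os.remove(wav_path)
```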

Issues / questions

  1. [Important] src/bot/features/voice_handler.py — Neither _convert_ogg_to_wav nor _run_whisper_cpp has a timeout. process.communicate() will block indefinitely if ffmpeg or whisper.cpp stalls. A near-20 MB file on a slow machine (or a model file that takes a long time to load the first time) could tie up the bot until the process exits or is killed externally. Consider asyncio.wait_for(process.communicate(), timeout=120) (or whatever the existing GIT_OPERATIONS_TIMEOUT pattern uses), raising a RuntimeError("transcription timed out") on expiry so the user gets feedback.

  2. [Nit] src/bot/features/voice_handler.py: _resolve_whisper_binary validates via shutil.which(binary) but returns the original unresolved string (binary), discarding the fully-qualified path (resolved). This is fine for subprocess dispatch since PATH lookup happens again at exec time, but it means the validated path isn't reused. If PATH somehow changes between validation and execution, the nice error message is bypassed and you'd get a raw FileNotFoundError. Returning resolved from the method would make validation and execution consistent.
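The timeout suggested in issue 1 might be implemented along these lines. This is a minimal sketch under stated assumptions: `run_with_timeout` is a hypothetical helper, and the 120-second default simply mirrors the value floated in the review.

```python
import asyncio
from typing import List, Optional, Tuple


async def run_with_timeout(
    cmd: List[str], timeout: float = 120.0
) -> Tuple[Optional[int], bytes, bytes]:
    """Run a subprocess, killing it if it exceeds `timeout` seconds.

    Hypothetical helper; returns (returncode, stdout, stderr).
    """
    process = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(
            process.communicate(), timeout=timeout
        )
    except asyncio.TimeoutError:
        process.kill()        # stop the stalled ffmpeg / whisper.cpp process
        await process.wait()  # reap it so no zombie process is left behind
        raise RuntimeError(f"{cmd[0]} timed out after {timeout:g}s") from None
    return process.returncode, stdout, stderr
```

Killing and then awaiting the process matters: `asyncio.wait_for` only cancels the `communicate()` coroutine and would otherwise leave the child running.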

Verdict
⚠️ Merge after fixes — timeout on subprocess calls is the main gap; everything else is solid.

Friday, AI assistant to @RichardAtCT

@RichardAtCT
Owner

> Hey @RichardAtCT 👋 — would appreciate a review when you get a chance! This adds a local whisper.cpp voice transcription provider (no API keys needed).

Thanks - great idea. I actually use local whisper everywhere else so this makes sense!

Can you please fix the timeout flagged by @FridayOpenClawBot and the failing lint? Then it is good to merge.

Add a third voice provider option (VOICE_PROVIDER=local) that transcribes
Telegram voice messages entirely offline using whisper.cpp and ffmpeg.
No API keys or cloud services required.

- New local provider in voice_handler.py (OGG->WAV via ffmpeg, then whisper.cpp)
- Settings: WHISPER_CPP_BINARY_PATH, WHISPER_CPP_MODEL_PATH
- Feature flag, registry, and error messages updated for local provider
- Dedicated build/setup guide at docs/local-whisper-cpp.md
- Full test coverage for the local provider path
- Updated .env.example, CLAUDE.md, README.md, docs/configuration.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
thereisnotime force-pushed the feat/local-whisper-cpp-provider branch from affa44f to 5501304 on March 20, 2026 08:58
@thereisnotime
Author

Thanks for the review @RichardAtCT and @FridayOpenClawBot! Both issues have been addressed:

  1. Timeouts — Added asyncio.wait_for(..., timeout=120) to both _convert_ogg_to_wav and _run_whisper_cpp. On timeout the subprocess is killed and a clear RuntimeError is raised.
  2. Resolved binary path: _resolve_whisper_binary now caches and returns the fully-qualified path from shutil.which() so validation and execution are consistent.
  3. Lint — Ran black + isort on all affected files.
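For readers following fix 2, caching the result of `shutil.which()` could be sketched as below. The class `WhisperBinaryResolver`, the default name `whisper-cli`, and the error wording are illustrative assumptions, not the PR's actual implementation.

```python
import shutil
from typing import Optional


class WhisperBinaryResolver:
    """Resolve the whisper.cpp binary once and reuse the absolute path.

    Hypothetical class; `configured` would come from WHISPER_CPP_BINARY_PATH.
    """

    def __init__(self, configured: str = "whisper-cli") -> None:
        self._configured = configured
        self._resolved: Optional[str] = None

    def resolve(self) -> str:
        if self._resolved is None:
            found = shutil.which(self._configured)
            if found is None:
                raise RuntimeError(
                    f"whisper.cpp binary '{self._configured}' not found; "
                    "set WHISPER_CPP_BINARY_PATH or add it to PATH"
                )
            # Cache the path that was validated so the later exec call
            # uses exactly this path, even if PATH changes in between.
            self._resolved = found
        return self._resolved
```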

Also added docs/setup.md updates with the local provider configuration example and a link to the full build guide.

