Listenr is a privacy-first, end-to-end pipeline for building a personalised Whisper model from your own voice. Record audio, have a local LLM clean up the transcriptions, fine-tune any `openai/whisper-*` model on that data, and deploy a standalone model — all running locally on your hardware via Lemonade Server. No audio, text, or model weights ever leave your machine.
Scope: The recording and fine-tuning pipeline is built around Whisper (whisper.cpp for capture, `WhisperForConditionalGeneration` for training). The dataset format — `manifest.jsonl` → HuggingFace dataset — is model-agnostic and can feed any ASR trainer.
- Local-only, private by design. No cloud APIs. All inference runs on your CPU, GPU, or NPU via Lemonade Server.
- Open models. Uses whisper.cpp for transcription and any GGUF-compatible LLM for post-processing correction.
- Automatic correction pipeline. A local LLM cleans up punctuation, grammar, and homophones — producing a higher-quality training corpus than raw Whisper output alone.
- Real-world data. Collects natural, conversational speech in realistic environments, including domain-specific vocabulary that generic models get wrong.
- Dataset-ready output. Every utterance is saved with its audio clip and appended to a single `manifest.jsonl`. One command builds train/dev/test splits in HuggingFace dataset format.
- Full fine-tuning pipeline. LoRA fine-tuning of any `openai/whisper-*` model on AMD or NVIDIA GPU via a pre-built Podman container. No environment setup — just `podman compose run`.
- Deploy anywhere. `listenr-merge` folds the LoRA adapter into a self-contained `WhisperForConditionalGeneration` that loads with plain `transformers`, no PEFT required.
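The correction step can be pictured as one chat-completion call to the local server. The sketch below is illustrative only: the URL, model name, and prompt are placeholders I am assuming, not Listenr's actual implementation, and your Lemonade Server host/port may differ.

```python
import json
import urllib.request

# Assumed local endpoint in the OpenAI chat-completions format; adjust to your setup.
LEMONADE_CHAT_URL = "http://localhost:8000/api/v1/chat/completions"

def build_correction_request(raw: str, model: str) -> dict:
    """Build an OpenAI-style chat request asking the LLM to clean an ASR transcript."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": (
                    "Fix punctuation, casing, grammar, and homophones in this ASR "
                    "transcript. Do not paraphrase. Reply with the corrected text only."
                ),
            },
            {"role": "user", "content": raw},
        ],
        "temperature": 0.0,  # deterministic cleanup, not creative rewriting
    }

def correct_transcript(raw: str, model: str = "your-local-llm") -> str:
    """POST the request to the local server and return the corrected text."""
    req = urllib.request.Request(
        LEMONADE_CHAT_URL,
        data=json.dumps(build_correction_request(raw, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"].strip()
```

Pinning `temperature` to 0 keeps the LLM on-task: the goal is a faithful cleanup of what was said, since a paraphrased transcript would poison the training pairs.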
- Capture — `listenr` streams your microphone to Lemonade's `/realtime` WebSocket in ~85 ms chunks, resampled to 16 kHz.
- VAD — Lemonade's built-in voice activity detection segments speech boundaries automatically.
- Transcribe — Lemonade runs whisper.cpp on each segment and streams back transcripts.
- Correct (optional) — a local LLM cleans the transcript and tags content categories.
- Save — each utterance is saved as a `.wav` clip and a line in `manifest.jsonl`.
- Build dataset — `listenr-build-dataset` writes train/dev/test splits from the manifest.
- Fine-tune — `listenr-finetune` trains a LoRA adapter on top of a Whisper base model using your collected data.
- Merge — `listenr-merge` folds the adapter into the base model, producing a self-contained model that needs only `transformers`.
- Test — `scripts/test_merged.py` runs the merged model against your clips and compares output to the original Whisper transcriptions.
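The numbers in the capture step pin down the chunk size: at 16 kHz, an 85 ms chunk is 1,360 samples. A minimal sketch (not Listenr's actual capture code) of slicing a mono PCM buffer accordingly:

```python
SAMPLE_RATE = 16_000  # Whisper's expected input rate
CHUNK_MS = 85         # approximate chunk duration from the capture step
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000  # 1360 samples per chunk

def chunk_pcm(samples: list[int], chunk_samples: int = CHUNK_SAMPLES) -> list[list[int]]:
    """Split a mono 16 kHz PCM buffer into fixed-size chunks; the tail chunk may be short."""
    return [samples[i:i + chunk_samples] for i in range(0, len(samples), chunk_samples)]
```

Each chunk would then be sent over the WebSocket as it fills, which is what keeps the transcription latency in the tens-of-milliseconds range rather than waiting for whole utterances.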
```sh
git clone https://github.com/Rebreda/listenr
cd listenr
uv pip install -e .
lemonade-server serve   # in another terminal
uv run listenr          # start recording
```

Once you have recordings, the full pipeline runs as:

```sh
# Build train/dev/test splits from your manifest
uv run listenr-build-dataset --format hf

# Fine-tune Whisper (see docs/finetune-amd.md for AMD GPUs)
podman compose run --rm finetune

# Merge the LoRA adapter into a standalone model
podman compose run --rm merge

# Test it against your clips
python scripts/test_merged.py --keyword YourDomainWord
```

See docs/setup.md for full installation instructions.
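Conceptually, the dataset-build step is a deterministic shuffle-and-cut over the manifest. The sketch below is a rough illustration under assumed defaults (10% dev, 10% test, fixed seed); `listenr-build-dataset`'s real options, ratios, and HF output format may differ.

```python
import json
import random
from pathlib import Path

def split_manifest(manifest: Path, dev_frac: float = 0.1, test_frac: float = 0.1,
                   seed: int = 0) -> dict[str, list[dict]]:
    """Shuffle manifest records with a fixed seed and cut train/dev/test splits."""
    records = [
        json.loads(line)
        for line in manifest.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    random.Random(seed).shuffle(records)  # fixed seed -> reproducible splits
    n_dev = int(len(records) * dev_frac)
    n_test = int(len(records) * test_frac)
    return {
        "dev": records[:n_dev],
        "test": records[n_dev:n_dev + n_test],
        "train": records[n_dev + n_test:],
    }
```

The fixed seed matters: re-running the build must not leak utterances from test into train, or the evaluation on your own clips becomes meaningless.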
| Guide | Description |
|---|---|
| docs/setup.md | Installation, Lemonade Server, microphone setup |
| docs/configuration.md | Full config.ini reference, VAD tuning, available models |
| docs/recording.md | CLI usage, how recording works, batch transcription |
| docs/dataset.md | Building train/dev/test splits, CSV and HF formats |
| docs/finetune-amd.md | Fine-tuning Whisper on AMD GPU via ROCm + Podman, merging, and inference testing |
| docs/troubleshooting.md | Common errors and fixes |
Mozilla Public License Version 2.0 — see LICENSE.
- Lemonade Server — unified local inference API
- whisper.cpp — fast local ASR
- llama.cpp — fast local LLMs
