Listenr: Record, Correct, and Fine-tune Your Own Whisper Model

Listenr is a privacy-first, end-to-end pipeline for building a personalised Whisper model from your own voice. Record audio, have a local LLM clean up the transcriptions, fine-tune any openai/whisper-* model on that data, and deploy a standalone model — all running locally on your hardware via Lemonade Server. No audio, text, or model weights ever leave your machine.

Scope: The recording and fine-tuning pipeline is built around Whisper (whisper.cpp for capture, WhisperForConditionalGeneration for training). The dataset format — manifest.jsonl → HuggingFace dataset — is model-agnostic and can feed any ASR trainer.
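As a rough illustration of why a JSON Lines manifest is easy to feed to other trainers, the sketch below reads one record per line and extracts (audio path, transcript) pairs. The key names `audio_path` and `text` are assumptions made for this example, not the confirmed Listenr schema; check a real manifest.jsonl before relying on them.

```python
import json
from pathlib import Path
from typing import Iterator, List, Tuple


def read_manifest(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def to_pairs(
    records,
    audio_key: str = "audio_path",  # hypothetical key name
    text_key: str = "text",         # hypothetical key name
) -> List[Tuple[str, str]]:
    """Keep only records that have a non-empty transcript."""
    return [(r[audio_key], r[text_key]) for r in records if r.get(text_key)]
```

Any trainer that accepts a list of audio/text pairs can consume the result of `to_pairs(read_manifest(Path("manifest.jsonl")))`, provided the key names are adjusted to match the actual file.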

Why Listenr?

Local-only, private by design. No cloud APIs. All inference runs on your CPU, GPU, or NPU via Lemonade Server.
Open models. Uses whisper.cpp for transcription and any GGUF-compatible LLM for post-processing correction.
Automatic correction pipeline. A local LLM cleans up punctuation, grammar, and homophones — producing a higher-quality training corpus than raw Whisper output alone.
Real-world data. Collects natural, conversational speech in realistic environments, including domain-specific vocabulary that generic models get wrong.
Dataset-ready output. Every utterance is saved with its audio clip and appended to a single manifest.jsonl. One command builds train/dev/test splits in HuggingFace dataset format.
Full fine-tuning pipeline. LoRA fine-tuning of any openai/whisper-* model on AMD or NVIDIA GPUs via a pre-built Podman container. No environment setup is needed: just run podman compose run.
Deploy anywhere. listenr-merge folds the LoRA adapter into a self-contained WhisperForConditionalGeneration that loads with plain transformers, no PEFT required.
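To make "folding the adapter into the model" concrete, here is a minimal sketch of the standard LoRA merge arithmetic for a single weight matrix: W' = W + (alpha / r) * B * A, where r is the adapter rank. This README does not show how listenr-merge is implemented, so treat the function names and the alpha / r scaling as assumptions based on the usual LoRA formulation (libraries such as PEFT expose the same idea as `merge_and_unload`).

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product, (m x n) @ (n x p) -> (m x p)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("shape mismatch")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / rank) * (B @ A) for one layer.

    Assumed shapes: w is (out, in), lora_a is (rank, in), lora_b is (out, rank).
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, rank) @ (rank, in) -> (out, in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
```

After the merge, the layer is an ordinary dense matrix again, which is why the merged model no longer needs the adapter library at load time.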

How It Works

1. Capture: listenr streams your microphone to Lemonade's /realtime WebSocket in ~85 ms chunks, resampled to 16 kHz.
2. VAD: Lemonade's built-in voice activity detection segments speech boundaries automatically.
3. Transcribe: Lemonade runs whisper.cpp on each segment and streams back transcripts.
4. Correct (optional): a local LLM cleans the transcript and tags content categories.
5. Save: each utterance is saved as a .wav clip and a line in manifest.jsonl.
6. Build dataset: listenr-build-dataset writes train/dev/test splits from the manifest.
7. Fine-tune: listenr-finetune trains a LoRA adapter on top of a Whisper base model using your collected data.
8. Merge: listenr-merge folds the adapter into the base model, producing a self-contained model that needs only transformers.
9. Test: scripts/test_merged.py runs the merged model against your clips and compares its output to the original Whisper transcriptions.
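For a sense of scale in the capture step, the sketch below works out how many samples an ~85 ms chunk holds at 16 kHz and slices a buffer accordingly. It is only the arithmetic, not Listenr's capture code; the function names are hypothetical, and the exact chunk length the client uses may differ from a flat 85 ms.

```python
from typing import Iterator, Sequence

SAMPLE_RATE_HZ = 16_000  # target rate stated in the capture step
CHUNK_MS = 85            # approximate chunk duration stated in the capture step


def samples_per_chunk(sample_rate_hz: int = SAMPLE_RATE_HZ, chunk_ms: int = CHUNK_MS) -> int:
    """Number of samples in one chunk of the given duration."""
    return sample_rate_hz * chunk_ms // 1000


def iter_chunks(samples: Sequence[int], chunk_len: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of chunk_len samples; the last may be shorter."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]
```

With these numbers a chunk is 1360 samples, so one second of audio splits into 11 full chunks plus a shorter final one.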

Quick Start

git clone https://github.com/Rebreda/listenr
cd listenr
uv pip install -e .
lemonade-server serve   # in another terminal
uv run listenr          # start recording

Once you have recordings, the full pipeline runs as:

# Build train/dev/test splits from your manifest
uv run listenr-build-dataset --format hf

# Fine-tune Whisper (see docs/finetune-amd.md for AMD GPUs)
podman compose run --rm finetune

# Merge the LoRA adapter into a standalone model
podman compose run --rm merge

# Test it against your clips
python scripts/test_merged.py --keyword YourDomainWord

See docs/setup.md for full installation instructions.
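The dataset-building step in the pipeline above produces train/dev/test splits. A minimal sketch of one common way to do that, a seeded shuffle followed by slicing, is shown here. The 80/10/10 ratios, the seed, and the function name are assumptions for illustration; listenr-build-dataset may split differently, and docs/dataset.md is the authority.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_records(
    records: Sequence[T],
    dev_frac: float = 0.1,   # assumed ratio, not taken from Listenr
    test_frac: float = 0.1,  # assumed ratio, not taken from Listenr
    seed: int = 0,
) -> Dict[str, List[T]]:
    """Shuffle deterministically, then slice into dev, test, and train."""
    if dev_frac < 0 or test_frac < 0 or dev_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same split
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "dev": shuffled[:n_dev],
        "test": shuffled[n_dev:n_dev + n_test],
        "train": shuffled[n_dev + n_test:],
    }
```

Fixing the seed keeps the test set stable across rebuilds, which matters when comparing a fine-tuned model against the base model on the same clips.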

Documentation

Guide	Description
docs/setup.md	Installation, Lemonade Server, microphone setup
docs/configuration.md	Full `config.ini` reference, VAD tuning, available models
docs/recording.md	CLI usage, how recording works, batch transcription
docs/dataset.md	Building train/dev/test splits, CSV and HF formats
docs/finetune-amd.md	Fine-tuning Whisper on AMD GPU via ROCm + Podman, merging, and inference testing
docs/troubleshooting.md	Common errors and fixes

License

Mozilla Public License Version 2.0 — see LICENSE.

Acknowledgments

Lemonade Server — unified local inference API
whisper.cpp — fast local ASR
llama.cpp — fast local LLMs
