
Merge and test weights models #7

Merged
Rebreda merged 8 commits into main from merge-and-test-weights-models on Mar 7, 2026

@Rebreda Rebreda commented Mar 7, 2026

Take the weights from the fine-tuning step and merge them back into the ASR model of choice.

Add a quick check that tests whether the new model actually handles the keywords and improvements, for sanity's sake.

Add more documentation, fix previous utilities, clean up generally, ensure the Docker setup actually uses .env, and update the project README now that we are end to end 🥳

Rebreda added 8 commits March 7, 2026 10:13
- merge.py: reads adapter_config.json for base model ID, loads with
  device_map='cpu' (avoids ROCm segfault during merge_and_unload),
  saves standalone WhisperForConditionalGeneration in safetensors format
  alongside processor artifacts so output dir is fully self-contained
- test_finetune_merge.py: 15 tests covering read_base_model_id, dry-run,
  missing dir, full mock pipeline, and _print_summary
- pyproject.toml: register listenr-merge and listenr-asr entry points
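
For readers without the diff open, the merge flow described above can be sketched roughly as follows. This is not the actual merge.py: `merge_adapter` and its parameters are hypothetical names, the `base_model_name_or_path` key is the one PEFT normally writes into adapter_config.json, and loading the processor from the base model ID is an assumption. Only `read_base_model_id` is a name taken from the commit message.

```python
import json
from pathlib import Path


def read_base_model_id(adapter_dir):
    """Return the base model ID recorded in a PEFT adapter directory."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"no adapter_config.json in {adapter_dir}")
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # PEFT stores the base checkpoint under this key when saving an adapter.
    return config["base_model_name_or_path"]


def merge_adapter(adapter_dir, output_dir, dry_run=False):
    """Hypothetical helper: fold adapter weights into the base Whisper model."""
    base_id = read_base_model_id(adapter_dir)
    if dry_run:
        return base_id  # report what would be merged without loading weights

    # Heavy imports are deferred so the dry-run path needs no ML stack.
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    # Load on CPU so the merge never touches the GPU runtime
    # (device_map requires the accelerate package).
    base = WhisperForConditionalGeneration.from_pretrained(base_id, device_map="cpu")
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    merged.save_pretrained(output_dir, safe_serialization=True)
    # Assumption: the processor comes from the base model; saving it next to
    # the weights keeps the output directory self-contained.
    WhisperProcessor.from_pretrained(base_id).save_pretrained(output_dir)
    return base_id
```

Calling `merge_adapter(adapter_dir, output_dir, dry_run=True)` only reads the config and returns the base model ID, which is roughly what a dry-run mode would be expected to report.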
- Add 'merge' service: CPU-only (no GPU passthrough), HIP_VISIBLE_DEVICES=-1
  to prevent ROCm HSA from initialising and causing segfault
- Fix all volume entries: remove nested ${VAR:-${HOME}/path} defaults that
  podman-compose cannot parse; use plain ${VAR} with required .env file
- Add :z SELinux label to all bind mounts (required on Fedora/RHEL hosts)
- .env.example: add LISTENR_MERGED; uncomment all vars with /home/you/ placeholders
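
To catch the nested-default pattern before podman-compose trips over it, a small check along these lines might help. It is a sketch, not part of this PR: `find_nested_defaults` and the regular expression are hypothetical, and the pattern only looks for a `${VAR:-...}` default that itself contains another `${`.

```python
import re

# Matches "${NAME:-" (or "${NAME-") followed by another "${" before the first
# closing brace, i.e. a default value that itself contains a substitution.
NESTED_DEFAULT = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*:?-[^}]*\$\{")


def find_nested_defaults(compose_text):
    """Return (line_number, stripped_line) pairs that use a nested default."""
    hits = []
    for number, line in enumerate(compose_text.splitlines(), start=1):
        if NESTED_DEFAULT.search(line):
            hits.append((number, line.strip()))
    return hits
```

A volume entry written as plain `${LISTENR_MERGED}` passes this check, while one written as `${LISTENR_MERGED:-${HOME}/...}` is reported with its line number.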
Loads a merged WhisperForConditionalGeneration and runs it against audio
clips from manifest.jsonl, printing side-by-side comparisons of the
original Whisper transcription and the fine-tuned model output.

- --keyword WORD (repeatable): filter to clips where WORD appears in
  corrected_transcription; marks each clip HIT/MISS and prints a recall
  summary table at the end
- --n N: number of clips to test (default 20)
- --min-duration S: skip clips shorter than S seconds
- --audio PATH: transcribe a single file instead of manifest clips
- JSONL parser uses raw_decode to handle objects concatenated on one line
  (a known manifest write bug that silently drops entries with json.loads)
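
The raw_decode parsing and the HIT/MISS bookkeeping described above could look something like the sketch below. The function names are hypothetical, and case-insensitive substring matching for keywords is an assumption; only the `corrected_transcription` field and the use of `raw_decode` come from the commit message.

```python
import json


def iter_json_objects(line):
    """Yield every JSON object on one line, even if several are concatenated."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(line):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(line) and line[pos].isspace():
            pos += 1
        if pos >= len(line):
            break
        try:
            obj, pos = decoder.raw_decode(line, pos)
        except json.JSONDecodeError:
            break  # stop at a malformed fragment, keep what parsed so far
        yield obj


def load_manifest(lines):
    """Flatten an iterable of manifest lines into a list of entries."""
    return [entry for line in lines for entry in iter_json_objects(line)]


def keyword_recall(entries, outputs, keyword):
    """Mark each keyword clip HIT or MISS and return (marks, recall)."""
    needle = keyword.lower()
    marks = []
    for entry, output in zip(entries, outputs):
        reference = (entry.get("corrected_transcription") or "").lower()
        if needle not in reference:
            continue  # only clips whose reference text contains the keyword count
        marks.append("HIT" if needle in output.lower() else "MISS")
    recall = marks.count("HIT") / len(marks) if marks else 0.0
    return marks, recall
```

With `json.loads`, a line holding two concatenated objects raises an "Extra data" error, so a parser that catches the exception and moves on loses both entries; stepping through the line with `raw_decode` recovers each one.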
Documents the RuntimeError that occurs when podman-compose encounters
nested ${VAR:-${HOME}/path} syntax, explains why plain ${VAR} is used
instead, and shows the sed one-liner to bootstrap .env from .env.example.
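
The sed one-liner itself is not reproduced in this conversation. As a rough Python equivalent of the same idea, assuming the placeholders in .env.example are the `/home/you/` paths mentioned earlier, something like this would do the bootstrap. `bootstrap_env` and its parameters are hypothetical names.

```python
from pathlib import Path


def bootstrap_env(example_path=".env.example", env_path=".env", home=None):
    """Write env_path from example_path, replacing /home/you with a real home."""
    home = str(home if home is not None else Path.home()).rstrip("/")
    target = Path(env_path)
    if target.exists():
        # Never clobber an .env the user has already edited.
        raise FileExistsError(f"{env_path} already exists; refusing to overwrite")
    text = Path(example_path).read_text(encoding="utf-8")
    target.write_text(text.replace("/home/you", home), encoding="utf-8")
    return target
```

Refusing to overwrite an existing .env is a design choice made for this sketch, not something the PR states.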
@Rebreda Rebreda merged commit 50680f9 into main Mar 7, 2026
2 checks passed
@Rebreda Rebreda deleted the merge-and-test-weights-models branch March 7, 2026 15:25