Introduces a speaker signature storage system so that once a person is
identified in one recording, their voice can be automatically recognised
in future sessions.
Changes
-------
speaker_db.py (new)
- JSON-based database stored in the user config directory
(~/.config/noScribe/speaker_signatures.json)
- find_match(embedding) – cosine-similarity lookup with configurable
threshold (default 0.75)
- save_speaker(name, embedding) – add/update entry, blending the new
embedding with any existing one for the same name so the model
gradually adapts to voice variation
- list_speakers() / delete_speaker() helpers for future management UI
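The lookup-and-blend idea above can be sketched as follows. This is an illustrative, stdlib-only sketch, not the actual speaker_db.py: the real module presumably loads the JSON file internally, while here the database dict is passed around explicitly, and the blending weight `alpha` is an assumed parameter.

```python
import json
import math
import os

DB_PATH = os.path.expanduser("~/.config/noScribe/speaker_signatures.json")
MATCH_THRESHOLD = 0.75  # default cosine-similarity threshold from the PR

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def load_db(path=DB_PATH):
    """Read the JSON signature store ({name: [floats]}); empty on any error."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}

def find_match(embedding, db, threshold=MATCH_THRESHOLD):
    """Return (name, similarity) of the best match above threshold, else (None, best)."""
    best_name, best_sim = None, 0.0
    for name, stored in db.items():
        sim = _cosine(embedding, stored)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim >= threshold:
        return best_name, best_sim
    return None, best_sim

def save_speaker(name, embedding, db, alpha=0.5):
    """Blend the new embedding with any existing one for the same name,
    so the stored signature gradually adapts to voice variation."""
    if name in db:
        old = db[name]
        db[name] = [alpha * n + (1 - alpha) * o for n, o in zip(embedding, old)]
    else:
        db[name] = list(embedding)
    return db
```

With `alpha=0.5` the stored signature is an exponentially decaying average of past recordings, which matches the "gradually adapts" behaviour described above.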
pyannote_mp_worker.py
- After diarization, extracts per-speaker L2-normalised embeddings by
feeding the longest audio segments (≥ 1.5 s, up to 5 per speaker)
through the pipeline's already-loaded embedding model
- Returns embeddings alongside the segment list in the result message;
the entire extraction is wrapped in try/except so any failure is
silently logged and never blocks the transcription
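The segment-selection and normalisation steps described above can be sketched without pyannote. This is a hedged illustration: the real worker feeds audio through the pipeline's embedding model, which is omitted here; only the "longest segments ≥ 1.5 s, up to 5 per speaker, averaged and L2-normalised" logic is shown.

```python
import math

MIN_SEG_SECONDS = 1.5       # minimum segment length from the PR
MAX_SEGS_PER_SPEAKER = 5    # cap on segments per speaker from the PR

def pick_segments(segments):
    """segments: list of (speaker, start, end) tuples.
    Keep the longest qualifying segments per speaker."""
    by_speaker = {}
    for spk, start, end in segments:
        if end - start >= MIN_SEG_SECONDS:
            by_speaker.setdefault(spk, []).append((start, end))
    for spk, segs in by_speaker.items():
        segs.sort(key=lambda s: s[1] - s[0], reverse=True)
        by_speaker[spk] = segs[:MAX_SEGS_PER_SPEAKER]
    return by_speaker

def l2_normalise(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def speaker_embedding(per_segment_embeddings):
    """Average several per-segment embeddings into one normalised signature."""
    n = len(per_segment_embeddings)
    mean = [sum(col) / n for col in zip(*per_segment_embeddings)]
    return l2_normalise(mean)
```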
noScribe.py
- Imports speaker_db
- SpeakerNamingDialog (CTkToplevel): modal dialog shown between
diarization and transcription; lists each detected speaker with
their matched name and confidence badge (green ≥ 75 %, orange
> 55 %, grey = new speaker), an editable name field and a Save
checkbox; OK applies names + saves checked signatures, Skip falls
back to S01/S02 labels
- _run_diarize_subprocess now returns (segments, embeddings)
- _run_speaker_naming_dialog helper runs the dialog in the GUI thread
- speaker_name_map (closure variable) carries label→name mappings
into find_speaker so confirmed names appear directly in the
transcript instead of S01/S02
- Threading: dialog is scheduled via self.after(0, …) and the worker
thread waits on a threading.Event, keeping Tkinter calls on the
main thread
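The handshake between worker thread and GUI thread can be reduced to this pattern. A minimal sketch, not the PR's code: `schedule_on_main` stands in for Tk's `self.after(0, ...)`, and `show_dialog` stands in for SpeakerNamingDialog.

```python
import threading

def ask_user_on_main_thread(schedule_on_main, show_dialog):
    """Run show_dialog on the GUI thread and block the calling worker
    thread until it returns. Keeps all Tkinter calls off the worker."""
    result = {}
    done = threading.Event()

    def run_dialog():
        result["names"] = show_dialog()   # executes on the GUI thread
        done.set()                        # wake the waiting worker

    schedule_on_main(run_dialog)          # e.g. self.after(0, run_dialog)
    done.wait()                           # worker blocks here until OK/Skip
    return result["names"]
```

The `threading.Event` is what lets the worker pause mid-pipeline (between diarization and transcription) without busy-waiting or touching Tk from the wrong thread.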
trans/noScribe.*.yml
- Added speaker_naming_title, speaker_naming_hint,
speaker_name_placeholder, speaker_new_badge, speaker_save_checkbox,
btn_ok and btn_skip keys for all 9 supported languages
https://claude.ai/code/session_016natySHkUNa6oDH7sEPf4F
Adds a languages.yml file in the noScribe config directory so users can
comment out the transcription languages they never use, shortening the
dropdown to just the ones that matter to them.
How it works
------------
- On first run noScribe creates ~/.config/noScribe/languages.yml (or the
  OS-equivalent path) listing all supported languages with an
  explanatory header in English.
- The file is a plain YAML list; to hide a language the user adds '#' at
  the start of that line. Standard YAML comment syntax means the lines
  can be uncommented just as easily.
- noScribe reads the file at startup but NEVER writes to it, so comments
  and formatting are always preserved across sessions.
- If the file is missing, unreadable, or yields an empty list, noScribe
  silently falls back to the full built-in language list (no regression
  for existing users).
- 'Auto' is always kept in the active list even if accidentally
  commented out, to prevent the dropdown from breaking.
https://claude.ai/code/session_016natySHkUNa6oDH7sEPf4F
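The read-with-fallback behaviour can be sketched as below. This is an assumption-laden illustration: a real build would use a YAML parser, whereas this stand-in handles only the `- Name` list form the file contains, and `FULL_LANGUAGE_LIST` is a shortened placeholder for the built-in list.

```python
# Placeholder; the real built-in list covers all supported languages.
FULL_LANGUAGE_LIST = ["Auto", "Multilingual", "English", "German", "Spanish"]

def load_active_languages(text):
    """Parse languages.yml content read at startup; never write back.
    Any problem falls back silently to the full built-in list."""
    try:
        langs = []
        for line in text.splitlines():
            line = line.strip()
            if line.startswith("- "):          # active (uncommented) entry
                langs.append(line[2:].strip())
            # lines starting with '#' are commented out and skipped
        if not langs:
            return list(FULL_LANGUAGE_LIST)    # empty list -> full fallback
        if "Auto" not in langs:
            langs.insert(0, "Auto")            # 'Auto' is always kept
        return langs
    except Exception:
        return list(FULL_LANGUAGE_LIST)
```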
The language filter file now lives in the same folder as noScribe.py so
users can find and edit it directly without hunting through OS config
paths. The shipped languages.yml has English, Spanish and Portuguese
active by default (Auto and Multilingual included); everything else is
commented out and can be re-enabled by removing the '#'.
https://claude.ai/code/session_016natySHkUNa6oDH7sEPf4F
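A shipped file along these lines would look roughly like this (illustrative only; the exact header text and language ordering are assumptions):

```yaml
# noScribe language filter. Remove the '#' to re-enable a language;
# add a '#' to hide one. noScribe never rewrites this file.
- Auto
- Multilingual
- English
- Spanish
- Portuguese
# - German
# - French
# - Italian
```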
Two issues fixed:
1. Dialog never appeared
   The dialog was gated on `if _embeddings:`, so it was silently skipped
   whenever embedding extraction failed. It now always shows after
   diarization so the user can assign names even without stored signatures.
2. Embedding extraction too fragile
- Tries pipeline._embedding, embedding_, _embedding_model in sequence
- Falls back to loading the embedding model directly from
pyannote/embedding/pytorch_model.bin when none of the pipeline
attributes exist (covers pyannote versions that changed internals)
- Each step now logs to the debug log so failures are visible in
the noScribe log file instead of being silently swallowed
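The attribute-probing step in (2) amounts to a getattr fallback chain. A hedged sketch, not the actual code; the attribute names come from the list above, and `log` stands in for noScribe's debug logger.

```python
def get_embedding_model(pipeline, log=print):
    """Probe known pyannote pipeline attributes for the embedding model.
    Returns None when none exist, so the caller can load the model
    directly (e.g. from pyannote/embedding) instead."""
    for attr in ("_embedding", "embedding_", "_embedding_model"):
        model = getattr(pipeline, attr, None)
        if model is not None:
            log(f"embedding model found via pipeline.{attr}")
            return model
    log("no embedding attribute on pipeline; falling back to direct load")
    return None
```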
https://claude.ai/code/session_016natySHkUNa6oDH7sEPf4F
There are some interesting ideas in here. But it would be good to discuss such ideas before letting Claude Code loose and making such massive changes to the codebase. I have to check this all and decide what I want to keep and what not. As an example, I don't think that it is a good idea to show a modal dialog every time the speaker detection is finished. People are letting noScribe run for hours unattended. When they come back, they expect the transcript to be ready, or even a whole queue of jobs. A modal dialog interrupting this can be quite annoying.
Hey Kaixxx, my bad, this was meant for my own fork. I've been developing new improvements on your code base, bringing in some of my own previous development on the subject. Agentic coding has been very useful.