v0.3.0 — Speaker Identification
Speaker Identification via WeSpeaker ResNet293
New feature: named speaker identification across recordings using persistent voice profiles.
New: any2md speaker subcommand
any2md speaker add "Joe" --audio joe-sample.wav
any2md speaker list
any2md speaker remove "Joe"
any2md speaker merge "Speaker A" "Speaker B"
any2md speaker stats "Joe"
any2md speaker gallery "Joe"
New: --identify flag
any2md meeting.m4a --diarize --identify
# Output: **Joe** [00:24] instead of SPEAKER_0
How it works
- WeSpeaker ResNet293 extracts 256-d speaker embeddings (PyTorch MPS on Apple Silicon)
- Gallery model: stores multiple embeddings per speaker to handle voice variation across mics/conditions
- sqlite-vec for fast k-nearest-neighbor (KNN) search
- Adaptive thresholds: high-confidence matches (≥0.85) are applied automatically; medium-confidence matches (0.70-0.85) are shown with their score
- Auto-enrolls new embeddings for matched speakers (gallery grows over time)
- Prompts for unknown speakers (or use --auto-enroll/--no-enroll)
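The matching steps above can be sketched roughly as follows. This is an illustration only, not any2md's actual code: it assumes the score is cosine similarity, that a speaker's score is the best match against any embedding in their gallery, and that anything below 0.70 is treated as unknown. The names `cosine` and `identify` are hypothetical, and the 3-d vectors stand in for the real 256-d embeddings.

```python
import math

HIGH = 0.85    # auto-match threshold from the notes above
MEDIUM = 0.70  # lower bound of the medium-confidence band


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(embedding, galleries):
    """Return (name, score, level) for the best-matching speaker.

    galleries maps a speaker name to a non-empty list of embeddings.
    """
    best_name, best_score = None, -1.0
    for name, gallery in galleries.items():
        # A speaker's score is the best match against any stored embedding.
        score = max(cosine(embedding, e) for e in gallery)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= HIGH:
        return best_name, best_score, "high"
    if best_score >= MEDIUM:
        return best_name, best_score, "medium"
    # Assumption: below the medium band the speaker is treated as unknown.
    return None, best_score, "unknown"


galleries = {
    "Joe": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "Ann": [[0.0, 1.0, 0.0]],
}
print(identify([0.95, 0.05, 0.0], galleries))
```

Keeping several embeddings per speaker means a sample recorded on a different mic only has to be close to one of them, which is the point of the gallery model.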
Speaker catalog
Stored persistently at ~/.config/any2md/speakers.db:
- Gallery maintenance: rolling window of 20 embeddings per speaker
- Per-speaker distance statistics for drift detection
- Merge support for duplicate profiles
- Audit trail for profile merges
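As a rough sketch of the gallery maintenance and drift statistics listed above: a fixed-size window can be kept with a bounded deque, and a new match can be compared against the speaker's usual distances. The `Gallery` class, the `drifted` check, and the mean-plus-3-sigma rule are assumptions for illustration; the notes do not say how any2md computes drift or how speakers.db is laid out.

```python
from collections import deque
from statistics import mean, pstdev

WINDOW = 20  # rolling window size from the notes above


class Gallery:
    """Hypothetical in-memory stand-in for one speaker's gallery."""

    def __init__(self):
        # With maxlen set, appending a 21st item drops the oldest one.
        self.embeddings = deque(maxlen=WINDOW)
        self.distances = deque(maxlen=WINDOW)

    def enroll(self, embedding, distance):
        """Store an embedding and the distance at which it was matched."""
        self.embeddings.append(embedding)
        self.distances.append(distance)

    def drifted(self, distance, k=3.0):
        """Flag a match whose distance is far above this speaker's norm.

        Assumed rule: more than k standard deviations above the mean.
        """
        if len(self.distances) < 2:
            return False
        mu = mean(self.distances)
        sigma = pstdev(self.distances)
        return distance > mu + k * sigma


gallery = Gallery()
for i in range(25):
    gallery.enroll([float(i)], 0.10 + 0.01 * (i % 3))
print(len(gallery.embeddings), gallery.embeddings[0])
```

After 25 enrollments only the most recent 20 embeddings remain, so the gallery tracks how the speaker sounds now rather than accumulating stale samples.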
Install
brew upgrade any2md
# Install speaker identification deps
uv pip install "any2md[speaker]"
Stats
- 201 new tests (119 speaker + 42 CLI + 40 yt)
- 7 tickets implemented (ANY2-11 through ANY2-17)