v0.3.0 — Speaker Identification

@roboalchemist released this 29 Mar 06:48
· 14 commits to main since this release

Speaker Identification via WeSpeaker ResNet293

New feature: named speaker identification across recordings using persistent voice profiles.

New: any2md speaker subcommand

any2md speaker add "Joe" --audio joe-sample.wav
any2md speaker list
any2md speaker remove "Joe"
any2md speaker merge "Speaker A" "Speaker B"
any2md speaker stats "Joe"
any2md speaker gallery "Joe"

New: --identify flag

any2md meeting.m4a --diarize --identify
# Output: **Joe** [00:24] instead of SPEAKER_0

How it works

  • WeSpeaker ResNet293 extracts 256-d speaker embeddings (PyTorch MPS on Apple Silicon)
  • Gallery model: stores multiple embeddings per speaker to handle voice variation across mics/conditions
  • sqlite-vec for fast k-nearest-neighbor (KNN) search
  • Adaptive thresholds: high-confidence auto-match (≥0.85), medium-confidence match reported with its score (0.70-0.85)
  • Auto-enrolls new embeddings for matched speakers (gallery grows over time)
  • Prompts for unknown speakers (override with --auto-enroll or --no-enroll)
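
The matching steps above can be sketched in a few lines of Python. This is an illustration only, not any2md's actual code: it assumes the score is a cosine similarity, that anything below 0.70 counts as unknown, and it replaces the sqlite-vec KNN query with a brute-force scan. The names `cosine_similarity`, `best_match`, and `classify` are hypothetical.

```python
import math

# Thresholds quoted in the release notes.
HIGH_CONFIDENCE = 0.85
MEDIUM_CONFIDENCE = 0.70


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(embedding, galleries):
    """Return (name, score) of the closest stored embedding.

    `galleries` maps a speaker name to a list of embeddings, mirroring
    the gallery model. A brute-force scan stands in for the KNN search.
    Returns (None, -1.0) when no embeddings are stored.
    """
    best_name, best_score = None, -1.0
    for name, stored in galleries.items():
        for candidate in stored:
            score = cosine_similarity(embedding, candidate)
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score


def classify(score):
    """Map a score to a decision; the 'unknown' branch is an assumption."""
    if score >= HIGH_CONFIDENCE:
        return "auto-match"
    if score >= MEDIUM_CONFIDENCE:
        return "match-with-score"
    return "unknown"


# Toy 2-d vectors; real embeddings are 256-d.
galleries = {"Joe": [[1.0, 0.0], [0.9, 0.1]], "Ann": [[0.0, 1.0]]}
name, score = best_match([1.0, 0.0], galleries)
decision = classify(score)
```

Keeping several embeddings per speaker means a query only needs to be close to one of them, which is how a gallery can absorb differences between microphones or recording conditions.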

Speaker catalog

Persistent at ~/.config/any2md/speakers.db:

  • Gallery maintenance: rolling window of 20 embeddings per speaker
  • Per-speaker distance statistics for drift detection
  • Merge support for duplicate profiles
  • Audit trail for profile merges
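
As a rough sketch of the rolling window, a bounded deque gives the same behavior in memory. This assumes the oldest embedding is evicted first, which the release notes do not state explicitly; the real catalog lives in SQLite, and the `Gallery` class here is hypothetical.

```python
from collections import deque

GALLERY_SIZE = 20  # window size quoted in the release notes


class Gallery:
    """In-memory stand-in for one speaker's stored embeddings."""

    def __init__(self, max_size=GALLERY_SIZE):
        # deque(maxlen=...) silently drops the oldest item when full
        # (assumed eviction policy).
        self.embeddings = deque(maxlen=max_size)

    def enroll(self, embedding):
        """Add a new embedding, evicting the oldest if the window is full."""
        self.embeddings.append(tuple(embedding))

    def __len__(self):
        return len(self.embeddings)


gallery = Gallery()
for i in range(25):
    gallery.enroll([float(i), 0.0])
```

After 25 enrollments the gallery holds only the 20 most recent embeddings, so the profile tracks how a voice sounds now rather than accumulating stale samples indefinitely.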

Install

brew upgrade any2md
# Install speaker identification deps
uv pip install "any2md[speaker]"

Stats

  • 201 new tests (119 speaker + 42 CLI + 40 yt)
  • 7 tickets implemented (ANY2-11 through ANY2-17)