Releases: roboalchemist/any2md
v0.3.3
Changes
- Speaker attendance filter (--speakers): Limit which enrolled speakers are matched during identification
  - any2md podcast.mp3 --diarize --identify --speakers "Alice,Bob" (constrains the search space)
  - Unknown/guest speakers reliably detected when not in the expected set
- Speaker groups: Reusable named speaker sets
  - any2md speaker group create "Hosts" --members "Alice,Bob"
  - --speakers "@Hosts" expands the group to its member list
  - Full CRUD: create, list, show, delete, add-member, remove-member
- JSON attendance field: --json output includes attendance.expected, attendance.identified, attendance.unknown
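To illustrate how a --speakers value with an "@Group" reference could resolve to an expected set, and how the three attendance keys might be filled, here is a minimal Python sketch. The function names, the in-memory groups dict, and the exact meaning assigned to each attendance key are assumptions made for illustration; they are not taken from the any2md source.

```python
def expand_speakers(spec, groups):
    """Expand a comma-separated --speakers value; '@Name' refers to a group.

    `groups` is a hypothetical mapping of group name -> list of member names.
    """
    expected = []
    for token in (t.strip() for t in spec.split(",")):
        if not token:
            continue
        members = groups[token[1:]] if token.startswith("@") else [token]
        for name in members:
            if name not in expected:  # keep first-seen order, drop duplicates
                expected.append(name)
    return expected


def attendance(expected, identified_labels):
    """Build a dict with the three attendance keys named in the release notes.

    Assumed semantics: 'identified' = expected speakers that were matched,
    'unknown' = labels in the recording that are not in the expected set.
    """
    identified = [name for name in expected if name in identified_labels]
    unknown = sorted(set(identified_labels) - set(expected))
    return {"expected": expected, "identified": identified, "unknown": unknown}


groups = {"Hosts": ["Alice", "Bob"]}
expected = expand_speakers("@Hosts,Carol", groups)
report = attendance(expected, ["Alice", "SPEAKER_2"])
```

Restricting the match to an expected set is what lets a guest fall through to "unknown" rather than being forced onto the nearest enrolled profile.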
v0.3.2
Changes
- Fix speaker deps: wespeaker → wespeakerruntime (PyPI installable, ONNX-based)
- Removed torch dependency for speaker identification
- Updated speaker.py to use wespeakerruntime.Speaker(lang='en') API
- Full catalog pipeline verified: enroll, match, unknown rejection
v0.3.1
CLI standards compliance: shell completions, docs command, llms.txt, --silent flag.
v0.3.0 — Speaker Identification
Speaker Identification via WeSpeaker ResNet293
New feature: named speaker identification across recordings using persistent voice profiles.
New: any2md speaker subcommand
any2md speaker add "Joe" --audio joe-sample.wav
any2md speaker list
any2md speaker remove "Joe"
any2md speaker merge "Speaker A" "Speaker B"
any2md speaker stats "Joe"
any2md speaker gallery "Joe"
New: --identify flag
any2md meeting.m4a --diarize --identify
# Output: **Joe** [00:24] instead of SPEAKER_0
How it works
- WeSpeaker ResNet293 extracts 256-d speaker embeddings (PyTorch MPS on Apple Silicon)
- Gallery model: stores multiple embeddings per speaker to handle voice variation across mics/conditions
- sqlite-vec for fast KNN nearest-neighbor search
- Adaptive thresholds: high-confidence auto-match (≥0.85), medium-confidence with score (0.70-0.85)
- Auto-enrolls new embeddings for matched speakers (gallery grows over time)
- Prompts for unknown speakers (or --auto-enroll/--no-enroll)
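The matching flow described above can be sketched in a few lines of Python. This is a simplified illustration, not the project's code: it assumes the score is a cosine similarity, uses a brute-force scan where the release notes mention sqlite-vec KNN search, and the function names are hypothetical. Only the 0.85 and 0.70 thresholds come from the notes.

```python
import math

AUTO_MATCH = 0.85  # release notes: high-confidence auto-match
MEDIUM = 0.70      # release notes: lower bound of the medium-confidence band


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(embedding, gallery):
    """Return (name, score) of the closest stored embedding.

    `gallery` maps speaker name -> list of embeddings, so each speaker can
    be represented by several samples (the "gallery model").
    """
    best_name, best_score = None, -1.0
    for name, stored in gallery.items():
        for candidate in stored:
            score = cosine_similarity(embedding, candidate)
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score


def classify(score):
    """Map a similarity score onto the confidence bands from the notes."""
    if score >= AUTO_MATCH:
        return "auto"
    if score >= MEDIUM:
        return "medium"
    return "unknown"
```

Keeping several embeddings per speaker means a query only has to be close to one of them, which is how the gallery tolerates different microphones and recording conditions.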
Speaker catalog
Persistent at ~/.config/any2md/speakers.db:
- Gallery maintenance: rolling window of 20 embeddings per speaker
- Per-speaker distance statistics for drift detection
- Merge support for duplicate profiles
- Audit trail for profile merges
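As a rough picture of the rolling-window and merge behavior listed above, the following Python sketch keeps at most 20 embeddings per speaker in memory. The real catalog is a SQLite database, so the class, its method names, and the merge policy shown here are assumptions for illustration only.

```python
from collections import deque

WINDOW = 20  # release notes: rolling window of 20 embeddings per speaker


class Gallery:
    """In-memory stand-in for the speaker catalog (hypothetical API)."""

    def __init__(self, window=WINDOW):
        self._window = window
        self._by_speaker = {}

    def add(self, name, embedding):
        # deque(maxlen=...) silently evicts the oldest entry when full
        self._by_speaker.setdefault(name, deque(maxlen=self._window)).append(embedding)

    def embeddings(self, name):
        return list(self._by_speaker.get(name, ()))

    def merge(self, source, target):
        """Fold source's embeddings into target, then drop the source profile."""
        for embedding in self._by_speaker.pop(source, ()):
            self.add(target, embedding)


gallery = Gallery()
for i in range(25):
    gallery.add("Joe", [float(i)])
```

After the loop above, "Joe" holds only the 20 most recent embeddings; the five oldest have been evicted.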
Install
brew upgrade any2md
# Install speaker identification deps
uv pip install any2md[speaker]
Stats
- 201 new tests (119 speaker + 42 CLI + 40 yt)
- 7 tickets implemented (ANY2-11 through ANY2-17)
v0.2.3
Remove redundant mp3 intermediate conversion
- Local files (m4a/mp4/wav/etc): Skip mp3 step, convert directly to 16kHz WAV (~16s faster on long files)
- YouTube: Skip mp3 postprocessor in yt-dlp, keep native format (opus/webm) for direct WAV conversion — also avoids lossy mp3 re-encoding
- Output filenames: Remove _whisper suffix that was leaking from intermediate files into output markdown filenames
v0.2.2
Fix diarization OOM on long audio files
Breaking change: None — drop-in fix.
What changed
- diarize() now uses model.generate_stream() (chunked streaming) instead of model.generate() (full-file, O(n²) memory)
- Added _merge_diarization_segments() for cross-chunk speaker segment merging
- 9 new unit tests + 2 integration tests with 82-min public domain podcast
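Chunked diarization tends to split one speaker turn at chunk boundaries, so the segments need stitching afterwards. The Python sketch below shows one plausible way to do that; it is not the project's _merge_diarization_segments(). The function name, the (start, end, speaker) tuple layout, and the max_gap tolerance are assumptions, and it presumes speaker labels are already consistent across chunks.

```python
def merge_segments(segments, max_gap=0.5):
    """Join consecutive segments of the same speaker.

    segments: iterable of (start_sec, end_sec, speaker) tuples.
    max_gap:  hypothetical tolerance in seconds; segments of the same
              speaker separated by at most this gap are merged.
    """
    merged = []
    for start, end, speaker in sorted(segments):
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged


chunks = [(0.0, 10.0, "A"), (10.0, 15.0, "A"), (15.2, 20.0, "B"), (30.0, 31.0, "B")]
turns = merge_segments(chunks)
```

Here the two "A" segments that meet at the 10-second boundary become one turn, while the two "B" segments stay separate because they are ten seconds apart.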
Before/After
- Before: 48-min file → [metal::malloc] Attempting to allocate 83404284800 bytes (OOM)
- After: 48-min file → 12.6s diarization (289 chunks, 712 segments, 4 speakers)
Other fixes
- Fixed test_cli.py hardcoded version string
- Updated CLAUDE.md with current line counts
v0.2.1 — repo subcommand
New: repo subcommand
Pack an entire git repository into a single markdown file via repomix.
# Repomix JSON straight to stdout (for agents)
any2md repo ./my-project --json
# Markdown with frontmatter
any2md repo ./my-project -o ~/docs
# Auto-detects directories with .git
any2md ./my-project
# Tree-sitter structure only (compressed)
any2md repo . --compress
Requires npm install -g repomix as a system dependency.
Stats
- 19 subcommands (18 converters + repo)
- 20 new tests for repo subcommand
v0.2.0 — CLI Standards Upgrade
New Features
- --json/-j output mode for all 16 converters: structured JSON to stdout for agent consumption
- --fields flag for JSON field selection (dot-notation: frontmatter.rows,content)
- --version/-V flag
- --quiet/-q flag to suppress log output
- NO_COLOR environment variable support
- Shell completions for all subcommands
- deps subcommand: shows installed/missing optional dependencies
- Structured JSON error output to stderr in --json mode
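Dot-notation field selection of the kind --fields offers can be sketched as a small dictionary filter. The Python below is an illustrative guess at the behavior, not the tool's implementation: the function name is hypothetical, and silently skipping paths that do not exist is an assumed policy.

```python
def select_fields(doc, fields):
    """Return a new dict holding only the requested dot-notation paths.

    doc:    the full JSON document as nested dicts.
    fields: comma-separated paths, e.g. "frontmatter.rows,content".
    """
    result = {}
    for path in (f.strip() for f in fields.split(",")):
        if not path:
            continue
        keys = path.split(".")
        value = doc
        try:
            for key in keys:
                value = value[key]
        except (KeyError, TypeError):
            continue  # assumed policy: ignore paths that do not resolve
        target = result
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return result


doc = {"frontmatter": {"rows": 3, "title": "Report"}, "content": "body", "extra": 1}
subset = select_fields(doc, "frontmatter.rows,content")
```

With the sample document above, subset keeps frontmatter.rows and content and drops everything else, which keeps agent-facing output small.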
Improvements
- All log output now goes to stderr; stdout reserved for data
- Standardized exit codes: 0=success, 1=user error, 2=usage error
- Improved error messages for missing optional deps (suggest uv pip install commands)
- Help text footer with bug report URL and homepage
- Added mammoth to doc optional deps (fixes DOCX conversion)
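The stdout/stderr split and the exit codes listed above follow a common CLI pattern, sketched here in Python. The exit-code values come from the release notes; the helper name and the shape of the JSON error object are assumptions, since the notes do not specify them.

```python
import json
import sys

# Exit codes as listed in the release notes
EXIT_OK = 0
EXIT_USER_ERROR = 1
EXIT_USAGE_ERROR = 2


def emit_error(message, code, json_mode, stream=None):
    """Write an error to stderr (never stdout) and return the exit code.

    In JSON mode the error is a single JSON object per line; the
    {"error", "code"} shape is a hypothetical example.
    """
    if stream is None:
        stream = sys.stderr
    if json_mode:
        stream.write(json.dumps({"error": message, "code": code}) + "\n")
    else:
        stream.write(f"error: {message}\n")
    return code
```

A caller would typically finish with sys.exit(emit_error(...)), so that a script piping stdout into another tool sees only data there and can branch on the exit status.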
New Files
- llms.txt — agent-readable capability index
- Makefile — standard test/lint/install targets
- WORKLOG.md — development history
Stats
- 9,500+ lines of source across 18 modules
- 766 tests
v0.1.0
Initial release. 16 converters, 740 tests, local MLX inference on Apple Silicon.