Releases: roboalchemist/any2md

v0.3.3

29 Mar 21:49

Changes

  • Speaker attendance filter (--speakers): Limit which enrolled speakers are matched during identification
    • any2md podcast.mp3 --diarize --identify --speakers "Alice,Bob" (constrains the search space to Alice and Bob)
    • Unknown or guest speakers are reliably detected when they are not in the expected set
  • Speaker groups: Reusable named speaker sets
    • any2md speaker group create "Hosts" --members "Alice,Bob"
    • --speakers "@Hosts" expands group to member list
    • Full CRUD: create, list, show, delete, add-member, remove-member
  • JSON attendance field: --json output includes attendance.expected, attendance.identified, attendance.unknown
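The @group expansion and the attendance summary can be sketched in Python as below. This is a minimal illustration, not any2md's code: expand_speakers, attendance and the GROUPS dict are hypothetical, and representing attendance.unknown as a list of unmatched diarization labels is an assumption about the JSON shape.

```python
from __future__ import annotations

# Hypothetical in-memory stand-in for the speaker-group store.
GROUPS: dict[str, list[str]] = {"Hosts": ["Alice", "Bob"]}


def expand_speakers(spec: str, groups: dict[str, list[str]]) -> list[str]:
    """Expand a --speakers value like "@Hosts,Carol" into unique names, in order."""
    names: list[str] = []
    for token in (t.strip() for t in spec.split(",")):
        if not token:
            continue
        # "@Name" expands to the group's members; anything else is a speaker name.
        members = groups[token[1:]] if token.startswith("@") else [token]
        for name in members:
            if name not in names:
                names.append(name)
    return names


def attendance(expected: list[str], matches: dict[str, str | None]) -> dict:
    """Summarise identification results.

    `matches` maps diarization labels (e.g. "SPEAKER_1") to the enrolled name
    they matched, or None when no speaker in the expected set was close enough.
    """
    matched = set(matches.values())
    return {
        "expected": expected,
        "identified": [name for name in expected if name in matched],
        "unknown": sorted(label for label, name in matches.items() if name is None),
    }
```

Restricting matching to the expanded list is what lets an unenrolled guest fall through to "unknown" instead of being forced onto the closest of many catalog entries.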

v0.3.2

29 Mar 19:53

Changes

  • Fix speaker deps: replace wespeaker with wespeakerruntime (PyPI-installable, ONNX-based)
    • Removed torch dependency for speaker identification
    • Updated speaker.py to use wespeakerruntime.Speaker(lang='en') API
    • Full catalog pipeline verified: enroll, match, unknown rejection

v0.3.1

29 Mar 07:27

CLI standards compliance: shell completions, docs command, llms.txt, --silent flag.

v0.3.0 — Speaker Identification

29 Mar 06:48

Speaker Identification via WeSpeaker ResNet293

New feature: named speaker identification across recordings using persistent voice profiles.

New: any2md speaker subcommand

any2md speaker add "Joe" --audio joe-sample.wav
any2md speaker list
any2md speaker remove "Joe"
any2md speaker merge "Speaker A" "Speaker B"
any2md speaker stats "Joe"
any2md speaker gallery "Joe"

New: --identify flag

any2md meeting.m4a --diarize --identify
# Output: **Joe** [00:24] instead of SPEAKER_0

How it works

  • WeSpeaker ResNet293 extracts 256-d speaker embeddings (PyTorch MPS on Apple Silicon)
  • Gallery model: stores multiple embeddings per speaker to handle voice variation across microphones and recording conditions
  • sqlite-vec for fast k-nearest-neighbor (KNN) search
  • Adaptive thresholds: high-confidence auto-match (≥0.85); medium-confidence match reported with its score (0.70–0.85)
  • Auto-enrolls new embeddings for matched speakers (gallery grows over time)
  • Prompts for unknown speakers (or --auto-enroll / --no-enroll)
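A minimal sketch of the two-tier threshold decision, assuming cosine similarity between embeddings and scoring each speaker by its best-matching gallery embedding. The real catalog does the nearest-neighbor lookup with sqlite-vec; the brute-force loop and the names cosine and identify here are hypothetical stand-ins.

```python
from __future__ import annotations

import math

AUTO_MATCH = 0.85  # high-confidence threshold from the release notes
MIN_MATCH = 0.70   # lower bound of the medium-confidence band


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def identify(embedding: list[float], galleries: dict[str, list[list[float]]]):
    """Return (name, score, tier) where tier is "auto", "medium" or "unknown"."""
    best_name, best_score = None, -1.0
    for name, gallery in galleries.items():
        if not gallery:
            continue
        # Gallery model: a speaker scores as well as its closest stored embedding.
        score = max(cosine(embedding, g) for g in gallery)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= AUTO_MATCH:
        return best_name, best_score, "auto"
    if best_score >= MIN_MATCH:
        return best_name, best_score, "medium"
    return None, best_score, "unknown"  # caller would prompt or auto-enroll
```

Taking the maximum over a speaker's gallery, rather than comparing against one averaged voiceprint, is what lets a profile match the same person recorded on different microphones.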

Speaker catalog

Stored persistently at ~/.config/any2md/speakers.db:

  • Gallery maintenance: rolling window of 20 embeddings per speaker
  • Per-speaker distance statistics for drift detection
  • Merge support for duplicate profiles
  • Audit trail for profile merges
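The rolling window, drift statistics and merge audit trail map naturally onto a bounded deque plus a running list of match distances. This is only a sketch: the SpeakerProfile class, the merge direction, and the three-sigma drift rule are hypothetical, since the notes do not say how drift is scored.

```python
from __future__ import annotations

import statistics
from collections import deque

WINDOW = 20  # rolling window size stated in the release notes


class SpeakerProfile:
    """Hypothetical in-memory model of one speaker catalog entry."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.gallery: deque[list[float]] = deque(maxlen=WINDOW)  # oldest entry is evicted automatically
        self.distances: list[float] = []  # match distances kept for drift statistics

    def enroll(self, embedding: list[float], distance: float | None = None) -> None:
        self.gallery.append(embedding)
        if distance is not None:
            self.distances.append(distance)

    def drift_stats(self) -> tuple[float, float] | None:
        """Mean and standard deviation of past match distances (needs 2+ samples)."""
        if len(self.distances) < 2:
            return None
        return statistics.mean(self.distances), statistics.stdev(self.distances)

    def is_drifting(self, distance: float, k: float = 3.0) -> bool:
        """Assumed rule: flag a match more than k standard deviations above the mean."""
        stats = self.drift_stats()
        if stats is None:
            return False
        mean, stdev = stats
        return distance > mean + k * stdev


def merge(target: SpeakerProfile, duplicate: SpeakerProfile, audit_log: list) -> None:
    """Fold a duplicate profile into target and record the merge."""
    for embedding in duplicate.gallery:
        target.gallery.append(embedding)  # still capped at WINDOW entries
    target.distances.extend(duplicate.distances)
    audit_log.append(("merge", duplicate.name, target.name))
```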

Install

brew upgrade any2md
# Install speaker identification deps
uv pip install "any2md[speaker]"

Stats

  • 201 new tests (119 speaker + 42 CLI + 40 yt)
  • 7 tickets implemented (ANY2-11 through ANY2-17)

v0.2.3

27 Mar 03:36

Remove redundant mp3 intermediate conversion

  • Local files (m4a/mp4/wav/etc): Skip mp3 step, convert directly to 16kHz WAV (~16s faster on long files)
  • YouTube: Skip the mp3 postprocessor in yt-dlp and keep the native format (opus/webm) for direct WAV conversion; this also avoids lossy mp3 re-encoding
  • Output filenames: Remove _whisper suffix that was leaking from intermediate files into output markdown filenames
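A single ffmpeg invocation is enough to go from the source container to the WAV that transcription consumes. The sketch below is illustrative: the mono downmix and pcm_s16le codec are assumptions (the notes only say 16kHz WAV), and wav_command/to_wav are hypothetical helper names; the ffmpeg flags themselves are standard.

```python
from __future__ import annotations

import subprocess


def wav_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg argv that decodes src straight to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",       # overwrite dst if it already exists
        "-i", src,            # m4a, mp4, wav, or yt-dlp's native opus/webm
        "-vn",                # drop any video stream (e.g. from mp4)
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono (assumption)
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]


def to_wav(src: str, dst: str) -> None:
    # One decode/resample pass; no intermediate mp3 file is written.
    subprocess.run(wav_command(src, dst), check=True)
```

Decoding the original stream once avoids both the extra encode pass and the generation loss of going through mp3.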

v0.2.2

26 Mar 18:21

Fix diarization OOM on long audio files

Breaking changes: none (drop-in fix).

What changed

  • diarize() now uses model.generate_stream() (chunked streaming) instead of model.generate() (full-file, O(n²) memory)
  • Added _merge_diarization_segments() for cross-chunk speaker segment merging
  • 9 new unit tests + 2 integration tests using an 82-minute public-domain podcast
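A common way to stitch per-chunk diarization output is to sort segments by start time and fuse consecutive segments from the same speaker when the gap between them is small. The sketch below assumes speaker labels are already consistent across chunks and uses a hypothetical 0.5 s gap tolerance; merge_segments is an illustration, not the actual _merge_diarization_segments().

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: str


def merge_segments(segments: list[Segment], max_gap: float = 0.5) -> list[Segment]:
    """Fuse consecutive same-speaker segments separated by at most max_gap seconds."""
    merged: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = merged[-1] if merged else None
        if last and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            last.end = max(last.end, seg.end)  # extend across the chunk boundary
        else:
            # Copy so the caller's segments are never mutated.
            merged.append(Segment(seg.start, seg.end, seg.speaker))
    return merged
```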

Before/After

  • Before: 48-min file → [metal::malloc] Attempting to allocate 83404284800 bytes (~83 GB, OOM)
  • After: 48-min file → 12.6s diarization (289 chunks, 712 segments, 4 speakers)

Other fixes

  • Fixed test_cli.py hardcoded version string
  • Updated CLAUDE.md with current line counts

v0.2.1 — repo subcommand

22 Mar 21:21

New: repo subcommand

Pack an entire git repository into a single markdown file via repomix.

# Repomix JSON straight to stdout (for agents)
any2md repo ./my-project --json

# Markdown with frontmatter
any2md repo ./my-project -o ~/docs

# Auto-detects directories with .git
any2md ./my-project

# Tree-sitter structure only (compressed)
any2md repo . --compress

Requires repomix as a system dependency (install with npm install -g repomix).
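The .git auto-detection and the repomix dependency check can be sketched with the standard library alone. choose_subcommand, require_repomix and the "convert" fallback label are hypothetical; the sketch only shows the routing idea, not how any2md dispatches internally.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def looks_like_repo(path: str) -> bool:
    """A directory counts as a git repository if it contains a .git entry."""
    p = Path(path)
    return p.is_dir() and (p / ".git").exists()


def choose_subcommand(path: str) -> str:
    # "convert" is a hypothetical label for the normal per-file converters.
    return "repo" if looks_like_repo(path) else "convert"


def require_repomix() -> str:
    """Return the repomix executable path, or exit with an install hint."""
    exe = shutil.which("repomix")
    if exe is None:
        raise SystemExit("repomix not found; install it with: npm install -g repomix")
    return exe
```

Checking for .git with exists() rather than is_dir() also covers worktrees and submodules, where .git is a file.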

Stats

  • 19 subcommands (18 converters + repo)
  • 20 new tests for repo subcommand

v0.2.0 — CLI Standards Upgrade

22 Mar 14:15

CLI Standards Upgrade

New Features

  • --json / -j output mode for all 16 converters — structured JSON to stdout for agent consumption
  • --fields flag for JSON field selection (comma-separated dot-notation paths, e.g. frontmatter.rows,content)
  • --version / -V flag
  • --quiet / -q flag to suppress log output
  • NO_COLOR environment variable support
  • Shell completions for all subcommands
  • deps subcommand — shows installed/missing optional dependencies
  • Structured JSON error output to stderr in --json mode
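Dot-notation field selection amounts to walking each path into the result and rebuilding only the requested branches. This is a minimal sketch; select_fields is hypothetical, and both the nested output shape and raising KeyError on a missing path are assumptions about how --fields behaves.

```python
from __future__ import annotations

from typing import Any


def get_path(data: dict, dotted: str) -> Any:
    """Follow a dot-notation path such as "frontmatter.rows"."""
    node: Any = data
    for key in dotted.split("."):
        node = node[key]  # KeyError if the path does not exist
    return node


def select_fields(data: dict, fields: str) -> dict:
    """Keep only the comma-separated paths, preserving their nesting."""
    out: dict = {}
    for path in (f.strip() for f in fields.split(",")):
        if not path:
            continue
        keys = path.split(".")
        node = out
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = get_path(data, path)
    return out
```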

Improvements

  • All log output now goes to stderr; stdout reserved for data
  • Standardized exit codes: 0=success, 1=user error, 2=usage error
  • Improved error messages for missing optional deps (they now suggest the uv pip install command to run)
  • Help text footer with bug report URL and homepage
  • Added mammoth to doc optional deps (fixes DOCX conversion)
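The stdout/stderr split, NO_COLOR handling and exit codes fit in a few helpers. This is a sketch under assumptions: the function names and the {"error", "code"} JSON error shape are hypothetical, while the rule that any non-empty NO_COLOR disables color comes from the NO_COLOR convention.

```python
from __future__ import annotations

import json
import os
import sys

EXIT_OK, EXIT_USER_ERROR, EXIT_USAGE_ERROR = 0, 1, 2  # codes from the release notes


def use_color(stream=None) -> bool:
    stream = stream or sys.stderr
    if os.environ.get("NO_COLOR"):  # any non-empty value disables color
        return False
    return stream.isatty()


def log(msg: str) -> None:
    print(msg, file=sys.stderr)  # logs never touch stdout


def fail(message: str, code: int = EXIT_USER_ERROR, as_json: bool = False) -> int:
    if as_json:
        # Hypothetical error schema; machine-readable, still on stderr.
        print(json.dumps({"error": message, "code": code}), file=sys.stderr)
    else:
        log(f"error: {message}")
    return code


def emit(payload: dict) -> int:
    print(json.dumps(payload))  # stdout carries data only
    return EXIT_OK
```

A CLI entry point would then end with sys.exit(main()), where main returns one of these codes, so agents can pipe stdout straight into a JSON parser.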

New Files

  • llms.txt — agent-readable capability index
  • Makefile — standard test/lint/install targets
  • WORKLOG.md — development history

Stats

  • 9,500+ lines of source across 18 modules
  • 766 tests

v0.1.0

22 Mar 00:45

Initial release. 16 converters, 740 tests, local MLX inference on Apple Silicon.