Releases · roboalchemist/any2md

29 Mar 21:49

roboalchemist

v0.3.3

c82ef72

v0.3.3 Latest


Changes

Speaker attendance filter (--speakers): Limit which enrolled speakers are matched during identification
- `any2md podcast.mp3 --diarize --identify --speakers "Alice,Bob"` constrains the search space to the listed speakers
- Unknown/guest speakers are reliably detected when they are not in the expected set
Speaker groups: Reusable named speaker sets
- any2md speaker group create "Hosts" --members "Alice,Bob"
- --speakers "@Hosts" expands group to member list
- Full CRUD: create, list, show, delete, add-member, remove-member
JSON attendance field: --json output includes attendance.expected, attendance.identified, attendance.unknown
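The notes give the flag syntax and the JSON field names but not the internals. Below is a minimal sketch of how `@Group` expansion and the attendance summary could fit together. `expand_speakers`, `attendance`, and the in-memory `groups` dict are hypothetical names. What each attendance list holds (names for `identified`, diarization labels for `unknown`) is also an assumption.

```python
from __future__ import annotations


def expand_speakers(spec: str, groups: dict[str, list[str]]) -> list[str]:
    """Expand a --speakers value such as "@Hosts,Carol" into an ordered, de-duplicated list."""
    names: list[str] = []
    for token in (t.strip() for t in spec.split(",")):
        if not token:
            continue
        # "@Name" refers to a speaker group (KeyError if the group does not exist);
        # anything else is a single enrolled speaker
        members = groups[token[1:]] if token.startswith("@") else [token]
        for name in members:
            if name not in names:
                names.append(name)
    return names


def attendance(expected: list[str], labels: dict[str, str | None]) -> dict[str, list[str]]:
    """Summarize a run: labels maps diarization label -> matched name, or None if unknown."""
    identified = sorted({name for name in labels.values() if name is not None})
    unknown = sorted(label for label, name in labels.items() if name is None)
    return {"expected": expected, "identified": identified, "unknown": unknown}
```

For example, `expand_speakers("@Hosts,Carol", {"Hosts": ["Alice", "Bob"]})` returns `["Alice", "Bob", "Carol"]`.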


29 Mar 19:53

roboalchemist

v0.3.2

56844eb

v0.3.2

Changes

Fix speaker deps: wespeaker → wespeakerruntime (PyPI installable, ONNX-based)
- Removed torch dependency for speaker identification
- Updated speaker.py to use wespeakerruntime.Speaker(lang='en') API
- Full catalog pipeline verified: enroll, match, unknown rejection


29 Mar 07:27

roboalchemist

v0.3.1

31073fe

v0.3.1

CLI standards compliance: shell completions, docs command, llms.txt, --silent flag.


29 Mar 06:48

roboalchemist

v0.3.0

43b7e98

v0.3.0 — Speaker Identification

Speaker Identification via WeSpeaker ResNet293

New feature: named speaker identification across recordings using persistent voice profiles.

New: `any2md speaker` subcommand

any2md speaker add "Joe" --audio joe-sample.wav
any2md speaker list
any2md speaker remove "Joe"
any2md speaker merge "Speaker A" "Speaker B"
any2md speaker stats "Joe"
any2md speaker gallery "Joe"

New: `--identify` flag

any2md meeting.m4a --diarize --identify
# Output: **Joe** [00:24] instead of SPEAKER_0

How it works

- WeSpeaker ResNet293 extracts 256-d speaker embeddings (PyTorch MPS on Apple Silicon)
- Gallery model: stores multiple embeddings per speaker to handle voice variation across mics and recording conditions
- sqlite-vec for fast k-nearest-neighbor (KNN) search
- Adaptive thresholds: high-confidence auto-match (≥0.85), medium-confidence match reported with its score (0.70–0.85)
- Auto-enrolls new embeddings for matched speakers, so the gallery grows over time
- Prompts for unknown speakers (or use --auto-enroll / --no-enroll)
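The matching rules above can be sketched as a best-of-gallery search with two cut-offs. This is a hedged illustration, not the shipped code. It assumes the score is cosine similarity and scores each speaker by its closest stored embedding. A brute-force scan stands in for the sqlite-vec KNN query. `identify`, `cosine`, and the optional `candidates` filter are hypothetical; `candidates` mimics the later `--speakers` option.

```python
from __future__ import annotations

import math

AUTO_MATCH = 0.85  # high-confidence threshold quoted in the release notes
SUGGEST = 0.70     # lower edge of the medium-confidence band


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    query: list[float],
    gallery: dict[str, list[list[float]]],
    candidates: set[str] | None = None,
) -> tuple[str | None, float, str]:
    """Return (name, score, confidence) for the best-scoring enrolled speaker."""
    best_name, best_score = None, -1.0
    for name, embeddings in gallery.items():
        if candidates is not None and name not in candidates:
            continue  # speaker excluded from the expected set
        # a speaker's score is its closest gallery embedding
        score = max((cosine(query, e) for e in embeddings), default=-1.0)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= AUTO_MATCH:
        return best_name, best_score, "high"    # auto-match
    if best_score >= SUGGEST:
        return best_name, best_score, "medium"  # reported with its score
    return None, best_score, "unknown"          # would trigger the enroll prompt
```

Keeping several embeddings per speaker and taking the maximum is what lets one profile cover different microphones. A single averaged vector would blur those conditions together.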

Speaker catalog

Persistent at ~/.config/any2md/speakers.db:

- Gallery maintenance: rolling window of 20 embeddings per speaker
- Per-speaker distance statistics for drift detection
- Merge support for duplicate profiles
- Audit trail for profile merges
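A rolling window of 20 embeddings maps naturally onto a bounded deque. The class below is a hypothetical in-memory stand-in for one profile in speakers.db, not the actual schema. The drift rule flags a match whose distance is more than `k` standard deviations above the speaker's mean. It is an assumed example of using per-speaker statistics, not documented behavior.

```python
from __future__ import annotations

from collections import deque
from statistics import mean, pstdev

GALLERY_SIZE = 20  # rolling window size stated in the release notes


class SpeakerProfile:
    """Hypothetical in-memory stand-in for one speaker's catalog entry."""

    def __init__(self, name: str) -> None:
        self.name = name
        # a deque with maxlen silently drops the oldest embedding once 20 are stored
        self.gallery: deque[list[float]] = deque(maxlen=GALLERY_SIZE)
        self.distances: list[float] = []  # match distances seen for this speaker

    def enroll(self, embedding: list[float], distance: float | None = None) -> None:
        """Add an embedding; record its match distance if it came from a match."""
        self.gallery.append(embedding)
        if distance is not None:
            self.distances.append(distance)

    def is_drifting(self, distance: float, k: float = 2.0) -> bool:
        """Assumed rule: flag distances k population std devs above this speaker's mean."""
        if len(self.distances) < 2:
            return False  # not enough history to judge
        return distance > mean(self.distances) + k * pstdev(self.distances)
```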

Install

brew upgrade any2md
# Install speaker identification deps
uv pip install "any2md[speaker]"

Stats

- 201 new tests (119 speaker + 42 CLI + 40 yt)
- 7 tickets implemented (ANY2-11 through ANY2-17)


27 Mar 03:36

roboalchemist

v0.2.3

0de1367

v0.2.3

Remove redundant mp3 intermediate conversion

- Local files (m4a/mp4/wav/etc.): skip the mp3 step and convert directly to 16 kHz WAV (~16 s faster on long files)
- YouTube: skip the mp3 postprocessor in yt-dlp and keep the native format (opus/webm) for direct WAV conversion; this also avoids lossy mp3 re-encoding
- Output filenames: remove the _whisper suffix that was leaking from intermediate files into output markdown filenames
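A single-step conversion might look like the sketch below. The notes only say "16kHz WAV". The mono downmix and 16-bit PCM codec are assumptions, chosen because they are common for Whisper-style input. `wav_command` and `markdown_name` are hypothetical helpers. The ffmpeg options `-ar`, `-ac`, and `-c:a pcm_s16le` are standard.

```python
from __future__ import annotations

from pathlib import Path


def wav_command(src: str, dst: str) -> list[str]:
    """ffmpeg argv for a direct conversion to 16 kHz WAV, with no intermediate mp3."""
    return [
        "ffmpeg", "-y",       # overwrite dst if it already exists
        "-i", src,            # m4a/mp4/opus/webm/... decoded directly
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono (assumption)
        "-c:a", "pcm_s16le",  # 16-bit PCM WAV (assumption)
        dst,
    ]


def markdown_name(src: str) -> str:
    """Name the output after the source file, not an intermediate like foo_whisper.wav."""
    return Path(src).stem + ".md"
```

It would be run with `subprocess.run(wav_command("meeting.m4a", "meeting.wav"), check=True)`. Deriving the markdown name from the original path, rather than from whatever temporary file was transcribed, is what keeps suffixes like `_whisper` out of the output.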


26 Mar 18:21

roboalchemist

v0.2.2

d2d5cf2

v0.2.2

Fix diarization OOM on long audio files

Breaking changes: none (drop-in fix).

What changed

- diarize() now uses model.generate_stream() (chunked streaming) instead of model.generate() (full-file, O(n²) memory)
- Added _merge_diarization_segments() for cross-chunk speaker segment merging
- 9 new unit tests plus 2 integration tests using an 82-minute public-domain podcast
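The notes name `_merge_diarization_segments()` but do not describe it. The sketch below shows one common way to stitch chunked output back together: sort by start time, then join consecutive segments from the same speaker when the gap between them is small. It is not the actual implementation. It assumes speaker labels are already consistent across chunks, and `max_gap` is a hypothetical tolerance.

```python
from __future__ import annotations

from typing import NamedTuple


class Segment(NamedTuple):
    start: float   # seconds
    end: float     # seconds
    speaker: str


def merge_segments(segments: list[Segment], max_gap: float = 0.5) -> list[Segment]:
    """Join consecutive same-speaker segments that were split at chunk boundaries."""
    merged: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if prev and prev.speaker == seg.speaker and seg.start - prev.end <= max_gap:
            # extend the previous segment instead of starting a new one
            merged[-1] = prev._replace(end=max(prev.end, seg.end))
        else:
            merged.append(seg)
    return merged
```

Because each chunk is processed independently, memory stays bounded by the chunk size. The merge pass is a single linear scan after sorting.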

Before/After

- Before: 48-min file → [metal::malloc] Attempting to allocate 83404284800 bytes (OOM)
- After: 48-min file → 12.6s diarization (289 chunks, 712 segments, 4 speakers)

Other fixes

- Fixed test_cli.py hardcoded version string
- Updated CLAUDE.md with current line counts


22 Mar 21:21

roboalchemist

v0.2.1

4975e12

v0.2.1 — repo subcommand

New: `repo` subcommand

Pack an entire git repository into a single markdown file via repomix.

# Repomix JSON straight to stdout (for agents)
any2md repo ./my-project --json

# Markdown with frontmatter
any2md repo ./my-project -o ~/docs

# Auto-detects directories with .git
any2md ./my-project

# Tree-sitter structure only (compressed)
any2md repo . --compress

Requires repomix as a system dependency (install with npm install -g repomix).
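The auto-detection of git directories can be approximated with a `.git` check. `pick_subcommand` and its `"convert"` fallback are hypothetical; the real dispatcher is not described here. In worktrees and submodules `.git` is a file rather than a directory, so the sketch uses `exists()` rather than `is_dir()`.

```python
from __future__ import annotations

from pathlib import Path


def pick_subcommand(target: str) -> str:
    """Hypothetical routing: a directory containing .git is handed to `repo`."""
    path = Path(target)
    if path.is_dir() and (path / ".git").exists():
        return "repo"
    return "convert"  # placeholder for the per-file converter dispatch
```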

Stats

- 19 subcommands (18 converters + repo)
- 20 new tests for repo subcommand


22 Mar 14:15

roboalchemist

v0.2.0

098fb26

v0.2.0 — CLI Standards Upgrade

CLI Standards Upgrade

New Features

- --json / -j output mode for all 16 converters: structured JSON to stdout for agent consumption
- --fields flag for JSON field selection (comma-separated dot-notation paths, e.g. frontmatter.rows,content)
- --version / -V flag
- --quiet / -q flag to suppress log output
- NO_COLOR environment variable support
- Shell completions for all subcommands
- deps subcommand: shows installed and missing optional dependencies
- Structured JSON error output to stderr in --json mode
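Dot-notation selection over nested JSON can be sketched as follows. The helper name and the choice to skip missing paths silently are assumptions. The notes only show the `frontmatter.rows,content` syntax.

```python
from __future__ import annotations

from typing import Any


def select_fields(doc: dict[str, Any], fields: str) -> dict[str, Any]:
    """Keep only the comma-separated dot paths in `fields`, preserving nesting."""
    out: dict[str, Any] = {}
    for path in (p.strip() for p in fields.split(",") if p.strip()):
        keys = path.split(".")
        node: Any = doc
        try:
            for key in keys:
                node = node[key]  # walk down one level per dotted component
        except (KeyError, TypeError):
            continue  # assumption: missing paths are skipped, not reported
        target = out
        for key in keys[:-1]:
            target = target.setdefault(key, {})  # rebuild the parent objects
        target[keys[-1]] = node
    return out
```

With `doc = {"frontmatter": {"rows": 3, "title": "x"}, "content": "# Hi"}`, `select_fields(doc, "frontmatter.rows,content")` returns `{"frontmatter": {"rows": 3}, "content": "# Hi"}`.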

Improvements

- All log output now goes to stderr; stdout is reserved for data
- Standardized exit codes: 0=success, 1=user error, 2=usage error
- Improved error messages for missing optional deps (they now suggest the matching uv pip install command)
- Help text footer with bug report URL and homepage
- Added mammoth to doc optional deps (fixes DOCX conversion)
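The stdout/stderr split and the exit codes form a small contract for callers. Below is a minimal sketch. Only the codes 0/1/2, the use of stderr, and the `doc` extra come from the notes. The JSON error shape `{"error", "exit_code"}`, the hint wording, and the helper names are assumptions.

```python
from __future__ import annotations

import json
import sys
from typing import TextIO

EXIT_OK, EXIT_USER_ERROR, EXIT_USAGE_ERROR = 0, 1, 2  # values stated in the release notes


def missing_dep_message(module: str, extra: str) -> str:
    """Hint for a missing optional dependency (wording is hypothetical)."""
    return f"{module} is not installed; try: uv pip install 'any2md[{extra}]'"


def report_error(message: str, code: int, as_json: bool, stream: TextIO | None = None) -> int:
    """Write an error to stderr (never stdout) and return the exit code to use."""
    out = stream if stream is not None else sys.stderr  # resolved at call time
    if as_json:
        out.write(json.dumps({"error": message, "exit_code": code}) + "\n")
    else:
        out.write(f"error: {message}\n")
    return code
```

A command would end with `sys.exit(report_error(...))`, for example after a DOCX conversion fails because mammoth is missing. An agent reading stdout then sees either data or nothing, never log noise.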

New Files

- llms.txt: agent-readable capability index
- Makefile: standard test/lint/install targets
- WORKLOG.md: development history

Stats

- 9,500+ lines of source across 18 modules
- 766 tests


22 Mar 00:45

roboalchemist

v0.1.0

e119fce

v0.1.0

Initial release. 16 converters, 740 tests, local MLX inference on Apple Silicon.
