MLX-native stem separation for Apple Silicon Macs.
This project ports the inference paths from audio-separator (upstream repo: nomadkaraoke/python-audio-separator) to MLX so separation runs on Apple Silicon without requiring PyTorch or ONNX Runtime at inference time. Core runtime components are powered by mlx-audio-io (audio I/O) and mlx-spectro (spectral transforms).
- macOS 13+ (Ventura or later)
- Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
pip install mlx-audio-separatorIf you need first-run conversion from upstream checkpoints (.ckpt/.onnx/Demucs weights), install conversion extras:
pip install "mlx-audio-separator[convert]"# Separate with default model
mlx-audio-separator song.mp3
# Use a specific model
mlx-audio-separator song.mp3 -m htdemucs_ft.yaml
# List supported models
mlx-audio-separator --list_modelsfrom mlx_audio_separator import Separator
sep = Separator()
sep.load_model()
outputs = sep.separate("song.mp3")
print(outputs)- Roformer (BS-Roformer and MelBand-Roformer families)
- MDXC (including MDX23C-style checkpoints)
- MDX
- VR
- Demucs
Release validation snapshot (2026-02-24 to 2026-02-26):
| Check | Result |
|---|---|
| Full-catalog benchmark gate | 163/163 models ok (0 failures) |
| Unit tests | 167 passed, 1 skipped |
MLX vs audio-separator parity smoke |
4/4 models passed (rel L2 <= 5e-2) |
Scope: Apple Silicon (M4 mini), MUSDB18-HQ test subset, release gate + parity smoke model set.
Detailed evidence and provenance: docs/release-validation.md.
MLX vs audio-separator (ABBA, 12-song MUSDB18-HQ test subset, M4 mini):
| Model | MLX speedup vs PAS |
|---|---|
htdemucs_ft.yaml |
1.40x |
model_bs_roformer_ep_317_sdr_12.9755.ckpt |
2.16x |
mel_band_roformer_instrumental_instv7n_gabox.ckpt |
2.50x |
UVR-MDX-NET-Inst_HQ_3.onnx |
1.53x |
Median speedup across the 4-model overlap set: 1.847x.
These numbers are scoped to the benchmark settings above and are not universal guarantees for all machines, models, or audio inputs.
Release-facing stable controls:
--speed_mode {default,latency_safe,latency_safe_v2,latency_safe_v3}--cache_clear_policy {aggressive,deferred}--write_workers <int>
Example:
mlx-audio-separator song.mp3 \
--speed_mode latency_safe \
--cache_clear_policy deferred \
--write_workers 2Basic benchmark command:
mlx-audio-separator \
--benchmark song.mp3 \
--benchmark_warmup 1 \
--benchmark_repeats 3 \
--benchmark_profileFor BS-Roformer-SW.ckpt, use the opt-in no-drift FLAC profile:
mlx-audio-separator song.mp3 \
-m BS-Roformer-SW.ckpt \
--output_format FLAC \
--speed_mode latency_safe_v3latency_safe_v3 keeps model inference behavior conservative and focuses on
safe end-to-end latency wins (deferred cache clearing + async stem writes).
To avoid repeated checkpoint conversion overhead, pre-convert once to
*.safetensors and exit:
mlx-audio-separator \
-m BS-Roformer-SW.ckpt \
--save_converted_safetensors \
--preconvert_onlysafetensors primarily improves model load/startup time. It is not expected to
materially change per-file inference latency.
Validation command (latency + deterministic equivalence):
uv run --with torch python scripts/perf/compare_latency.py \
--corpus-file /tmp/corpus_one.txt \
--baseline-config scripts/perf/configs/bs_roformer_sw_default_baseline.json \
--candidate-config scripts/perf/configs/bs_roformer_sw_latency_safe_v3_candidate.json \
--model-file-dir /tmp/audio-separator-models \
--allow-speed-mode-mismatch \
--target-improvement-demucs-mdxc 10.0 \
--equivalence-check \
--equivalence-threshold-rel-l2 1e-6 \
--output-json /tmp/bs_roformer_sw_latency_safe_v3_compare.json \
--output-markdown /tmp/bs_roformer_sw_latency_safe_v3_compare.mdAs of March 4, 2026, latency_safe_v3 remains the only promoted runtime win
for BS-Roformer-SW.ckpt; all experimental tracks below are parked pending new
evidence.
Candidate configs for staged exploration live under scripts/perf/configs/:
bs_roformer_sw_cand_grouped_bandmask.jsonbs_roformer_sw_cand_fused_ola.jsonbs_roformer_sw_cand_stream_pipeline.jsonbs_roformer_sw_cand_compile_fullgraph.jsonbs_roformer_sw_cand_flac_fastwrite.json
Corpus manifest templates:
- Quick gate (3 files):
scripts/perf/corpora/bs_roformer_sw_quick.txt - Full gate (12 files):
scripts/perf/corpora/bs_roformer_sw_full.txt
Quick-gate example:
uv run --with torch python scripts/perf/compare_latency.py \
--corpus-file scripts/perf/corpora/bs_roformer_sw_quick.txt \
--baseline-config scripts/perf/configs/bs_roformer_sw_latency_safe_v3_baseline.json \
--candidate-config scripts/perf/configs/bs_roformer_sw_cand_grouped_bandmask.json \
--model-file-dir /tmp/audio-separator-models \
--target-improvement-demucs-mdxc 3.0 \
--equivalence-check \
--equivalence-threshold-rel-l2 1e-6 \
--equivalence-max-files 1 \
--output-json /tmp/bs_roformer_sw_quick_gate.json \
--output-markdown /tmp/bs_roformer_sw_quick_gate.mdFull-gate example:
uv run --with torch python scripts/perf/compare_latency.py \
--corpus-file scripts/perf/corpora/bs_roformer_sw_full.txt \
--baseline-config scripts/perf/configs/bs_roformer_sw_latency_safe_v3_baseline.json \
--candidate-config scripts/perf/configs/bs_roformer_sw_cand_grouped_bandmask.json \
--model-file-dir /tmp/audio-separator-models \
--target-improvement-demucs-mdxc 5.0 \
--equivalence-check \
--equivalence-threshold-rel-l2 1e-6 \
--equivalence-max-files 0 \
--output-json /tmp/bs_roformer_sw_full_gate.json \
--output-markdown /tmp/bs_roformer_sw_full_gate.md| Document | Description |
|---|---|
docs/release-validation.md |
Release evidence snapshot |
docs/release-first.md |
Release execution playbook |
docs/reproducibility.md |
Reproducibility guide |
docs/wave4-opt-in.md |
Wave 4 opt-in/experimental roadmap |
docs/bs-roformer-sw-optimization-program.md |
BS-Roformer-SW candidate gating and promotion table |
CHANGELOG.md |
Changelog |
THIRD_PARTY_NOTICES.md |
Third-party attribution and license notices |
This project is MIT licensed.
mlx-audio-separator is derived from audio-separator (upstream repo: nomadkaraoke/python-audio-separator, MIT) by beveradb and the nomadkaraoke community. Substantial portions of the architecture, model loading, and separation logic are adapted from that project. If you find this package useful, please also star and support the upstream project.
The models used by this project were trained by the Ultimate Vocal Remover community, primarily @Anjok07 and @aufr33. See THIRD_PARTY_NOTICES.md for full attribution and license details.
Additional references: