mlx-audio-io

mlx-audio-io is the audio data layer for MLX: fast file decode/encode directly to and from mlx.core.array, with one API across macOS and Linux.

Why This Project Exists

MLX has strong tensor and model primitives, but it does not ship a first-class, cross-platform audio file I/O layer comparable to what torchaudio provides in the PyTorch ecosystem.

In practice, MLX users often end up with one of these compromises:

bridge through NumPy/SoundFile/librosa with extra copies and inconsistent format behavior
shell out to ffmpeg/ffprobe for non-WAV workflows
pull in parts of the PyTorch audio stack just to handle common audio containers/codecs

mlx-audio-io closes that gap with a native backend designed for MLX workloads:

direct decode/encode into mlx.core.array
one Python API (load, save, info, stream, batch_load, supports_soxr) on both macOS and Linux
consistent validation and error messages across platforms
support for training/inference data access patterns (partial reads, chunked streaming, optional resampling)

Platform Backends

macOS backend optimized for Apple Silicon via AudioToolbox
Linux backend with native WAV/MP3 fast paths plus libav-backed codec support (FLAC/M4A/AIFF/CAF)

The public Python API is the same on both platforms: load, save, info, stream, batch_load, supports_soxr.

Backend Feature Matrix

Capability	macOS backend	Linux backend
`info(path)`	AudioToolbox-supported formats (WAV, MP3, M4A/AAC, FLAC, AIFF, CAF, etc.)	WAV, MP3, FLAC, M4A/AAC, AIFF, CAF
`load(path)`	AudioToolbox-supported formats + native-rate MP3 fast path	WAV, MP3, FLAC, M4A/AAC, AIFF, CAF
`load(..., sr=...)`	Supported, with AudioToolbox resampling	Supported (`WAV/MP3` native linear path, other supported formats via libav decode/resample)
`save(path, ...)`	WAV, MP3, M4A/AAC, FLAC, AIFF, CAF	WAV, MP3, M4A/AAC, FLAC, AIFF, CAF
`encoding`	`float32`, `pcm16`, `alac` (for `.m4a`)	`float32`, `pcm16`, `alac` (for `.m4a`)
`stream(path, ...)`	AudioToolbox-supported formats + native-rate MP3 path	WAV, MP3, FLAC, M4A/AAC, AIFF, CAF
`stream(..., sr=...)`	Supported	Supported (`WAV/MP3` native linear path, other supported formats via libav-backed chunked decode path)

Unsupported format/encoding combinations fail with explicit ValueError messages.
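
Because unsupported combinations surface as ValueError, a caller can probe a combination and report the message instead of crashing. The sketch below is illustrative only: try_save is a hypothetical helper (not part of mlx-audio-io), and the save function is passed in as a parameter so the helper makes no assumptions beyond the documented save(path, audio, sr, ...) call shape.

```python
from typing import Any, Callable, Optional


def try_save(save_fn: Callable[..., Any], path: str, audio: Any, sr: int,
             **kwargs: Any) -> Optional[str]:
    """Call save_fn(path, audio, sr, **kwargs).

    Returns None on success, or the ValueError message if the
    format/encoding combination was rejected.
    """
    try:
        save_fn(path, audio, sr, **kwargs)
    except ValueError as exc:
        return str(exc)
    return None


# Intended usage (requires mlx-audio-io to be installed):
#   from mlx_audio_io import save
#   err = try_save(save, "out.m4a", x, sr, encoding="alac")
#   if err is not None:
#       print("cannot write this combination:", err)
```

Only ValueError is caught here, so I/O failures and other exceptions still propagate to the caller.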

Installation

End users (PyPI)

For normal use:

pip install mlx-audio-io

Version policy

mlx-audio-io ships one wheel line per exact MLX runtime version. The native extension is built and tested against a single MLX release, and the loader rejects mismatched MLX versions at import time to avoid hard crashes.

For the current release line:

pip install "mlx-audio-io==1.3.10"

This release pins:

macOS: mlx==0.31.0
Linux: mlx[cpu]==0.31.0

If you maintain a downstream MLX library, pin mlx and mlx-audio-io together. Do not publish broad mlx>=... ranges while depending on mlx-audio-io, because the native loader requires an exact MLX match anyway.
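
As a rough sketch of what pinning both packages together can look like in a downstream project, the hypothetical helper below builds the matching requirement strings for a platform. The function name and the use of sys.platform-style strings ("darwin", "linux") are assumptions made for illustration; the package names and the mlx[cpu] extra on Linux come from the pins listed above.

```python
from typing import List


def pinned_requirements(audio_io_version: str, mlx_version: str,
                        platform: str) -> List[str]:
    """Return exact pins for mlx-audio-io and the matching MLX runtime.

    platform is expected to look like sys.platform ("darwin" or "linux").
    Linux uses the mlx[cpu] extra; macOS uses plain mlx.
    """
    mlx_name = "mlx[cpu]" if platform.startswith("linux") else "mlx"
    return [
        f"mlx-audio-io=={audio_io_version}",
        f"{mlx_name}=={mlx_version}",
    ]


# Example: write the pins for the current interpreter's platform.
#   import sys
#   print("\n".join(pinned_requirements("1.3.10", "0.31.0", sys.platform)))
```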

Contributors (source checkout)

For local development and tests:

git clone https://github.com/ssmall256/mlx-audio-io.git
cd mlx-audio-io
uv sync --extra dev

Hard Rule: Do Not Copy `.venv` Between Machines

Do not copy project virtual environments across machines. Native extensions can fail integrity/code-sign checks or crash when moved between hosts.

If you already copied one, recreate it:

rm -rf .venv && uv venv --python 3.11 && uv sync

Linux source build behavior

Linux source builds require libav and use direct libav-backed paths:

Linux info() for non-WAV formats uses direct libav metadata.
Linux load() for non-WAV formats uses direct libav decode for all offset/duration combinations.
Linux stream() for non-WAV formats uses direct libav packet/frame decode.
Linux save() for encoded formats (.mp3, .flac, .m4a, .aiff/.aif, .caf) uses direct libav encode/mux.

Wheel portability and `libsoxr`

soxr_hq / soxr_vhq are optional and enabled only when libsoxr is available at build time.
macOS wheel builds are automatically repaired with delocate by this project's build backend, so external libsoxr dylibs are bundled into the wheel instead of relying on Homebrew paths.
Linux wheel repair via auditwheel is supported and can be enabled with MLX_AUDIO_IO_REPAIR_LINUX=1 in release CI.
To disable wheel repair explicitly (not recommended for release builds), set MLX_AUDIO_IO_REPAIR_WHEEL=0.
Release CI verifies wheel linkage with tools/check_wheel_linkage.py and fails if absolute host library paths are detected.
Third-party notices for bundled libsoxr are shipped in-package at:
- mlx_audio_io/THIRD_PARTY_NOTICES.md
- mlx_audio_io/licenses/libsoxr/
Current release publish workflows build and publish sdist artifacts; a separate macOS wheel job verifies wheel linkage and notice packaging as a release gate.

Requirements

Python 3.10+
Runtime:
- macOS: Apple Silicon + mlx
- Linux: mlx[cpu] (current default)
Source builds:
- CMake 3.24+, C++17 toolchain, pkg-config
- Linux default build: libavformat-dev, libavcodec-dev, libavutil-dev, libswresample-dev

Linux Troubleshooting

ModuleNotFoundError: mlx_audio_io
- Install in the project environment (uv sync) and run via uv run ....
ImportError for mlx on Linux
- Ensure Linux dependency is installed as mlx[cpu].
Build failures on source installs
- Verify build-essential, cmake, ninja-build, and pkg-config are installed.
Extended Linux format support errors (.mp3, .m4a, .flac, .aiff, .caf)
- For default Linux builds, ensure runtime libav libraries are present (libavformat, libavcodec, libavutil, libswresample).
MP3 test fixture generation failures
- Tests that generate MP3 fixtures require ffmpeg or lame available on PATH.
Native import failures or unexpected crashes
- Run diagnostics: python -m mlx_audio_io.doctor
- Check MLX runtime compatibility: python -c "import mlx_audio_io as aio; print(aio.show_build_info())"
- If build_mlx_version and runtime_mlx_version differ, reinstall with matching deps: pip install -U "mlx==<build_mlx_version>" "mlx-audio-io"
- Avoid pip install --no-deps for mlx-audio-io unless you manually pin a matching mlx version.
- Recreate env (do not copy .venv between machines): rm -rf .venv && uv venv --python 3.11 && uv sync

Quickstart

from mlx_audio_io import load, save, info, stream, batch_load, supports_soxr

# Load
x, sr = load("speech.wav")

# Resample + mono (auto-selects soxr_vhq when available, falls back to "best")
x16, sr16 = load("speech.wav", sr=16000, mono=True)

# Metadata without decoding
meta = info("speech.wav")

# Stream in chunks
for chunk, chunk_sr in stream("long.wav", chunk_duration=2.0):
    pass

# Save WAV
save("out.wav", x, sr)
save("out_pcm16.wav", x, sr, encoding="pcm16")

# Batch load
items = batch_load(["a.wav", "b.wav"], sr=16000, mono=True)

Additional save examples:

save("out.flac", x, sr)
save("out.mp3", x, sr, bitrate="192k")
save("out.m4a", x, sr, bitrate="256k")
save("out.m4a", x, sr, encoding="alac")

API Reference

`load`

load(path, sr=None, offset=0.0, duration=None, mono=False, mono_mode="mean",
     layout="channels_last", dtype="float32", resample_quality="default")

Decode audio into an mlx.core.array. Returns (audio, sample_rate).

Parameter	Default	Description
`path`	—	Path to audio file
`sr`	`None`	Target sample rate; `None` keeps native rate
`offset`	`0.0`	Start position in seconds
`duration`	`None`	Duration in seconds; `None` reads to end
`mono`	`False`	Mix down to mono
`mono_mode`	`"mean"`	Mono fold policy: `"mean"` or `"equal_power"`
`layout`	`"channels_last"`	`"channels_last"` `[frames, ch]` or `"channels_first"` `[ch, frames]`
`dtype`	`"float32"`	`"float32"` or `"float16"`
`resample_quality`	`"default"`	`"default"`, `"fastest"`, `"low"`, `"medium"`, `"high"`, `"best"`, `"soxr_hq"`, `"soxr_vhq"`, `"torchaudio_compat"`

On Linux WAV/MP3 fast paths, resample quality levels currently map to the same linear behavior. soxr_hq/soxr_vhq use true libsoxr resampling (when built with libsoxr). If soxr_hq/soxr_vhq is requested without libsoxr support, load()/resample() raise RuntimeError. torchaudio_compat requires torch + torchaudio and uses torchaudio.functional.resample.

When sr is specified and resample_quality is left at "default", load() automatically selects soxr_vhq when libsoxr is available, falling back to "best" otherwise. You can still override explicitly:

audio, sr = load("speech.wav", sr=16000, resample_quality="soxr_hq")
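
The offset and duration parameters also make windowed (partial) reads simple to script. The following is a minimal sketch, not library code: windows is a hypothetical helper that turns a total duration into (offset, duration) pairs suitable for passing to load(). It assumes the total duration comes from the duration field reported by info().

```python
import math
from typing import Iterator, Tuple


def windows(total_seconds: float,
            window_seconds: float) -> Iterator[Tuple[float, float]]:
    """Yield (offset, duration) pairs that cover total_seconds.

    Every window is window_seconds long except possibly the last,
    which is shortened to end exactly at total_seconds.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    count = math.ceil(total_seconds / window_seconds)
    for i in range(count):
        # Multiply instead of accumulating to avoid float drift.
        offset = i * window_seconds
        duration = min(window_seconds, total_seconds - offset)
        if duration > 0:
            yield offset, duration


# Intended usage (requires mlx-audio-io; attribute access on the
# AudioInfo result is assumed here):
#   from mlx_audio_io import info, load
#   meta = info("speech.wav")
#   for offset, duration in windows(meta.duration, 5.0):
#       audio, sr = load("speech.wav", offset=offset, duration=duration)
```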

`batch_load`

batch_load(paths, sr=None, mono=False, mono_mode="mean", dtype="float32", num_workers=4)

Threaded multi-file load(). Returns list[(audio, sample_rate)].
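
batch_load does not take offset or duration, so per-file windows need a different route. One possible approach, sketched below under stated assumptions, is to run load() from your own thread pool. load_windows is a hypothetical helper, the load function is injected as a parameter, and whether load() is safe to call concurrently from user threads is an assumption that should be verified; batch_load remains the documented threaded path.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Tuple

Request = Tuple[str, float, float]  # (path, offset_seconds, duration_seconds)


def load_windows(load_fn: Callable[..., Any], requests: Iterable[Request],
                 num_workers: int = 4) -> List[Any]:
    """Run load_fn(path, offset=..., duration=...) for each request.

    Results are returned in the same order as the requests.
    """
    def _one(request: Request) -> Any:
        path, offset, duration = request
        return load_fn(path, offset=offset, duration=duration)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map preserves input order in its results.
        return list(pool.map(_one, requests))


# Intended usage (requires mlx-audio-io):
#   from mlx_audio_io import load
#   clips = load_windows(load, [("a.wav", 0.0, 1.0), ("b.wav", 2.0, 0.5)])
```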

`save`

save(path, audio, sr, layout="channels_last", encoding="float32",
     bitrate="auto", clip=True)

Write audio from an mlx.core.array (or numpy.ndarray) to disk.

Parameter	Default	Description
`path`	—	Output file path (format inferred from extension)
`audio`	—	Audio data; 1-D input is treated as mono
`sr`	—	Sample rate
`layout`	`"channels_last"`	Layout of the input array
`encoding`	`"float32"`	`"float32"`, `"pcm16"`, or `"alac"` (for `.m4a`)
`bitrate`	`"auto"`	Bitrate for lossy formats (`.m4a` AAC, `.mp3` on Linux)
`clip`	`True`	Clamp samples to `[-1, 1]` before encoding

`stream`

stream(path, chunk_frames=None, chunk_duration=None, sr=None,
       mono=False, mono_mode="mean", dtype="float32", offset=0.0, duration=None)

Return an iterator yielding (audio_chunk, sample_rate). Exactly one of chunk_frames or chunk_duration is required.

Parameter	Default	Description
`path`	—	Path to audio file
`chunk_frames`	`None`	Chunk size in frames
`chunk_duration`	`None`	Chunk size in seconds
`sr`	`None`	Target sample rate; `None` keeps native rate
`mono`	`False`	Mix down to mono
`mono_mode`	`"mean"`	Mono fold policy: `"mean"` or `"equal_power"`
`dtype`	`"float32"`	`"float32"` or `"float16"`
`offset`	`0.0`	Start position in seconds for windowed stream
`duration`	`None`	Duration in seconds for windowed stream; `None` streams to end
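
A typical consumer walks the iterator once and accumulates a running statistic. The sketch below totals the streamed duration; streamed_seconds is a hypothetical helper, and it assumes each chunk exposes its frame count as shape[0] (that is, chunks follow the channels_last layout that load() uses by default, which is not stated for stream()). The frames_of parameter lets you swap in a different rule if that assumption does not hold.

```python
from typing import Any, Callable, Iterable, Tuple


def streamed_seconds(chunks: Iterable[Tuple[Any, int]],
                     frames_of: Callable[[Any], int] = lambda c: c.shape[0]
                     ) -> float:
    """Sum the duration, in seconds, of (audio_chunk, sample_rate) pairs."""
    total = 0.0
    for chunk, sample_rate in chunks:
        total += frames_of(chunk) / sample_rate
    return total


# Intended usage (requires mlx-audio-io):
#   from mlx_audio_io import stream
#   seconds = streamed_seconds(stream("long.wav", chunk_duration=2.0))
```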

`info`

info(path)

Return AudioInfo metadata without decoding sample buffers.

Field	Description
`frames`	Total number of sample frames
`sample_rate`	Sample rate in Hz
`channels`	Number of channels
`duration`	Duration in seconds
`subtype`	Sample encoding (e.g. `pcm16`, `float32`)
`container`	File format (e.g. `wav`, `mp3`, `m4a`)

`supports_soxr`

supports_soxr()

Return True when the installed native extension was built with libsoxr support. Use this to select resample_quality at runtime.
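
A minimal sketch of that runtime selection follows. pick_resample_quality is a hypothetical helper, not part of the package; its defaults mirror the documented automatic behavior (soxr_vhq when libsoxr is available, "best" otherwise), and it takes the capability flag as an argument so the logic can be exercised without the native extension.

```python
def pick_resample_quality(has_soxr: bool, prefer: str = "soxr_vhq",
                          fallback: str = "best") -> str:
    """Return prefer when libsoxr support is present, else fallback."""
    return prefer if has_soxr else fallback


# Intended usage (requires mlx-audio-io):
#   from mlx_audio_io import load, supports_soxr
#   quality = pick_resample_quality(supports_soxr(), prefer="soxr_hq")
#   audio, sr = load("speech.wav", sr=16000, resample_quality=quality)
```

Choosing the quality explicitly this way avoids the RuntimeError raised when a soxr mode is requested on a build without libsoxr.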

Testing

Run all tests:

uv sync --extra dev
uv run python -m pytest -q

Run the Linux-supported subset:

uv run python -m pytest -q -m "not apple_only"

Run Apple-only subset:

uv run python -m pytest -q -m "apple_only"

Linux Docker run from a macOS host:

docker run --rm -it --platform linux/arm64 \
  -v "$PWD":/work -w /work \
  python:3.14-bookworm bash -lc '
    apt-get update && apt-get install -y --no-install-recommends \
      build-essential cmake ninja-build pkg-config ffmpeg \
      libavformat-dev libavcodec-dev libavutil-dev libswresample-dev &&
    python -m pip install -U pip uv &&
    uv sync --extra dev &&
    uv run python -m pytest -q -m "not apple_only"
  '

Performance

Benchmark methodology, commands, and full result tables live in docs/benchmarking.md.

Headline numbers for a 194.8 s stereo PCM16 WAV at 44.1 kHz (p50, i.e. median, latency):

Task	macOS M4 Max	Linux arm64
Full WAV load	3.59 ms — 6.9x faster than librosa	8.41 ms — 5.9x faster than librosa
WAV partial read (1 s)	0.04 ms — 3.4x faster than librosa	0.05 ms — 2.6x faster than librosa
WAV save (float32)	6.98 ms — 2.8x faster than soundfile	31.70 ms — 1.8x faster than soundfile
MP3 load (native SR)	63.70 ms — 1.3x faster than librosa	80.93 ms — on par with librosa
M4A/AAC load	56.31 ms — 2.2x faster than librosa	89.63 ms — 1.6x faster than librosa
Load + resample 16 kHz	13.12 ms — 4.4x faster than librosa	10.93 ms — 7.9x faster than librosa

Full tables with torchaudio comparisons, M1 Max, and Linux x86_64 results are in the benchmarking doc.

License

MIT
