Canary MLX

NVIDIA Canary ASR model optimized for Apple Silicon using MLX.

Installation

pip install canary-mlx

Usage

Basic Transcription

from canary_mlx import load_model

model = load_model("qfuxa/canary-mlx")  # You can also pass a local model directory

result = model.transcribe("audio.wav", language="en")
print(result)

With Timestamps

result = model.transcribe("audio.wav", language="en", timestamps=True)

for sentence in result.sentences:
    print(f"[{sentence.start:.2f}s - {sentence.end:.2f}s] {sentence.text}")
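The sentence-level timestamps above map naturally onto subtitle formats. The following is a minimal sketch, not part of canary-mlx, that turns objects exposing `start`, `end` and `text` into SRT text. The `Sentence` dataclass is a hypothetical stand-in so the snippet runs without a model; with a real result you would pass `result.sentences` instead.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    """Hypothetical stand-in for the sentence objects shown above."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(sentences) -> str:
    """Build SRT text from objects exposing .start, .end and .text."""
    blocks = []
    for index, sentence in enumerate(sentences, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(sentence.start)} --> {srt_timestamp(sentence.end)}\n"
            f"{sentence.text.strip()}\n"
        )
    # Blocks are separated by one blank line, as SRT expects.
    return "\n".join(blocks)


demo = [Sentence(0.0, 2.5, "Hello there."), Sentence(2.5, 61.04, "This is a test.")]
print(to_srt(demo))
```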

Long Audio with Chunking

result = model.transcribe(
    "long_audio.wav",
    language="en",
    timestamps=True,
    chunk_duration=30.0,  # Process in 30-second chunks
    overlap_duration=15.0,  # 15-second overlap between chunks
)
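To get a feel for what `chunk_duration` and `overlap_duration` imply, the sketch below lays out overlapping windows over a file of a given length. It assumes a plain sliding window whose step is `chunk_duration - overlap_duration`; this README does not describe how canary-mlx actually splits or merges chunks, so treat `chunk_windows` as a hypothetical illustration only.

```python
def chunk_windows(total_duration, chunk_duration=30.0, overlap_duration=15.0):
    """Return (start, end) windows in seconds covering total_duration.

    Assumption: consecutive windows start (chunk_duration - overlap_duration)
    seconds apart, and the last window is clipped to the end of the audio.
    """
    if chunk_duration <= 0:
        raise ValueError("chunk_duration must be positive")
    if not 0 <= overlap_duration < chunk_duration:
        raise ValueError("overlap_duration must be in [0, chunk_duration)")

    step = chunk_duration - overlap_duration
    windows = []
    start = 0.0
    while start < total_duration:
        end = min(start + chunk_duration, total_duration)
        windows.append((start, end))
        if end >= total_duration:
            break  # the audio is fully covered
        start += step
    return windows


for start, end in chunk_windows(70.0):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

With a 15-second overlap on 30-second chunks, every part of the audio except the very beginning and end is covered by two windows, which roughly doubles the amount of audio processed compared with non-overlapping chunks.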

Translation

Translate audio from one language to another (speech-to-text translation):

result = model.translate(
    "french_audio.wav",
    source_language="fr",
    target_language="en"
)
print(result)

Supported Languages

Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Italian (it), Latvian (lv), Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Ukrainian (uk)
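If language codes come from user input, it can help to validate them before calling the model. The helper below is a hypothetical sketch built only from the list above; `SUPPORTED_LANGUAGES` and `normalize_language` are not part of canary-mlx.

```python
# Codes and names copied from the Supported Languages list above.
SUPPORTED_LANGUAGES = {
    "bg": "Bulgarian", "hr": "Croatian", "cs": "Czech", "da": "Danish",
    "nl": "Dutch", "en": "English", "et": "Estonian", "fi": "Finnish",
    "fr": "French", "de": "German", "el": "Greek", "hu": "Hungarian",
    "it": "Italian", "lv": "Latvian", "lt": "Lithuanian", "mt": "Maltese",
    "pl": "Polish", "pt": "Portuguese", "ro": "Romanian", "sk": "Slovak",
    "sl": "Slovenian", "es": "Spanish", "sv": "Swedish", "ru": "Russian",
    "uk": "Ukrainian",
}


def normalize_language(code: str) -> str:
    """Return a lowercase supported code, or raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized


print(normalize_language(" EN "))
```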

Benchmarks

STT FLEURS WER (lower is better)

Model	bg	cs	da	de	el	en	es	et	fi	fr	hr	hu	it	lt	lv	mt	nl	pl	pt	ro	ru	sk	sl	sv	uk
whisper-large-v3	12.86	11.33	12.57	4.30	27.03	4.25	3.12	19.12	7.70	6.31	11.07	14.11	2.31	22.34	18.29	68.89	5.57	4.74	3.65	8.24	4.17	8.40	19.93	7.88	6.51
Canary-1B-v2	9.25	7.86	11.25	4.40	9.21	4.50	2.90	12.55	8.59	5.02	8.29	12.90	3.07	12.36	9.66	18.31	6.12	6.64	4.39	6.61	6.90	5.74	13.32	9.57	10.50

For more detailed metrics, especially for translation, see the appendix of the Canary-1B-v2 technical report.
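The table is easier to read with a per-language comparison. The snippet below copies the WER values from the table above and lists the languages where each model has the lower score, plus an unweighted mean. The mean is only a rough summary: a single outlier such as Whisper's Maltese score (68.89) moves it considerably.

```python
LANGS = "bg cs da de el en es et fi fr hr hu it lt lv mt nl pl pt ro ru sk sl sv uk".split()

# WER values copied from the FLEURS table above, in the same column order.
WHISPER = [12.86, 11.33, 12.57, 4.30, 27.03, 4.25, 3.12, 19.12, 7.70, 6.31,
           11.07, 14.11, 2.31, 22.34, 18.29, 68.89, 5.57, 4.74, 3.65, 8.24,
           4.17, 8.40, 19.93, 7.88, 6.51]
CANARY = [9.25, 7.86, 11.25, 4.40, 9.21, 4.50, 2.90, 12.55, 8.59, 5.02,
          8.29, 12.90, 3.07, 12.36, 9.66, 18.31, 6.12, 6.64, 4.39, 6.61,
          6.90, 5.74, 13.32, 9.57, 10.50]

whisper = dict(zip(LANGS, WHISPER))
canary = dict(zip(LANGS, CANARY))

# Lower WER is better.
canary_better = [lang for lang in LANGS if canary[lang] < whisper[lang]]
whisper_better = [lang for lang in LANGS if whisper[lang] < canary[lang]]

mean_whisper = sum(WHISPER) / len(WHISPER)
mean_canary = sum(CANARY) / len(CANARY)

print("Canary-1B-v2 lower WER:", ", ".join(canary_better))
print("whisper-large-v3 lower WER:", ", ".join(whisper_better))
print(f"Unweighted mean WER: whisper={mean_whisper:.2f}, canary={mean_canary:.2f}")
```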

Performance (Speed)

Inference Speed on Apple M4

The following benchmarks were measured on an Apple M4 processing a 10-minute audio file.

Canary-1B-v2 (MLX)

Chunk Duration	Time Taken	RAM Consumption	Notes
30s	110.2s	~4.7GB (start) to 5.25GB (end)	Recommended duration
60s	99.0s	-	-
120s	90.5s	-	Fails at the end (hallucinations/loops)

Important

NVIDIA recommends using a chunk duration of less than 40 seconds for Canary models to avoid transcription failures/hallucinations at the end of chunks.
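One way to keep this guidance visible in your own scripts is a small check before calling `transcribe`. This is a hypothetical helper, not something canary-mlx provides; the 40-second threshold is taken from the recommendation above.

```python
from typing import Optional

# Threshold from the NVIDIA recommendation quoted above (less than 40 seconds).
RECOMMENDED_MAX_CHUNK_SECONDS = 40.0


def chunk_duration_warning(chunk_duration: float) -> Optional[str]:
    """Return a warning message if the chunk is too long, else None."""
    if chunk_duration < RECOMMENDED_MAX_CHUNK_SECONDS:
        return None
    return (
        f"chunk_duration={chunk_duration:g}s is not below the recommended "
        f"{RECOMMENDED_MAX_CHUNK_SECONDS:g}s; hallucinations or loops may "
        "appear at the end of chunks."
    )


for candidate in (30.0, 60.0, 120.0):
    print(candidate, "->", chunk_duration_warning(candidate) or "ok")
```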

Comparison: Whisper Large v3 (MLX)

Model	Time Taken	RAM Consumption
Whisper Large v3	77.5s	6.3GB (start) to 10.2GB (end)
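The timings above are easier to compare as a speed relative to real time, that is, seconds of audio processed per second of wall-clock time. The sketch below uses only the numbers from the two tables and the 10-minute (600-second) file length stated earlier.

```python
AUDIO_SECONDS = 10 * 60  # the 10-minute benchmark file

# Wall-clock times in seconds, copied from the tables above.
time_taken = {
    "Canary-1B-v2 (30s chunks)": 110.2,
    "Canary-1B-v2 (60s chunks)": 99.0,
    "Canary-1B-v2 (120s chunks)": 90.5,
    "Whisper Large v3": 77.5,
}

# Seconds of audio transcribed per second of processing.
speed_factor = {name: AUDIO_SECONDS / seconds for name, seconds in time_taken.items()}

for name, factor in speed_factor.items():
    print(f"{name:28s} {factor:5.2f}x real time")
```

Speed is only part of the picture here: in these measurements Whisper Large v3 finishes sooner but ends at roughly twice the RAM of Canary-1B-v2 with 30-second chunks.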

API Reference

`load_model(path_or_hf_id, dtype=mx.bfloat16)`

Load a Canary model from a local directory or HuggingFace Hub.

Parameters:

path_or_hf_id: Local path or HuggingFace model ID (e.g., "qfuxa/canary-mlx")
dtype: Data type for model weights (default: mx.bfloat16)

Returns: Canary model instance

`model.transcribe(...)`

Transcribe an audio file. The output text is in the same language as the audio.

Parameters:

path: Path to audio file
language: Language code (e.g., "en")
timestamps: Include word-level timestamps (default: False)
punctuation: Include punctuation (default: True)
chunk_duration: Chunk length in seconds; when set, the audio is processed in chunks of this duration (optional)
overlap_duration: Overlap between chunks in seconds (default: 15.0)

Returns: TranscriptionResult if timestamps=True, else str
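Because the return type depends on `timestamps`, calling code that accepts both forms may want to normalize the result. The sketch below is a hypothetical helper that relies only on what this README shows: a plain `str`, or an object with a `sentences` list whose items have a `text` attribute. `FakeSentence` and `FakeResult` are stand-ins so the example runs without loading a model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeSentence:
    """Stand-in for a timestamped sentence (start, end, text)."""
    start: float
    end: float
    text: str


@dataclass
class FakeResult:
    """Stand-in for the object returned when timestamps=True."""
    sentences: List[FakeSentence]


def as_text(result) -> str:
    """Return plain text for either return type of transcribe/translate."""
    if isinstance(result, str):
        return result
    # Assumption: the timestamped result exposes .sentences, as in the
    # "With Timestamps" example.
    return " ".join(sentence.text.strip() for sentence in result.sentences)


print(as_text("Plain transcript."))
print(as_text(FakeResult([FakeSentence(0.0, 1.0, "First."), FakeSentence(1.0, 2.0, " Second.")])))
```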

`model.translate(...)`

Translate audio from one language to another.

Parameters:

path: Path to audio file
source_language: Language of the audio (e.g., "fr", "de")
target_language: Target language for translation (default: "en")
timestamps: Include word-level timestamps (default: False)
punctuation: Include punctuation (default: True)
chunk_duration: Chunk length in seconds; when set, the audio is processed in chunks of this duration (optional)
overlap_duration: Overlap between chunks in seconds (default: 15.0)

Returns: TranscriptionResult if timestamps=True, else str

Acknowledgements

NVIDIA for the impressive model
MLX project and community
Senstella for Parakeet MLX, which was a great help for the FastConformer MLX implementation
