Conversation

@asukaminato0721 asukaminato0721 commented Jan 28, 2026

RUST_BACKTRACE=1 cargo run -p candle-examples -r --features cuda --example parakeet -- \
  --input ../jfk.wav \
  --model-id mlx-community/parakeet-tdt-0.6b-v3 --debug

works.

The input audio file comes from https://github.com/ggml-org/whisper.cpp/blob/master/samples/jfk.wav

The only existing implementation I could find is https://github.com/senstella/parakeet-mlx, so I ported that.

Fixes #3247

Copilot AI review requested due to automatic review settings January 28, 2026 13:19

Copilot AI left a comment


Pull request overview

This pull request adds support for the Parakeet ASR (Automatic Speech Recognition) model family to the candle-transformers library. The implementation is ported from the MLX framework and supports three model variants: TDT (Token-and-Duration Transducer), RNNT (RNN-Transducer), and CTC (Connectionist Temporal Classification). The PR includes a complete implementation with audio preprocessing, Conformer encoder architecture, LSTM-based decoder, and various decoding strategies including greedy and beam search.

Changes:

  • Adds comprehensive Parakeet model implementation with support for TDT, RNNT, CTC, and hybrid TDT-CTC variants
  • Implements complete audio preprocessing pipeline with FFT, mel-filterbank, and normalization
  • Includes streaming inference support with rotating cache mechanisms for efficient long-form transcription
  • Adds command-line example with support for chunked audio processing and beam search decoding
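To illustrate the simplest of these decoding strategies, CTC greedy decoding takes the argmax token per frame, collapses consecutive repeats, and drops the blank symbol. A minimal sketch, assuming a blank id of 0 (the token values here are illustrative, not the PR's actual constants):

```rust
// Hypothetical sketch of CTC greedy decoding: take the argmax token id
// per frame, collapse consecutive repeats, and drop the blank symbol.
// The blank id and token values are illustrative, not the PR's constants.
const BLANK: usize = 0;

fn ctc_greedy_collapse(frame_argmax: &[usize]) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev = None;
    for &tok in frame_argmax {
        // Emit only when the token changes and is not blank.
        if Some(tok) != prev && tok != BLANK {
            out.push(tok);
        }
        prev = Some(tok);
    }
    out
}
```

For example, the frame sequence `[0, 3, 3, 0, 3, 2, 2, 0]` collapses to `[3, 3, 2]`: the blank between the two 3s keeps them as separate emissions.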

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Summary per file:

  • candle-transformers/src/models/parakeet/mod.rs: Module definition exposing the public API for Parakeet models
  • candle-transformers/src/models/parakeet/model.rs: Core model implementations for all Parakeet variants with decoding strategies
  • candle-transformers/src/models/parakeet/rnnt.rs: LSTM implementation and prediction/joint network components
  • candle-transformers/src/models/parakeet/conformer.rs: Conformer encoder architecture with self-attention and convolution blocks
  • candle-transformers/src/models/parakeet/attention.rs: Multi-head attention with relative positional encoding
  • candle-transformers/src/models/parakeet/cache.rs: Caching mechanisms for streaming inference
  • candle-transformers/src/models/parakeet/audio.rs: Audio preprocessing including FFT and mel-spectrogram computation
  • candle-transformers/src/models/parakeet/alignment.rs: Token alignment and sentence segmentation logic
  • candle-transformers/src/models/parakeet/ctc.rs: CTC decoder implementation
  • candle-transformers/src/models/parakeet/tokenizer.rs: Simple tokenizer decode function
  • candle-transformers/src/models/mod.rs: Updated to include the parakeet module in the models list
  • candle-examples/examples/parakeet/main.rs: CLI example for running Parakeet inference
  • candle-examples/examples/parakeet/README.md: Usage documentation for the example
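For context on the mel-spectrogram step in audio.rs: mel filterbanks are typically built from a Hz-to-mel frequency mapping. A sketch of the common HTK-style formula follows; whether the PR uses this variant or the Slaney one is an assumption here.

```rust
// Sketch of the HTK-style Hz <-> mel conversion commonly used to build
// mel filterbanks. Which mel-scale variant the PR's audio.rs actually
// uses (HTK vs. Slaney) is an assumption, not confirmed from the diff.
fn hz_to_mel(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

fn mel_to_hz(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}
```

Filterbank edges are usually placed uniformly in mel space and mapped back to Hz with `mel_to_hz`, which gives the characteristic finer resolution at low frequencies.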


layers: Vec<LstmLayer>,
hidden_size: usize,
num_layers: usize,
batch_first: bool,

Copilot AI Jan 28, 2026


The batch_first field is stored but never used in the implementation. The forward method always assumes batch-first format (batch, time, features) regardless of this field's value. Consider removing this field or implementing the non-batch-first case if needed.

kernel_size: usize,
padding: usize,
sampling_num: usize,
subsampling_conv_chunking_factor: isize,

Copilot AI Jan 28, 2026


The subsampling_conv_chunking_factor field is stored but never used in the forward method implementation. Consider removing this field or implementing the intended chunking functionality if needed.

.into_values()
.filter(|h| h.step < length)
.collect();
active_list.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());

Copilot AI Jan 28, 2026


The unwrap() calls on partial_cmp results can panic if scores contain NaN values. Consider using unwrap_or(Ordering::Equal) or adding explicit NaN checks before sorting to prevent potential panics during beam search decoding.

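The suggested fix can be sketched with `f64::total_cmp`, which imposes a total order on floats (NaN included) and therefore cannot panic; `Hyp` here is a stand-in for the PR's actual hypothesis type:

```rust
// Sketch of a NaN-safe descending sort for beam-search hypotheses.
// f64::total_cmp is a total order (IEEE 754 totalOrder), so unlike
// partial_cmp().unwrap() it never panics on NaN scores.
// `Hyp` is a stand-in for the PR's hypothesis type.
struct Hyp {
    score: f64,
}

fn sort_desc(hyps: &mut [Hyp]) {
    hyps.sort_by(|a, b| b.score.total_cmp(&a.score));
}
```

Note that under `total_cmp` a NaN score sorts above every finite score in descending order, so an explicit NaN filter before sorting may still be desirable.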
Comment on lines +529 to +538
let best = all
.into_iter()
.max_by(|a, b| {
let score_a =
a.score / (a.hypothesis.len().max(1) as f64).powf(beam.length_penalty);
let score_b =
b.score / (b.hypothesis.len().max(1) as f64).powf(beam.length_penalty);
score_a.partial_cmp(&score_b).unwrap()
})
.unwrap();

Copilot AI Jan 28, 2026


The unwrap() calls on partial_cmp results can panic if scores contain NaN values. Consider using unwrap_or(Ordering::Equal) or adding explicit NaN checks to prevent potential panics when selecting the best hypothesis.

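A NaN-safe version of this selection can be sketched with `f64::total_cmp` in place of `partial_cmp().unwrap()`; `Cand` and its fields are illustrative names mirroring the snippet above, not the PR's types:

```rust
// Sketch of NaN-safe best-hypothesis selection with a length penalty,
// mirroring the reviewed snippet. f64::total_cmp cannot panic on NaN.
// `Cand` and its fields are illustrative, not the PR's actual types.
struct Cand {
    score: f64,
    len: usize,
}

fn best(all: Vec<Cand>, length_penalty: f64) -> Option<Cand> {
    all.into_iter().max_by(|a, b| {
        let score_a = a.score / (a.len.max(1) as f64).powf(length_penalty);
        let score_b = b.score / (b.len.max(1) as f64).powf(length_penalty);
        score_a.total_cmp(&score_b)
    })
}
```

Returning `Option` also removes the outer `.unwrap()`, which panics when the candidate list is empty.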
Comment on lines +421 to +432
token_idx.sort_by(|&a, &b| {
token_logprobs[b].partial_cmp(&token_logprobs[a]).unwrap()
});
token_idx.truncate(beam_token);

let mut dur_idx: Vec<usize> = (0..duration_logprobs.len()).collect();
dur_idx.sort_by(|&a, &b| {
duration_logprobs[b]
.partial_cmp(&duration_logprobs[a])
.unwrap()
});
dur_idx.truncate(beam_duration);

Copilot AI Jan 28, 2026


The unwrap() calls on partial_cmp results can panic if log probabilities contain NaN values. Consider using unwrap_or(Ordering::Equal) to handle potential NaN cases gracefully during beam search sorting.

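The same pattern applies to the top-k index sort above; a hedged sketch using `f64::total_cmp` (function and parameter names are illustrative):

```rust
// Sketch of NaN-safe top-k index selection, as the review suggests:
// sort indices by log-probability descending via f64::total_cmp, then
// keep the first k. Names are illustrative, not the PR's identifiers.
fn topk_indices(logprobs: &[f64], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..logprobs.len()).collect();
    idx.sort_by(|&a, &b| logprobs[b].total_cmp(&logprobs[a]));
    idx.truncate(k);
    idx
}
```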
pub values: Option<Tensor>,
pub conv: Option<Tensor>,
pub offset: usize,
step: usize,

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The step field is declared and initialized but never used in the implementation. Consider removing it or implementing the intended functionality if it was meant to serve a purpose.

@asukaminato0721 asukaminato0721 changed the title from "add parakeet model" to "add parakeet v3 model" on Jan 28, 2026
@danielclough
Contributor

Is this well tested?
I also have work on Parakeet that I can publish so we can compare notes.
I couldn't get everything perfect across all models, so I didn't submit a PR.

@asukaminato0721 asukaminato0721 marked this pull request as draft January 28, 2026 19:59
@danielclough
Contributor

Looking at my notes, I see that the differences I was getting between PyTorch and candle are things like:

# rnnt1b
pytorch: Good morning everyone!
candle: Good morning, everyone. 

pytorch: The art of communication
candle: the art of communication

Have you done extensive testing to confirm you do not have such issues across all models on a variety of audio files?
Getting an 80-90% WER was pretty easy, but hammering out the final details has been very tricky.

The other reason I wasn't sure whether @ivarflakstad would want my code in candle is that I have features for loading .nemo files directly instead of requiring safetensors from community repos.


Development

Successfully merging this pull request may close issue "Parakeet V3 support?" (#3247).