add parakeet v3 model #3341
Conversation
Pull request overview
This pull request adds support for the Parakeet ASR (Automatic Speech Recognition) model family to the candle-transformers library. The implementation is ported from the MLX framework and supports three model variants: TDT (Token-and-Duration Transducer), RNNT (RNN-Transducer), and CTC (Connectionist Temporal Classification). The PR includes a complete implementation with audio preprocessing, Conformer encoder architecture, LSTM-based decoder, and various decoding strategies including greedy and beam search.
Changes:
- Adds comprehensive Parakeet model implementation with support for TDT, RNNT, CTC, and hybrid TDT-CTC variants
- Implements a complete audio preprocessing pipeline with FFT, mel filterbank, and normalization (a short illustrative mel-scale sketch follows this list)
- Includes streaming inference support with rotating cache mechanisms for efficient long-form transcription
- Adds command-line example with support for chunked audio processing and beam search decoding
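As a rough illustration of the mel-filterbank step mentioned above, the HTK-style mel-scale conversions that such a filterbank is commonly built from look like this. The helper names are hypothetical and are not taken from the PR's `audio.rs`.

```rust
/// Hypothetical illustration only: HTK-style mel-scale conversions commonly
/// used to place mel-filterbank centers. Not the PR's actual preprocessing.
fn hz_to_mel(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

fn mel_to_hz(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}

fn main() {
    // Round-trip check: 8 kHz -> ~2840 mel -> back to 8 kHz.
    let m = hz_to_mel(8000.0);
    println!("{m:.1} mel, back to {:.1} Hz", mel_to_hz(m));
}
```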
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| candle-transformers/src/models/parakeet/mod.rs | Module definition exposing public API for Parakeet models |
| candle-transformers/src/models/parakeet/model.rs | Core model implementations for all Parakeet variants with decoding strategies |
| candle-transformers/src/models/parakeet/rnnt.rs | LSTM implementation and prediction/joint network components |
| candle-transformers/src/models/parakeet/conformer.rs | Conformer encoder architecture with self-attention and convolution blocks |
| candle-transformers/src/models/parakeet/attention.rs | Multi-head attention with relative positional encoding |
| candle-transformers/src/models/parakeet/cache.rs | Caching mechanisms for streaming inference |
| candle-transformers/src/models/parakeet/audio.rs | Audio preprocessing including FFT and mel-spectrogram computation |
| candle-transformers/src/models/parakeet/alignment.rs | Token alignment and sentence segmentation logic |
| candle-transformers/src/models/parakeet/ctc.rs | CTC decoder implementation |
| candle-transformers/src/models/parakeet/tokenizer.rs | Simple tokenizer decode function |
| candle-transformers/src/models/mod.rs | Updated to include parakeet module in models list |
| candle-examples/examples/parakeet/main.rs | CLI example for running Parakeet inference |
| candle-examples/examples/parakeet/README.md | Usage documentation for the example |
```rust
layers: Vec<LstmLayer>,
hidden_size: usize,
num_layers: usize,
batch_first: bool,
```
Copilot AI (Jan 28, 2026):
The `batch_first` field is stored but never used in the implementation. The `forward` method always assumes batch-first format (batch, time, features) regardless of this field's value. Consider removing this field or implementing the non-batch-first case if needed.
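If the non-batch-first case were implemented rather than removed, one minimal option is to normalize the layout once on entry to `forward`. This is only a sketch assuming candle's `Tensor::transpose`; the helper name is ours, not the PR's.

```rust
use candle_core::{Result, Tensor};

/// Hypothetical helper: bring the input to (batch, time, features) so the
/// downstream LSTM layers can keep their batch-first assumption.
fn to_batch_first(xs: &Tensor, batch_first: bool) -> Result<Tensor> {
    if batch_first {
        Ok(xs.clone())
    } else {
        // Input arrived as (time, batch, features); swap the first two dims.
        xs.transpose(0, 1)
    }
}
```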
```rust
kernel_size: usize,
padding: usize,
sampling_num: usize,
subsampling_conv_chunking_factor: isize,
```
Copilot AI (Jan 28, 2026):
The `subsampling_conv_chunking_factor` field is stored but never used in the `forward` method implementation. Consider removing this field or implementing the intended chunking functionality if needed.
```rust
    .into_values()
    .filter(|h| h.step < length)
    .collect();
active_list.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
```
Copilot AI (Jan 28, 2026):
The `unwrap()` calls on `partial_cmp` results can panic if scores contain NaN values. Consider using `unwrap_or(Ordering::Equal)` or adding explicit NaN checks before sorting to prevent potential panics during beam search decoding.
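A minimal sketch of that suggestion; `Hyp` here is a stand-in for the PR's hypothesis type, and only the comparator matters.

```rust
use std::cmp::Ordering;

struct Hyp {
    score: f64,
}

/// Sort hypotheses by descending score without panicking: comparisons that
/// involve NaN fall back to `Ordering::Equal` instead of unwrapping.
fn sort_by_score_desc(hyps: &mut [Hyp]) {
    hyps.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(Ordering::Equal));
}
```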
```rust
let best = all
    .into_iter()
    .max_by(|a, b| {
        let score_a =
            a.score / (a.hypothesis.len().max(1) as f64).powf(beam.length_penalty);
        let score_b =
            b.score / (b.hypothesis.len().max(1) as f64).powf(beam.length_penalty);
        score_a.partial_cmp(&score_b).unwrap()
    })
    .unwrap();
```
Copilot AI (Jan 28, 2026):
The `unwrap()` calls on `partial_cmp` results can panic if scores contain NaN values. Consider using `unwrap_or(Ordering::Equal)` or adding explicit NaN checks to prevent potential panics when selecting the best hypothesis.
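An alternative to `unwrap_or` is `f64::total_cmp`, which defines a total order over floats (NaN included) and therefore never panics. Below is a sketch with stand-in types, not the PR's definitions.

```rust
struct Scored {
    score: f64,
    hypothesis: Vec<u32>,
}

/// Pick the best hypothesis under length normalization without unwrapping:
/// `total_cmp` orders every f64, placing NaN after +infinity.
fn pick_best(all: Vec<Scored>, length_penalty: f64) -> Option<Scored> {
    let norm =
        |h: &Scored| h.score / (h.hypothesis.len().max(1) as f64).powf(length_penalty);
    all.into_iter().max_by(|a, b| norm(a).total_cmp(&norm(b)))
}
```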
```rust
token_idx.sort_by(|&a, &b| {
    token_logprobs[b].partial_cmp(&token_logprobs[a]).unwrap()
});
token_idx.truncate(beam_token);

let mut dur_idx: Vec<usize> = (0..duration_logprobs.len()).collect();
dur_idx.sort_by(|&a, &b| {
    duration_logprobs[b]
        .partial_cmp(&duration_logprobs[a])
        .unwrap()
});
dur_idx.truncate(beam_duration);
```
Copilot AI (Jan 28, 2026):
The `unwrap()` calls on `partial_cmp` results can panic if log probabilities contain NaN values. Consider using `unwrap_or(Ordering::Equal)` to handle potential NaN cases gracefully during beam search sorting.
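The same fix applies to the index sorts above; here is a small hypothetical helper (not the PR's code) that ranks indices by descending log-probability and keeps the top `k`.

```rust
/// Indices of the `k` largest values, in descending order. `total_cmp` gives
/// a total order over f64, so NaN entries cannot cause a panic.
fn top_k_desc(logprobs: &[f64], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..logprobs.len()).collect();
    idx.sort_by(|&a, &b| logprobs[b].total_cmp(&logprobs[a]));
    idx.truncate(k);
    idx
}
```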
```rust
pub values: Option<Tensor>,
pub conv: Option<Tensor>,
pub offset: usize,
step: usize,
```
Copilot AI (Jan 28, 2026):
The `step` field is declared and initialized but never used in the implementation. Consider removing it or implementing the intended functionality if it was meant to serve a purpose.
Is this well tested?
Looking at my notes, I see the differences I was getting between PyTorch and candle were things like: Have you done extensive testing to confirm you do not have such issues across all models on a variety of audio files? The other reason I wasn't sure whether @ivarflakstad would want my code in candle is that I have features for using NeMo files directly instead of requiring safetensors from community repos.
It works; the sample file comes from https://github.com/ggml-org/whisper.cpp/blob/master/samples/jfk.wav
I could only find https://github.com/senstella/parakeet-mlx, so I ported from that.
Fixes #3247.