feat(plugins): add Parakeet TDT speech-to-text plugin#281

Merged
streamer45 merged 10 commits into main from devin/1775827547-parakeet-plugin on Apr 10, 2026

@staging-devin-ai-integration staging-devin-ai-integration bot commented Apr 10, 2026

Summary

Adds a new Parakeet TDT native plugin for fast English speech-to-text using NVIDIA's Parakeet TDT 0.6B model via sherpa-onnx. It is approximately 10x faster than Whisper on consumer CPU hardware, with competitive accuracy.

What's included

  • Full plugin implementation in plugins/native/parakeet/ (FFI bindings, VAD-based segmentation, recognizer caching)
  • Plugin metadata (plugin.yml) with model checksums pointing to streamkit/parakeet-models HF repo
  • Justfile targets: build-plugin-native-parakeet, download-parakeet-models, setup-parakeet, upload-parakeet-plugin
  • Sample oneshot pipeline: samples/pipelines/oneshot/parakeet-stt.yml
  • Plugin docs page + index update (10 → 11 official plugins)
  • Marketplace JSON regenerated

Engine bugfix (separate concern, same PR)

Fixed a pre-existing bug in crates/engine/src/oneshot.rs where the content-type backward walk crashed when reaching synthetic nodes (streamkit::http_input/streamkit::http_output), which aren't in the registry. This affected ALL STT-style oneshot pipelines (parakeet, sensevoice, whisper). The fix skips synthetic nodes in the walk.
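
The shape of the fix can be pictured with a small, self-contained sketch. This is not the code in oneshot.rs: the `NodeDef` type, the map-based graph, and the `example::` node kinds are hypothetical stand-ins, and only the two synthetic node names come from this PR. The walk stops as soon as it meets a synthetic node instead of looking it up in the registry.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a registered node definition.
struct NodeDef {
    content_type: Option<&'static str>,
}

/// The two synthetic oneshot nodes named in this PR; they are never registered.
fn is_synthetic(kind: &str) -> bool {
    matches!(kind, "streamkit::http_input" | "streamkit::http_output")
}

/// Walks backward from `start` and returns the first declared content type.
/// `kinds` maps node id -> node kind, `upstream` maps node id -> predecessor id.
fn find_content_type(
    start: &'static str,
    kinds: &HashMap<&'static str, &'static str>,
    upstream: &HashMap<&'static str, &'static str>,
    registry: &HashMap<&'static str, NodeDef>,
) -> Result<Option<&'static str>, String> {
    let mut current = Some(start);
    // Bounding the loop by the node count keeps a malformed cyclic graph from spinning.
    for _ in 0..kinds.len() {
        let id = match current {
            Some(id) => id,
            None => break,
        };
        let kind = match kinds.get(id) {
            Some(kind) => *kind,
            None => return Err(format!("unknown node id: {}", id)),
        };
        // Conservative skip: stop at the first synthetic node rather than
        // failing the registry lookup below.
        if is_synthetic(kind) {
            break;
        }
        let def = registry
            .get(kind)
            .ok_or_else(|| format!("node kind not registered: {}", kind))?;
        if let Some(content_type) = def.content_type {
            return Ok(Some(content_type));
        }
        current = upstream.get(id).copied();
    }
    Ok(None)
}
```

In this sketch a chain whose nodes declare no content type yields `Ok(None)`, leaving the caller to pick a default, whereas a genuinely unregistered node kind still surfaces as an error.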

End-to-end tested

  • Short audio (4s sample.ogg): correct transcription
  • Long audio (2m speech_2m.opus): 30 natural VAD segments with accurate text
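
For readers unfamiliar with VAD-based segmentation, the sketch below shows one way such segments could be cut. It is an illustration only, not the plugin's vad.rs: the `Segmenter` type, its fields, and the per-frame speech probability input are all assumptions. A segment ends after a run of silent frames or when it reaches a maximum length.

```rust
/// Hypothetical segmenter driven by per-frame speech probabilities.
struct Segmenter {
    threshold: f32,
    min_silence_frames: usize,
    max_segment_frames: usize,
    in_speech: bool,
    silent_run: usize,
    segment_len: usize,
}

impl Segmenter {
    fn new(threshold: f32, min_silence_frames: usize, max_segment_frames: usize) -> Self {
        Segmenter {
            threshold,
            min_silence_frames,
            max_segment_frames,
            in_speech: false,
            silent_run: 0,
            segment_len: 0,
        }
    }

    /// Feeds one frame; returns true when the current segment should be flushed.
    fn push_frame(&mut self, speech_prob: f32) -> bool {
        if speech_prob >= self.threshold {
            self.in_speech = true;
            self.silent_run = 0;
        } else if self.in_speech {
            self.silent_run += 1;
        }
        if !self.in_speech {
            // Leading silence does not belong to any segment.
            return false;
        }
        self.segment_len += 1;
        let done = self.silent_run >= self.min_silence_frames
            || self.segment_len >= self.max_segment_frames;
        if done {
            self.in_speech = false;
            self.silent_run = 0;
            self.segment_len = 0;
        }
        done
    }
}

/// Returns the frame indices at which a segment was flushed.
fn segment_end_indices(segmenter: &mut Segmenter, probs: &[f32]) -> Vec<usize> {
    let mut ends = Vec::new();
    for (index, prob) in probs.iter().enumerate() {
        if segmenter.push_frame(*prob) {
            ends.push(index);
        }
    }
    ends
}
```

With a threshold of 0.5 and two silent frames required, the input `[0.1, 0.9, 0.9, 0.1, 0.1, 0.9]` flushes one segment at index 4; the trailing speech frame starts a new segment that is still open.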

Review & Testing Checklist for Human

  • Verify FFI struct layout in ffi.rs matches sherpa-onnx v1.12.17 C headers (transducer encoder/decoder/joiner field offsets)
  • Run just setup-parakeet && just skit then test with curl -X POST -H "Content-Type: audio/ogg" --data-binary @sample.ogg http://localhost:4545/api/v1/oneshot/parakeet-stt
  • Confirm model downloads work from streamkit/parakeet-models HF repo: just download-parakeet-models
  • Review the oneshot engine fix in oneshot.rs — the synthetic node skip is intentionally conservative (breaks on first synthetic node)
  • Check that download-parakeet-models is intentionally excluded from the download-models umbrella target (models are ~660MB)

Notes

  • The Skit / Lint CI failure is a pre-existing cargo deny wasmtime vulnerability, unrelated to this PR
  • Models are hosted at streamkit/parakeet-models on HuggingFace (CC-BY-4.0 licensed)
  • VAD module (vad.rs) is duplicated from sensevoice — could be extracted to a shared crate in a follow-up

Link to Devin session: https://staging.itsdev.in/sessions/6d151ddf78024d91ade096e7b8f4d9ce
Requested by: @streamer45


Add a new native plugin for fast English speech recognition using NVIDIA's
Parakeet TDT (Token-and-Duration Transducer) 0.6B model via sherpa-onnx.

Parakeet TDT is approximately 10x faster than Whisper on consumer hardware
with competitive accuracy (#1 on HuggingFace ASR leaderboard).

Plugin implementation:
- Offline transducer recognizer (encoder/decoder/joiner) via sherpa-onnx C API
- Silero VAD v6 for streaming speech segmentation
- Recognizer caching keyed on (model_dir, num_threads, execution_provider)
- Configurable VAD threshold, silence duration, and max segment length
- 16kHz mono f32 audio input, transcription output

Justfile additions:
- build-plugin-native-parakeet: build the plugin
- download-parakeet-models: download INT8 quantized model (~660MB)
- setup-parakeet: full setup (sherpa-onnx + models + VAD)
- Added parakeet to copy-plugins-native loop

Includes sample oneshot pipeline (parakeet-stt.yml) and plugin.yml manifest.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
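
A minimal sketch of the recognizer caching described in the commit message above, keyed on (model_dir, num_threads, execution_provider). The `Recognizer` type and the `get_or_create` helper are hypothetical stand-ins for the sherpa-onnx handle and the plugin's own cache code; only the key fields and the use of std::sync::LazyLock come from this PR.

```rust
use std::collections::HashMap;
use std::sync::{Arc, LazyLock, Mutex};

/// Cache key fields as listed in the commit message.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct RecognizerKey {
    model_dir: String,
    num_threads: i32,
    execution_provider: String,
}

/// Hypothetical stand-in for the expensive-to-build recognizer handle.
#[derive(Debug)]
struct Recognizer {
    key: RecognizerKey,
}

/// Process-wide cache, created lazily on first use.
static RECOGNIZERS: LazyLock<Mutex<HashMap<RecognizerKey, Arc<Recognizer>>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

/// Returns the cached recognizer for `key`, calling `create` only on a miss.
fn get_or_create(
    key: RecognizerKey,
    create: impl FnOnce(&RecognizerKey) -> Recognizer,
) -> Arc<Recognizer> {
    let mut cache = RECOGNIZERS.lock().unwrap();
    if let Some(existing) = cache.get(&key) {
        return Arc::clone(existing);
    }
    let created = Arc::new(create(&key));
    cache.insert(key, Arc::clone(&created));
    created
}
```

Holding the lock while `create` runs serializes model loads in this sketch, which avoids building the same recognizer twice at the cost of blocking other callers during a load.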

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@staging-devin-ai-integration staging-devin-ai-integration bot left a comment

Devin Review found 4 potential issues.

View 5 additional findings in Devin Review.

@staging-devin-ai-integration staging-devin-ai-integration bot Apr 10, 2026

🚩 download-models target doesn't include download-parakeet-models

The download-models target at justfile line 822 doesn't include download-parakeet-models. Looking at the pattern, NLLB and pocket-tts are also excluded (NLLB for licensing, pocket-tts for gating). Parakeet models are CC-BY-4.0 (not restrictive), so unlike NLLB they could be included. However, at ~660MB they are significantly larger than most other models. This may be an intentional omission to keep default download size manageable, or it may be an oversight. Worth confirming with the author.

Contributor Author

Good catch. At ~660MB the Parakeet model is substantially larger than most other models, so I've kept it out of the default download-models target (similar to pocket-tts) and added it as an optional suggestion in the output message. Fixed in 9f03df7.

- Add build-plugin-native-parakeet to build-plugins-native target
- Fix plugin.yml repo_id to match actual HuggingFace source repos
  (csukuangfj/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8 for model,
   streamkit/sensevoice-models for silero-vad)
- Regenerate marketplace/official-plugins.json with parakeet entry
- Add download-parakeet-models as optional in download-models output
  (skipped by default due to ~660MB size, similar to pocket-tts)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration[bot]

This comment was marked as resolved.

- Add plugin docs page (plugin-native-parakeet.md) with parameters,
  example pipeline, and JSON schema
- Update plugin index to include parakeet (10 → 11 official plugins)
- Fix model download: individual files from HuggingFace instead of
  non-existent tar.bz2 archive
- Add per-file sha256 checksums via file_checksums field (matching
  ModelSpec struct) for integrity verification
- Fix expected_size_bytes to actual total (661190513)
- Regenerate marketplace/official-plugins.json

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

@staging-devin-ai-integration staging-devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 7 additional findings in Devin Review.

@staging-devin-ai-integration staging-devin-ai-integration bot Apr 10, 2026

🚩 PORTABILITY_REVIEW.md referenced in AGENTS.md but deleted from repo

The AGENTS.md plugin checklist requires updating marketplace/PORTABILITY_REVIEW.md, but this file was deleted in a prior commit (9548c98) and doesn't exist on main. Since it's a stale reference in AGENTS.md rather than a file this PR should have modified, this isn't flagged as a bug against this PR, but the AGENTS.md instruction should be updated or the file recreated.

Contributor Author

Acknowledged — as noted, marketplace/PORTABILITY_REVIEW.md doesn't exist in the repo yet and no other plugin has created it either. This is aspirational documentation from the AGENTS.md checklist. Not actionable for this PR.

streamkit-devin and others added 2 commits April 10, 2026 14:36
The content-type backward walk in run_oneshot_pipeline walks backwards
through the pipeline graph to find a node that declares a content_type.
When no node in the chain returns a content_type (e.g. STT pipelines
ending in json_serialize), the walk reaches streamkit::http_input which
is a synthetic node not in the registry, causing a 500 error.

Skip synthetic oneshot nodes (http_input/http_output) in the backward
walk since they are handled separately by the engine and are not
registered in the node registry.

This fixes all STT-style oneshot pipelines (parakeet-stt, sensevoice-stt,
speech_to_text, etc.) that use json_serialize → http_output.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration[bot]

This comment was marked as resolved.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

@staging-devin-ai-integration staging-devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 9 additional findings in Devin Review.

Comment on lines +139 to +163
/// Field order must match sherpa-onnx v1.12.17 c-api.h exactly.
#[repr(C)]
pub struct SherpaOnnxOfflineModelConfig {
    pub transducer: SherpaOnnxOfflineTransducerModelConfig,
    pub paraformer: SherpaOnnxOfflineParaformerModelConfig,
    pub nemo_ctc: SherpaOnnxOfflineNemoEncDecCtcModelConfig,
    pub whisper: SherpaOnnxOfflineWhisperModelConfig,
    pub tdnn: SherpaOnnxOfflineTdnnModelConfig,
    pub tokens: *const c_char,
    pub num_threads: c_int,
    pub debug: c_int,
    pub provider: *const c_char,
    pub model_type: *const c_char,
    pub modeling_unit: *const c_char,
    pub bpe_vocab: *const c_char,
    pub telespeech_ctc: *const c_char,
    pub sense_voice: SherpaOnnxOfflineSenseVoiceModelConfig,
    pub moonshine: SherpaOnnxOfflineMoonshineModelConfig,
    pub fire_red_asr: SherpaOnnxOfflineFireRedAsrModelConfig,
    pub dolphin: SherpaOnnxOfflineDolphinModelConfig,
    pub zipformer_ctc: SherpaOnnxOfflineZipformerCtcModelConfig,
    pub canary: SherpaOnnxOfflineCanaryModelConfig,
    pub wenet_ctc: SherpaOnnxOfflineWenetCtcModelConfig,
    pub omnilingual: SherpaOnnxOfflineOmnilingualAsrCtcModelConfig,
}

🚩 FFI struct layout must exactly match sherpa-onnx c-api.h for the pinned version

The FFI bindings in ffi.rs declare #[repr(C)] structs whose field order must match sherpa-onnx v1.12.17's c-api.h exactly. Any mismatch (added/removed/reordered fields in a newer sherpa-onnx version) would cause silent memory corruption at the ABI boundary. The code pins to a specific version in comments but the actual sherpa-onnx version is determined by whatever libsherpa-onnx-c-api.so is installed via just install-sherpa-onnx. If that installation script is updated to a newer sherpa-onnx version, these bindings may need updating. This matches the approach in other sherpa-onnx-based plugins in the repo.
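
One way to make accidental edits to such a binding fail at build time is a compile-time layout assertion. The sketch below uses a hypothetical, trimmed-down `ExampleModelConfig` that is not the real sherpa-onnx layout, and it computes the expected offsets from ordinary C alignment rules; in practice the expected numbers would have to be taken from the pinned c-api.h. It guards the Rust declaration only and cannot detect which library version is installed at runtime.

```rust
use std::ffi::{c_char, c_int};
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical struct for illustration only; NOT the sherpa-onnx layout.
#[repr(C)]
pub struct ExampleModelConfig {
    pub tokens: *const c_char,
    pub num_threads: c_int,
    pub debug: c_int,
    pub provider: *const c_char,
}

/// Rounds `n` up to the next multiple of `align`.
const fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

const PTR_SIZE: usize = size_of::<*const c_char>();
const PTR_ALIGN: usize = align_of::<*const c_char>();
const INT_SIZE: usize = size_of::<c_int>();
const INT_ALIGN: usize = align_of::<c_int>();

// Expected offsets, derived field by field in declaration order.
const NUM_THREADS_OFFSET: usize = round_up(PTR_SIZE, INT_ALIGN);
const DEBUG_OFFSET: usize = NUM_THREADS_OFFSET + INT_SIZE;
const PROVIDER_OFFSET: usize = round_up(DEBUG_OFFSET + INT_SIZE, PTR_ALIGN);
const EXPECTED_SIZE: usize =
    round_up(PROVIDER_OFFSET + PTR_SIZE, align_of::<ExampleModelConfig>());

// Evaluated at compile time: reordering or adding a field breaks the build.
const _: () = {
    assert!(offset_of!(ExampleModelConfig, tokens) == 0);
    assert!(offset_of!(ExampleModelConfig, num_threads) == NUM_THREADS_OFFSET);
    assert!(offset_of!(ExampleModelConfig, debug) == DEBUG_OFFSET);
    assert!(offset_of!(ExampleModelConfig, provider) == PROVIDER_OFFSET);
    assert!(size_of::<ExampleModelConfig>() == EXPECTED_SIZE);
};
```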

Contributor Author

Acknowledged — this is the same approach used by the sensevoice plugin. The struct layout has been verified against sherpa-onnx v1.12.17's c-api.h and tested end-to-end (recognizer creation + transcription both succeed). The version coupling to just install-sherpa-onnx is noted in the PR review checklist for the human reviewer.

Update the parakeet plugin.yml to point to the controlled
streamkit/parakeet-models HuggingFace repo instead of the external
csukuangfj repo. Regenerate marketplace metadata accordingly.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

@staging-devin-ai-integration staging-devin-ai-integration bot left a comment

Devin Review found 0 new potential issues.

View 9 additional findings in Devin Review.

Point the justfile download target and README references to
streamkit/parakeet-models instead of the external csukuangfj repo.
Original export attribution preserved in README.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

@staging-devin-ai-integration staging-devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 11 additional findings in Devin Review.

};

// Calculate silence threshold in frames (each frame is 32ms)
let silence_threshold_frames = (config.min_silence_duration_ms / 32) as usize;

🟡 Integer division truncates silence threshold, making it shorter than configured

The silence threshold in frames is computed via config.min_silence_duration_ms / 32, which uses integer (floor) division. For the default min_silence_duration_ms = 700, the result is 700 / 32 = 21 frames × 32ms = 672ms — 28ms shorter than configured. For the minimum allowed value of 100ms, the result is 100 / 32 = 3 frames = 96ms. The truncation error can be up to 31ms, causing the plugin to trigger transcription slightly earlier than the user specified. Ceiling division should be used to ensure the actual silence duration is at least the configured value.

Suggested change
- let silence_threshold_frames = (config.min_silence_duration_ms / 32) as usize;
+ let silence_threshold_frames = ((config.min_silence_duration_ms + 31) / 32) as usize;
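
The arithmetic in this comment can be checked with a short standalone sketch. The helper names are hypothetical, and `u32` is an assumed type for `min_silence_duration_ms`; the 32 ms frame length and the 700 ms and 100 ms inputs come from the comment above.

```rust
/// Frame length stated in the reviewed code comment.
const FRAME_MS: u32 = 32;

/// Original behavior: floor division, which can undershoot the configured silence.
fn silence_frames_floor(min_silence_ms: u32) -> usize {
    (min_silence_ms / FRAME_MS) as usize
}

/// Suggested behavior: ceiling division, so the wait is never shorter than configured.
fn silence_frames_ceil(min_silence_ms: u32) -> usize {
    ((min_silence_ms + FRAME_MS - 1) / FRAME_MS) as usize
}

/// Silence duration actually enforced by a given frame count.
fn effective_ms(frames: usize) -> u32 {
    frames as u32 * FRAME_MS
}
```

For 700 ms, floor division gives 21 frames (672 ms) and ceiling division gives 22 frames (704 ms); for 100 ms the results are 3 frames (96 ms) and 4 frames (128 ms). On Rust 1.73 or later, `min_silence_ms.div_ceil(FRAME_MS)` expresses the same ceiling for unsigned integers.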

- Point silero-vad repo_id to streamkit/parakeet-models instead of
  streamkit/sensevoice-models to avoid cross-plugin dependency
- Remove unused cc build-dependency
- Remove unused once_cell dependency (code uses std::sync::LazyLock)
- Fix misleading update_params comment that claimed VAD params could
  be updated at runtime
- Remove const from set_threshold (f32::clamp is not const-stable)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

@staging-devin-ai-integration staging-devin-ai-integration bot left a comment

Devin Review found 0 new potential issues.

View 11 additional findings in Devin Review.

@streamer45 streamer45 enabled auto-merge (squash) April 10, 2026 19:03

@staging-devin-ai-integration staging-devin-ai-integration bot left a comment

Devin Review found 0 new potential issues.

View 12 additional findings in Devin Review.

@streamer45 streamer45 merged commit 304c0b9 into main Apr 10, 2026
17 checks passed
@streamer45 streamer45 deleted the devin/1775827547-parakeet-plugin branch April 10, 2026 19:22