feat(plugins): add Parakeet TDT speech-to-text plugin by staging-devin-ai-integration[bot] · Pull Request #281 · streamer45/streamkit

staging-devin-ai-integration · 2026-04-10T13:42:10Z

Summary

Adds a new Parakeet TDT native plugin for fast English speech-to-text using NVIDIA's Parakeet TDT 0.6B model via sherpa-onnx. Approximately 10x faster than Whisper on consumer CPU hardware with competitive accuracy.

What's included

Full plugin implementation in plugins/native/parakeet/ (FFI bindings, VAD-based segmentation, recognizer caching)
Plugin metadata (plugin.yml) with model checksums pointing to streamkit/parakeet-models HF repo
Justfile targets: build-plugin-native-parakeet, download-parakeet-models, setup-parakeet, upload-parakeet-plugin
Sample oneshot pipeline: samples/pipelines/oneshot/parakeet-stt.yml
Plugin docs page + index update (10 → 11 official plugins)
Marketplace JSON regenerated

Engine bugfix (separate concern, same PR)

Fixed a pre-existing bug in crates/engine/src/oneshot.rs where the content-type backward walk crashed when reaching synthetic nodes (streamkit::http_input/streamkit::http_output), which aren't in the registry. This affected ALL STT-style oneshot pipelines (parakeet, sensevoice, whisper). The fix skips synthetic nodes in the walk.

End-to-end tested

Short audio (4s sample.ogg): correct transcription
Long audio (2m speech_2m.opus): 30 natural VAD segments with accurate text

Review & Testing Checklist for Human

Verify FFI struct layout in ffi.rs matches sherpa-onnx v1.12.17 C headers (transducer encoder/decoder/joiner field offsets)
Run just setup-parakeet && just skit then test with curl -X POST -H "Content-Type: audio/ogg" --data-binary @sample.ogg http://localhost:4545/api/v1/oneshot/parakeet-stt
Confirm model downloads work from streamkit/parakeet-models HF repo: just download-parakeet-models
Review the oneshot engine fix in oneshot.rs — the synthetic node skip is intentionally conservative (breaks on first synthetic node)
Check that download-parakeet-models is intentionally excluded from the download-models umbrella target (models are ~660MB)

Notes

The Skit / Lint CI failure is a pre-existing cargo deny wasmtime vulnerability, unrelated to this PR
Models are hosted at streamkit/parakeet-models on HuggingFace (CC-BY-4.0 licensed)
VAD module (vad.rs) is duplicated from sensevoice — could be extracted to a shared crate in a follow-up

Link to Devin session: https://staging.itsdev.in/sessions/6d151ddf78024d91ade096e7b8f4d9ce
Requested by: @streamer45

Add a new native plugin for fast English speech recognition using NVIDIA's Parakeet TDT (Token-and-Duration Transducer) 0.6B model via sherpa-onnx. Parakeet TDT is approximately 10x faster than Whisper on consumer hardware with competitive accuracy (#1 on HuggingFace ASR leaderboard).

Plugin implementation:
- Offline transducer recognizer (encoder/decoder/joiner) via sherpa-onnx C API
- Silero VAD v6 for streaming speech segmentation
- Recognizer caching keyed on (model_dir, num_threads, execution_provider)
- Configurable VAD threshold, silence duration, and max segment length
- 16kHz mono f32 audio input, transcription output

Justfile additions:
- build-plugin-native-parakeet: build the plugin
- download-parakeet-models: download INT8 quantized model (~660MB)
- setup-parakeet: full setup (sherpa-onnx + models + VAD)
- Added parakeet to copy-plugins-native loop

Includes sample oneshot pipeline (parakeet-stt.yml) and plugin.yml manifest.

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
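The recognizer caching mentioned in the commit message can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `RecognizerKey`, `Recognizer`, `RECOGNIZER_CACHE`, and `get_or_create_recognizer` are hypothetical names, the field types are assumed, and the real recognizer would wrap a sherpa-onnx handle rather than just its key. Only the key tuple (model_dir, num_threads, execution_provider) and the use of std::sync::LazyLock come from the PR.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::{Arc, LazyLock, Mutex};

/// Cache key: two nodes can share a recognizer only if all three fields match.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct RecognizerKey {
    model_dir: PathBuf,
    num_threads: i32,
    execution_provider: String,
}

/// Hypothetical stand-in for the real sherpa-onnx recognizer handle.
#[derive(Debug)]
struct Recognizer {
    key: RecognizerKey,
}

/// Process-wide cache, initialized lazily on first access.
static RECOGNIZER_CACHE: LazyLock<Mutex<HashMap<RecognizerKey, Arc<Recognizer>>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

/// Returns the cached recognizer for `key`, building it on first use.
fn get_or_create_recognizer(key: RecognizerKey) -> Arc<Recognizer> {
    let mut cache = RECOGNIZER_CACHE.lock().expect("recognizer cache poisoned");
    cache
        .entry(key.clone())
        .or_insert_with(|| Arc::new(Recognizer { key }))
        .clone()
}

fn main() {
    let key = RecognizerKey {
        model_dir: PathBuf::from("models/parakeet"), // illustrative path
        num_threads: 4,
        execution_provider: "cpu".to_string(),
    };
    let first = get_or_create_recognizer(key.clone());
    let second = get_or_create_recognizer(key);
    // Both handles point at the same cached instance.
    println!("shared: {} ({:?})", Arc::ptr_eq(&first, &second), first.key);
}
```

Holding the lock while constructing the recognizer keeps the sketch simple and avoids loading the same model twice, at the cost of serializing the first load across callers.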

staging-devin-ai-integration

Devin Review found 4 potential issues.

staging-devin-ai-integration · 2026-04-10T13:46:31Z

justfile

🚩 download-models target doesn't include download-parakeet-models

The download-models target at justfile line 822 doesn't include download-parakeet-models. Looking at the pattern, NLLB and pocket-tts are also excluded (NLLB for licensing, pocket-tts for gating). Parakeet models are CC-BY-4.0 (not restrictive), so unlike NLLB they could be included. However, at ~660MB they are significantly larger than most other models. This may be an intentional omission to keep default download size manageable, or it may be an oversight. Worth confirming with the author.

Good catch. At ~660MB the Parakeet model is substantially larger than most other models, so I've kept it out of the default download-models target (similar to pocket-tts) and added it as an optional suggestion in the output message. Fixed in 9f03df7.

plugins/native/parakeet/plugin.yml

- Add build-plugin-native-parakeet to build-plugins-native target
- Fix plugin.yml repo_id to match actual HuggingFace source repos (csukuangfj/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8 for model, streamkit/sensevoice-models for silero-vad)
- Regenerate marketplace/official-plugins.json with parakeet entry
- Add download-parakeet-models as optional in download-models output (skipped by default due to ~660MB size, similar to pocket-tts)

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

- Add plugin docs page (plugin-native-parakeet.md) with parameters, example pipeline, and JSON schema
- Update plugin index to include parakeet (10 → 11 official plugins)
- Fix model download: individual files from HuggingFace instead of non-existent tar.bz2 archive
- Add per-file sha256 checksums via file_checksums field (matching ModelSpec struct) for integrity verification
- Fix expected_size_bytes to actual total (661190513)
- Regenerate marketplace/official-plugins.json

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

staging-devin-ai-integration

Devin Review found 1 new potential issue.

staging-devin-ai-integration · 2026-04-10T14:03:22Z

plugins/native/parakeet/plugin.yml

🚩 PORTABILITY_REVIEW.md referenced in AGENTS.md but deleted from repo

The AGENTS.md plugin checklist requires updating marketplace/PORTABILITY_REVIEW.md, but this file was deleted in a prior commit (9548c98) and doesn't exist on main. Since it's a stale reference in AGENTS.md rather than a file this PR should have modified, this isn't flagged as a bug against this PR, but the AGENTS.md instruction should be updated or the file recreated.

Acknowledged — as noted, marketplace/PORTABILITY_REVIEW.md doesn't exist in the repo yet and no other plugin has created it either. This is aspirational documentation from the AGENTS.md checklist. Not actionable for this PR.

The content-type backward walk in run_oneshot_pipeline walks backwards through the pipeline graph to find a node that declares a content_type. When no node in the chain returns a content_type (e.g. STT pipelines ending in json_serialize), the walk reaches streamkit::http_input which is a synthetic node not in the registry, causing a 500 error.

Skip synthetic oneshot nodes (http_input/http_output) in the backward walk since they are handled separately by the engine and are not registered in the node registry.

This fixes all STT-style oneshot pipelines (parakeet-stt, sensevoice-stt, speech_to_text, etc.) that use json_serialize → http_output.

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
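The shape of the fix described in this commit can be illustrated with a small sketch. It is not the code in oneshot.rs: the `Node` struct, the `Registry` alias, `find_content_type`, `sample_pipeline`, and the node ids and non-synthetic kinds are all hypothetical. It assumes a linear pipeline and, following the review checklist, stops the walk at the first synthetic node rather than stepping over it.

```rust
use std::collections::HashMap;

/// Synthetic oneshot node kinds named in the PR; the engine handles them
/// itself, so they never appear in the node registry.
const SYNTHETIC_KINDS: [&str; 2] = ["streamkit::http_input", "streamkit::http_output"];

/// Hypothetical pipeline node: its kind plus the id of the node feeding it.
struct Node {
    kind: &'static str,
    upstream: Option<&'static str>,
}

/// Hypothetical registry: node kind -> content type it declares, if any.
type Registry = HashMap<&'static str, Option<&'static str>>;

/// Walks upstream from `start` until some node declares a content type.
fn find_content_type(
    nodes: &HashMap<&'static str, Node>,
    registry: &Registry,
    start: &str,
) -> Result<Option<&'static str>, String> {
    let mut current = Some(start);
    while let Some(id) = current {
        let node = nodes.get(id).ok_or_else(|| format!("unknown node: {id}"))?;
        // Conservative fix: stop at the first synthetic node instead of
        // looking it up in the registry, where it does not exist.
        if SYNTHETIC_KINDS.contains(&node.kind) {
            break;
        }
        let declared = registry
            .get(node.kind)
            .ok_or_else(|| format!("unregistered kind: {}", node.kind))?;
        if let Some(content_type) = declared {
            return Ok(Some(*content_type));
        }
        current = node.upstream;
    }
    Ok(None)
}

/// Builds http_input -> stt -> json_serialize, with no declared content types.
fn sample_pipeline() -> (HashMap<&'static str, Node>, Registry) {
    let nodes = HashMap::from([
        ("in", Node { kind: "streamkit::http_input", upstream: None }),
        ("stt", Node { kind: "stt", upstream: Some("in") }),
        ("json", Node { kind: "json_serialize", upstream: Some("stt") }),
    ]);
    let registry: Registry = HashMap::from([("stt", None), ("json_serialize", None)]);
    (nodes, registry)
}

fn main() {
    let (nodes, registry) = sample_pipeline();
    // The walk ends at http_input without an error and without a content type.
    println!("{:?}", find_content_type(&nodes, &registry, "json"));
}
```

Without the synthetic-kind check, the registry lookup for streamkit::http_input would fail, which corresponds to the 500 error the commit describes.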

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

staging-devin-ai-integration

Devin Review found 1 new potential issue.

staging-devin-ai-integration · 2026-04-10T14:51:56Z

plugins/native/parakeet/src/ffi.rs

+/// Field order must match sherpa-onnx v1.12.17 c-api.h exactly.
+#[repr(C)]
+pub struct SherpaOnnxOfflineModelConfig {
+    pub transducer: SherpaOnnxOfflineTransducerModelConfig,
+    pub paraformer: SherpaOnnxOfflineParaformerModelConfig,
+    pub nemo_ctc: SherpaOnnxOfflineNemoEncDecCtcModelConfig,
+    pub whisper: SherpaOnnxOfflineWhisperModelConfig,
+    pub tdnn: SherpaOnnxOfflineTdnnModelConfig,
+    pub tokens: *const c_char,
+    pub num_threads: c_int,
+    pub debug: c_int,
+    pub provider: *const c_char,
+    pub model_type: *const c_char,
+    pub modeling_unit: *const c_char,
+    pub bpe_vocab: *const c_char,
+    pub telespeech_ctc: *const c_char,
+    pub sense_voice: SherpaOnnxOfflineSenseVoiceModelConfig,
+    pub moonshine: SherpaOnnxOfflineMoonshineModelConfig,
+    pub fire_red_asr: SherpaOnnxOfflineFireRedAsrModelConfig,
+    pub dolphin: SherpaOnnxOfflineDolphinModelConfig,
+    pub zipformer_ctc: SherpaOnnxOfflineZipformerCtcModelConfig,
+    pub canary: SherpaOnnxOfflineCanaryModelConfig,
+    pub wenet_ctc: SherpaOnnxOfflineWenetCtcModelConfig,
+    pub omnilingual: SherpaOnnxOfflineOmnilingualAsrCtcModelConfig,
+}


🚩 FFI struct layout must exactly match sherpa-onnx c-api.h for the pinned version

The FFI bindings in ffi.rs declare #[repr(C)] structs whose field order must match sherpa-onnx v1.12.17's c-api.h exactly. Any mismatch (added/removed/reordered fields in a newer sherpa-onnx version) would cause silent memory corruption at the ABI boundary. The code pins to a specific version in comments but the actual sherpa-onnx version is determined by whatever libsherpa-onnx-c-api.so is installed via just install-sherpa-onnx. If that installation script is updated to a newer sherpa-onnx version, these bindings may need updating. This matches the approach in other sherpa-onnx-based plugins in the repo.

Acknowledged — this is the same approach used by the sensevoice plugin. The struct layout has been verified against sherpa-onnx v1.12.17's c-api.h and tested end-to-end (recognizer creation + transcription both succeed). The version coupling to just install-sherpa-onnx is noted in the PR review checklist for the human reviewer.
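One way to make the version coupling more visible is a set of compile-time layout assertions next to the bindings. The sketch below is only an illustration under stated assumptions: `TransducerModelConfig` is a hypothetical mirror that assumes the transducer config consists of three C string pointers (encoder, decoder, joiner), and `layout` is a helper added for the example. The expected offsets are hand-written, so the checks catch accidental edits on the Rust side but cannot detect a changed c-api.h by themselves.

```rust
use std::ffi::c_char;
use std::mem::{offset_of, size_of};

/// Hypothetical mirror of the transducer config, assumed to be three
/// C string pointers. Verify against the pinned c-api.h before relying on it.
#[repr(C)]
pub struct TransducerModelConfig {
    pub encoder: *const c_char,
    pub decoder: *const c_char,
    pub joiner: *const c_char,
}

/// Size of one pointer field on the target platform.
const PTR: usize = size_of::<*const c_char>();

// Compile-time checks: the build fails if a field is added, removed, or
// reordered without updating the expected offsets below.
const _: () = assert!(offset_of!(TransducerModelConfig, encoder) == 0);
const _: () = assert!(offset_of!(TransducerModelConfig, decoder) == PTR);
const _: () = assert!(offset_of!(TransducerModelConfig, joiner) == 2 * PTR);
const _: () = assert!(size_of::<TransducerModelConfig>() == 3 * PTR);

/// Returns [offset(encoder), offset(decoder), offset(joiner), size] in bytes.
fn layout() -> [usize; 4] {
    [
        offset_of!(TransducerModelConfig, encoder),
        offset_of!(TransducerModelConfig, decoder),
        offset_of!(TransducerModelConfig, joiner),
        size_of::<TransducerModelConfig>(),
    ]
}

fn main() {
    println!("{:?}", layout());
}
```

A stronger guard would compare these values against sizes and offsets produced by compiling the actual header, which is outside the scope of this sketch.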

Update the parakeet plugin.yml to point to the controlled streamkit/parakeet-models HuggingFace repo instead of the external csukuangfj repo. Regenerate marketplace metadata accordingly. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

staging-devin-ai-integration

Devin Review found 0 new potential issues.

Point the justfile download target and README references to streamkit/parakeet-models instead of the external csukuangfj repo. Original export attribution preserved in README. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

staging-devin-ai-integration

Devin Review found 1 new potential issue.

staging-devin-ai-integration · 2026-04-10T16:26:29Z

plugins/native/parakeet/src/parakeet_node.rs

+        };
+
+        // Calculate silence threshold in frames (each frame is 32ms)
+        let silence_threshold_frames = (config.min_silence_duration_ms / 32) as usize;


🟡 Integer division truncates silence threshold, making it shorter than configured

The silence threshold in frames is computed via config.min_silence_duration_ms / 32, which uses integer (floor) division. For the default min_silence_duration_ms = 700, the result is 700 / 32 = 21 frames × 32ms = 672ms — 28ms shorter than configured. For the minimum allowed value of 100ms, the result is 100 / 32 = 3 frames = 96ms. The truncation error can be up to 31ms, causing the plugin to trigger transcription slightly earlier than the user specified. Ceiling division should be used to ensure the actual silence duration is at least the configured value.

Suggested change

-        let silence_threshold_frames = (config.min_silence_duration_ms / 32) as usize;
+        let silence_threshold_frames = ((config.min_silence_duration_ms + 31) / 32) as usize;
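The arithmetic in this finding can be checked with a short sketch. The helper names `frames_floor` and `frames_ceil` are hypothetical, and the parameter is assumed to be a `u32`; the 32ms frame length comes from the code comment under review. `u32::div_ceil` from the standard library is used here as an equivalent of the `(x + 31) / 32` form in the suggestion.

```rust
/// VAD frame length in milliseconds, per the comment in parakeet_node.rs.
const FRAME_MS: u32 = 32;

/// Original behaviour: floor division, which can undershoot the configured time.
fn frames_floor(min_silence_duration_ms: u32) -> usize {
    (min_silence_duration_ms / FRAME_MS) as usize
}

/// Suggested behaviour: ceiling division, so the threshold is never too short.
fn frames_ceil(min_silence_duration_ms: u32) -> usize {
    min_silence_duration_ms.div_ceil(FRAME_MS) as usize
}

fn main() {
    for ms in [100u32, 700, 704] {
        let floor = frames_floor(ms);
        let ceil = frames_ceil(ms);
        println!(
            "{ms}ms: floor = {floor} frames ({}ms), ceil = {ceil} frames ({}ms)",
            floor as u32 * FRAME_MS,
            ceil as u32 * FRAME_MS
        );
    }
}
```

With ceiling division the effective threshold errs on the long side instead: 700ms becomes 22 frames (704ms) and 100ms becomes 4 frames (128ms), while exact multiples of 32 are unchanged.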

- Point silero-vad repo_id to streamkit/parakeet-models instead of streamkit/sensevoice-models to avoid cross-plugin dependency
- Remove unused cc build-dependency
- Remove unused once_cell dependency (code uses std::sync::LazyLock)
- Fix misleading update_params comment that claimed VAD params could be updated at runtime
- Remove const from set_threshold (f32::clamp is not const-stable)

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

staging-devin-ai-integration

Devin Review found 0 new potential issues.

staging-devin-ai-integration

Devin Review found 0 new potential issues.

staging-devin-ai-integration bot assigned streamer45 Apr 10, 2026

staging-devin-ai-integration bot requested a review from streamer45 April 10, 2026 13:42

streamkit-devin and others added 2 commits April 10, 2026 14:36

style(engine): format oneshot backward walk

f29bc2b

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

docs(plugins): add README for parakeet plugin

5f73e6a

Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

Merge branch 'main' into devin/1775827547-parakeet-plugin

56223f9

streamer45 enabled auto-merge (squash) April 10, 2026 19:03

streamer45 merged commit 304c0b9 into main Apr 10, 2026
17 checks passed

streamer45 deleted the devin/1775827547-parakeet-plugin branch April 10, 2026 19:22
