feat(plugins): add Parakeet TDT speech-to-text plugin #281
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,144 @@ | ||
| --- | ||
| # SPDX-FileCopyrightText: © 2025 StreamKit Contributors | ||
| # SPDX-License-Identifier: MPL-2.0 | ||
| title: "plugin::native::parakeet" | ||
| description: "Fast speech-to-text transcription using NVIDIA Parakeet TDT, a transducer-based ASR model. Approximately 10x faster than Whisper on consumer hardware with competitive accuracy. Uses sherpa-onnx for inference. Requires 16kHz mono audio input." | ||
| --- | ||
|
|
||
| `kind`: `plugin::native::parakeet` (original kind: `parakeet`) | ||
|
|
||
| Fast speech-to-text transcription using NVIDIA Parakeet TDT, a transducer-based ASR model. Approximately 10x faster than Whisper on consumer hardware with competitive accuracy. Uses sherpa-onnx for inference. Requires 16kHz mono audio input. | ||
|
|
||
| Source: `target/plugins/release/libparakeet.so` | ||
|
|
||
| ## Categories | ||
| - `ml` | ||
| - `speech` | ||
| - `transcription` | ||
|
|
||
| ## Pins | ||
| ### Inputs | ||
| - `in` accepts `RawAudio(AudioFormat { sample_rate: 16000, channels: 1, sample_format: F32 })` (one) | ||
|
|
||
| ### Outputs | ||
| - `out` produces `Transcription` (broadcast) | ||
|
|
||
| ## Parameters | ||
| | Name | Type | Required | Default | Description | | ||
| | --- | --- | --- | --- | --- | | ||
| | `execution_provider` | `string enum[cpu, cuda, tensorrt]` | no | `cpu` | Execution provider (cpu, cuda, tensorrt) | | ||
| | `max_segment_duration_secs` | `number` | no | `30.0` | Maximum segment duration before forced transcription (seconds)<br />min: `5`<br />max: `120` | | ||
| | `min_silence_duration_ms` | `integer` | no | `700` | Minimum silence duration before transcription (milliseconds)<br />min: `100`<br />max: `5000` | | ||
| | `model_dir` | `string` | no | `models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8` | Path to Parakeet TDT model directory (contains encoder.int8.onnx, decoder.int8.onnx, joiner.int8.onnx, tokens.txt). IMPORTANT: Input audio must be 16kHz mono f32. | | ||
| | `num_threads` | `integer` | no | `4` | Number of threads for inference<br />min: `1`<br />max: `16` | | ||
| | `use_vad` | `boolean` | no | `true` | Enable VAD-based segmentation | | ||
| | `vad_model_path` | `string` | no | `models/silero_vad.onnx` | Path to Silero VAD ONNX model file | | ||
| | `vad_threshold` | `number` | no | `0.5` | VAD speech probability threshold (0.0-1.0)<br />min: `0`<br />max: `1` | | ||
|
|
||
| ## Example Pipeline | ||
|
|
||
| ```yaml | ||
| # | ||
| # skit:input_asset_tags=speech | ||
|
|
||
| name: Speech-to-Text (Parakeet TDT) | ||
| description: Fast English speech transcription using NVIDIA Parakeet TDT (~10x faster than Whisper on CPU) | ||
| mode: oneshot | ||
| steps: | ||
|   - kind: streamkit::http_input | ||
|
|
||
|   - kind: containers::ogg::demuxer | ||
|
|
||
|   - kind: audio::opus::decoder | ||
|
|
||
|   - kind: audio::resampler | ||
|     params: | ||
|       chunk_frames: 960 | ||
|       output_frame_size: 960 | ||
|       target_sample_rate: 16000 | ||
|
|
||
|   - kind: plugin::native::parakeet | ||
|     params: | ||
|       model_dir: models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8 | ||
|       num_threads: 4 | ||
|       use_vad: true | ||
|       vad_model_path: models/silero_vad.onnx | ||
|       vad_threshold: 0.5 | ||
|       min_silence_duration_ms: 700 | ||
|
|
||
|   - kind: core::json_serialize | ||
|     params: | ||
|       pretty: false | ||
|       newline_delimited: true | ||
|
|
||
|   - kind: streamkit::http_output | ||
|     params: | ||
|       content_type: application/json | ||
| ``` | ||
|
|
||
|
|
||
| <details> | ||
| <summary>Raw JSON Schema</summary> | ||
|
|
||
| ```json | ||
| { | ||
| "properties": { | ||
| "execution_provider": { | ||
| "default": "cpu", | ||
| "description": "Execution provider (cpu, cuda, tensorrt)", | ||
| "enum": [ | ||
| "cpu", | ||
| "cuda", | ||
| "tensorrt" | ||
| ], | ||
| "type": "string" | ||
| }, | ||
| "max_segment_duration_secs": { | ||
| "default": 30.0, | ||
| "description": "Maximum segment duration before forced transcription (seconds)", | ||
| "maximum": 120.0, | ||
| "minimum": 5.0, | ||
| "type": "number" | ||
| }, | ||
| "min_silence_duration_ms": { | ||
| "default": 700, | ||
| "description": "Minimum silence duration before transcription (milliseconds)", | ||
| "maximum": 5000, | ||
| "minimum": 100, | ||
| "type": "integer" | ||
| }, | ||
| "model_dir": { | ||
| "default": "models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8", | ||
| "description": "Path to Parakeet TDT model directory (contains encoder.int8.onnx, decoder.int8.onnx, joiner.int8.onnx, tokens.txt). IMPORTANT: Input audio must be 16kHz mono f32.", | ||
| "type": "string" | ||
| }, | ||
| "num_threads": { | ||
| "default": 4, | ||
| "description": "Number of threads for inference", | ||
| "maximum": 16, | ||
| "minimum": 1, | ||
| "type": "integer" | ||
| }, | ||
| "use_vad": { | ||
| "default": true, | ||
| "description": "Enable VAD-based segmentation", | ||
| "type": "boolean" | ||
| }, | ||
| "vad_model_path": { | ||
| "default": "models/silero_vad.onnx", | ||
| "description": "Path to Silero VAD ONNX model file", | ||
| "type": "string" | ||
| }, | ||
| "vad_threshold": { | ||
| "default": 0.5, | ||
| "description": "VAD speech probability threshold (0.0-1.0)", | ||
| "maximum": 1.0, | ||
| "minimum": 0.0, | ||
| "type": "number" | ||
| } | ||
| }, | ||
| "type": "object" | ||
| } | ||
| ``` | ||
|
|
||
| </details> |
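The ranges and enum values in the schema above can be checked before a pipeline is submitted. The following is a minimal sketch in Python, not part of StreamKit: the `PARAKEET_RULES` table and the `validate_params` helper are hypothetical names introduced here for illustration, and the limits are copied from the JSON Schema in the diff. Because the schema does not forbid extra properties, unknown keys are assumed to be allowed and are skipped.

```python
# Hypothetical helper, not StreamKit code: validates a params mapping
# against the limits listed in the plugin's JSON Schema.
NUMBER = (int, float)  # JSON "number" accepts integers as well as floats

PARAKEET_RULES = {
    "execution_provider": {"type": str, "enum": ["cpu", "cuda", "tensorrt"]},
    "max_segment_duration_secs": {"type": NUMBER, "minimum": 5.0, "maximum": 120.0},
    "min_silence_duration_ms": {"type": int, "minimum": 100, "maximum": 5000},
    "model_dir": {"type": str},
    "num_threads": {"type": int, "minimum": 1, "maximum": 16},
    "use_vad": {"type": bool},
    "vad_model_path": {"type": str},
    "vad_threshold": {"type": NUMBER, "minimum": 0.0, "maximum": 1.0},
}


def validate_params(params):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name, value in params.items():
        rule = PARAKEET_RULES.get(name)
        if rule is None:
            # Assumption: extra keys are tolerated, as the schema does not
            # set additionalProperties to false.
            continue
        expected = rule["type"]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and expected is not bool:
            errors.append(f"{name}: expected a number, got a boolean")
            continue
        if not isinstance(value, expected):
            errors.append(f"{name}: unexpected type {type(value).__name__}")
            continue
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: must be one of {rule['enum']}")
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{name}: below minimum {rule['minimum']}")
        if "maximum" in rule and value > rule["maximum"]:
            errors.append(f"{name}: above maximum {rule['maximum']}")
    return errors
```

Calling `validate_params({"num_threads": 32})` would report one problem, since the schema caps `num_threads` at 16. A full JSON Schema validator would be the more general choice; this sketch only mirrors the constraints that appear in the schema shown here.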
|
Contributor
Author
🚩 download-models target doesn't include download-parakeet-models The
Contributor
Author
Good catch. At ~660MB the Parakeet model is substantially larger than most other models, so I've kept it out of the default |