5 changes: 5 additions & 0 deletions crates/engine/src/oneshot.rs
@@ -384,6 +384,11 @@ impl Engine {
for _ in 0..max_steps {
    steps += 1;
    if let Some(def) = definition.nodes.get(cursor) {
        // Skip synthetic oneshot nodes — they are not in the
        // registry and are handled separately by the engine.
        if def.kind == "streamkit::http_input" || def.kind == "streamkit::http_output" {
            break;
        }
        let temp = registry.create_node(&def.kind, def.params.as_ref())?;
        if let Some(ct) = temp.content_type() {
            found = Some(ct);
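
The guard in this hunk can be illustrated with a small, self-contained sketch. Everything below is hypothetical: `NodeDef`, `registry_content_type`, and `find_content_type` are stand-ins invented for illustration, and the linear `next` chain is an assumption about how the cursor advances. Only the two synthetic kind strings and the exit-the-loop behaviour come from the diff. Note that the guard uses `break`, so the walk stops at the first synthetic node rather than continuing past it.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified node definition (not the engine's real type).
struct NodeDef {
    kind: String,
    /// Name of the next node to inspect, if any (assumed linear chain).
    next: Option<String>,
}

/// Kinds the engine injects for oneshot pipelines; per the diff they are
/// not present in the node registry.
fn is_synthetic_oneshot(kind: &str) -> bool {
    matches!(kind, "streamkit::http_input" | "streamkit::http_output")
}

/// Hypothetical stand-in for `registry.create_node(..)?.content_type()`.
/// Unknown kinds produce an error, as a registry lookup presumably would.
fn registry_content_type(kind: &str) -> Result<Option<String>, String> {
    match kind {
        "core::json_serialize" => Ok(Some("application/json".to_string())),
        "audio::resampler" | "plugin::native::parakeet" => Ok(None),
        other => Err(format!("unknown node kind: {other}")),
    }
}

/// Walk the chain from `start`, returning the first declared content type.
fn find_content_type(
    nodes: &HashMap<String, NodeDef>,
    start: &str,
    max_steps: usize,
) -> Result<Option<String>, String> {
    let mut cursor = start.to_string();
    for _ in 0..max_steps {
        let Some(def) = nodes.get(&cursor) else { break };
        // Stop before asking the registry about a synthetic node; without
        // this guard the lookup below would fail with "unknown node kind".
        if is_synthetic_oneshot(&def.kind) {
            break;
        }
        if let Some(ct) = registry_content_type(&def.kind)? {
            return Ok(Some(ct));
        }
        match &def.next {
            Some(n) => cursor = n.clone(),
            None => break,
        }
    }
    Ok(None)
}
```

In this sketch, a chain that reaches `streamkit::http_output` without finding a content type returns `Ok(None)` instead of propagating a registry error.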
3 changes: 2 additions & 1 deletion docs/src/content/docs/reference/plugins/index.md
@@ -14,12 +14,13 @@ curl http://localhost:4545/api/v1/plugins
curl http://localhost:4545/api/v1/schema/nodes | jq '.[] | select(.kind | startswith("plugin::"))'
```

-## Official plugins (10)
+## Official plugins (11)

- [`plugin::native::helsinki`](./plugin-native-helsinki/) (original kind: `helsinki`)
- [`plugin::native::kokoro`](./plugin-native-kokoro/) (original kind: `kokoro`)
- [`plugin::native::matcha`](./plugin-native-matcha/) (original kind: `matcha`)
- [`plugin::native::nllb`](./plugin-native-nllb/) (original kind: `nllb`)
- [`plugin::native::parakeet`](./plugin-native-parakeet/) (original kind: `parakeet`)
- [`plugin::native::piper`](./plugin-native-piper/) (original kind: `piper`)
- [`plugin::native::pocket-tts`](./plugin-native-pocket-tts/) (original kind: `pocket-tts`)
- [`plugin::native::sensevoice`](./plugin-native-sensevoice/) (original kind: `sensevoice`)
144 changes: 144 additions & 0 deletions docs/src/content/docs/reference/plugins/plugin-native-parakeet.md
@@ -0,0 +1,144 @@
---
# SPDX-FileCopyrightText: © 2025 StreamKit Contributors
# SPDX-License-Identifier: MPL-2.0
title: "plugin::native::parakeet"
description: "Fast speech-to-text transcription using NVIDIA Parakeet TDT, a transducer-based ASR model. Approximately 10x faster than Whisper on consumer hardware with competitive accuracy. Uses sherpa-onnx for inference. Requires 16kHz mono audio input."
---

`kind`: `plugin::native::parakeet` (original kind: `parakeet`)

Fast speech-to-text transcription using NVIDIA Parakeet TDT, a transducer-based ASR model. Approximately 10x faster than Whisper on consumer hardware with competitive accuracy. Uses sherpa-onnx for inference. Requires 16kHz mono audio input.

Source: `target/plugins/release/libparakeet.so`

## Categories
- `ml`
- `speech`
- `transcription`

## Pins
### Inputs
- `in` accepts `RawAudio(AudioFormat { sample_rate: 16000, channels: 1, sample_format: F32 })` (one)

### Outputs
- `out` produces `Transcription` (broadcast)

## Parameters
| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `execution_provider` | `string enum[cpu, cuda, tensorrt]` | no | `cpu` | Execution provider (cpu, cuda, tensorrt) |
| `max_segment_duration_secs` | `number` | no | `30.0` | Maximum segment duration before forced transcription (seconds)<br />min: `5`<br />max: `120` |
| `min_silence_duration_ms` | `integer` | no | `700` | Minimum silence duration before transcription (milliseconds)<br />min: `100`<br />max: `5000` |
| `model_dir` | `string` | no | `models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8` | Path to Parakeet TDT model directory (contains encoder.int8.onnx, decoder.int8.onnx, joiner.int8.onnx, tokens.txt). IMPORTANT: Input audio must be 16kHz mono f32. |
| `num_threads` | `integer` | no | `4` | Number of threads for inference<br />min: `1`<br />max: `16` |
| `use_vad` | `boolean` | no | `true` | Enable VAD-based segmentation |
| `vad_model_path` | `string` | no | `models/silero_vad.onnx` | Path to Silero VAD ONNX model file |
| `vad_threshold` | `number` | no | `0.5` | VAD speech probability threshold (0.0-1.0)<br />min: `0`<br />max: `1` |
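
The numeric bounds in the table above can be checked before a pipeline is submitted. The following is a minimal sketch, assuming a hand-written `ParakeetParams` struct and `validate` function that are hypothetical and not the plugin's actual configuration type; only the defaults and the min/max values are taken from the table.

```rust
/// Hypothetical mirror of the documented numeric parameters
/// (not the plugin's real config type).
#[derive(Debug, Clone, PartialEq)]
struct ParakeetParams {
    max_segment_duration_secs: f64,
    min_silence_duration_ms: u32,
    num_threads: u32,
    vad_threshold: f64,
}

impl Default for ParakeetParams {
    /// Defaults copied from the parameter table.
    fn default() -> Self {
        Self {
            max_segment_duration_secs: 30.0,
            min_silence_duration_ms: 700,
            num_threads: 4,
            vad_threshold: 0.5,
        }
    }
}

/// Check each value against the documented inclusive range.
fn validate(p: &ParakeetParams) -> Result<(), String> {
    if !(5.0..=120.0).contains(&p.max_segment_duration_secs) {
        return Err("max_segment_duration_secs must be within 5..=120".to_string());
    }
    if !(100..=5000).contains(&p.min_silence_duration_ms) {
        return Err("min_silence_duration_ms must be within 100..=5000".to_string());
    }
    if !(1..=16).contains(&p.num_threads) {
        return Err("num_threads must be within 1..=16".to_string());
    }
    if !(0.0..=1.0).contains(&p.vad_threshold) {
        return Err("vad_threshold must be within 0.0..=1.0".to_string());
    }
    Ok(())
}
```

A caller could start from `ParakeetParams::default()` and override single fields with struct update syntax, for example `ParakeetParams { num_threads: 8, ..Default::default() }`.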

## Example Pipeline

```yaml
#
# skit:input_asset_tags=speech

name: Speech-to-Text (Parakeet TDT)
description: Fast English speech transcription using NVIDIA Parakeet TDT (~10x faster than Whisper on CPU)
mode: oneshot
steps:
  - kind: streamkit::http_input

  - kind: containers::ogg::demuxer

  - kind: audio::opus::decoder

  - kind: audio::resampler
    params:
      chunk_frames: 960
      output_frame_size: 960
      target_sample_rate: 16000

  - kind: plugin::native::parakeet
    params:
      model_dir: models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8
      num_threads: 4
      use_vad: true
      vad_model_path: models/silero_vad.onnx
      vad_threshold: 0.5
      min_silence_duration_ms: 700

  - kind: core::json_serialize
    params:
      pretty: false
      newline_delimited: true

  - kind: streamkit::http_output
    params:
      content_type: application/json
```
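
To relate the durations in this pipeline to sample counts, here is a small arithmetic sketch. It assumes that `output_frame_size` is measured in samples at the 16 kHz target rate and that the silence and segment limits can be expressed as sample counts. Both are assumptions for illustration, not statements about how the resampler or the plugin work internally.

```rust
/// Sample rate the plugin requires on its input pin.
const SAMPLE_RATE_HZ: u64 = 16_000;

/// Convert a duration in milliseconds to a sample count at 16 kHz.
fn ms_to_samples(ms: u64) -> u64 {
    ms * SAMPLE_RATE_HZ / 1000
}

/// Convert a sample count at 16 kHz to a duration in milliseconds.
fn samples_to_ms(samples: u64) -> u64 {
    samples * 1000 / SAMPLE_RATE_HZ
}
```

Under these assumptions, `min_silence_duration_ms: 700` corresponds to 11,200 samples, a 960-sample frame covers 60 ms, and the default 30-second segment limit is 480,000 samples.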


<details>
<summary>Raw JSON Schema</summary>

```json
{
  "properties": {
    "execution_provider": {
      "default": "cpu",
      "description": "Execution provider (cpu, cuda, tensorrt)",
      "enum": [
        "cpu",
        "cuda",
        "tensorrt"
      ],
      "type": "string"
    },
    "max_segment_duration_secs": {
      "default": 30.0,
      "description": "Maximum segment duration before forced transcription (seconds)",
      "maximum": 120.0,
      "minimum": 5.0,
      "type": "number"
    },
    "min_silence_duration_ms": {
      "default": 700,
      "description": "Minimum silence duration before transcription (milliseconds)",
      "maximum": 5000,
      "minimum": 100,
      "type": "integer"
    },
    "model_dir": {
      "default": "models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8",
      "description": "Path to Parakeet TDT model directory (contains encoder.int8.onnx, decoder.int8.onnx, joiner.int8.onnx, tokens.txt). IMPORTANT: Input audio must be 16kHz mono f32.",
      "type": "string"
    },
    "num_threads": {
      "default": 4,
      "description": "Number of threads for inference",
      "maximum": 16,
      "minimum": 1,
      "type": "integer"
    },
    "use_vad": {
      "default": true,
      "description": "Enable VAD-based segmentation",
      "type": "boolean"
    },
    "vad_model_path": {
      "default": "models/silero_vad.onnx",
      "description": "Path to Silero VAD ONNX model file",
      "type": "string"
    },
    "vad_threshold": {
      "default": 0.5,
      "description": "VAD speech probability threshold (0.0-1.0)",
      "maximum": 1.0,
      "minimum": 0.0,
      "type": "number"
    }
  },
  "type": "object"
}
```

</details>
40 changes: 38 additions & 2 deletions justfile
Review comment from @staging-devin-ai-integration (bot), Apr 10, 2026:
🚩 download-models target doesn't include download-parakeet-models

The download-models target at justfile line 822 doesn't include download-parakeet-models. Looking at the pattern, NLLB and pocket-tts are also excluded (NLLB for licensing, pocket-tts for gating). Parakeet models are CC-BY-4.0 (not restrictive), so unlike NLLB they could be included. However, at ~660MB they are significantly larger than most other models. This may be an intentional omission to keep default download size manageable, or it may be an oversight. Worth confirming with the author.

Reply from the PR author:

Good catch. At ~660MB the Parakeet model is substantially larger than most other models, so I've kept it out of the default download-models target (similar to pocket-tts) and added it as an optional suggestion in the output message. Fixed in 9f03df7.
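
The justfile recipe in this PR prints "~631MB" while this thread and the `download-models` hint say "~660MB". One plausible reading is that the two figures describe the same `expected_size_bytes` value from the manifest (661190513) in different units. The helper names below are hypothetical; the sketch only shows the arithmetic.

```rust
/// Decimal megabytes (1 MB = 1,000,000 bytes).
fn bytes_to_mb(bytes: u64) -> f64 {
    bytes as f64 / 1_000_000.0
}

/// Binary mebibytes (1 MiB = 1,048,576 bytes).
fn bytes_to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

With `expected_size_bytes = 661190513`, the decimal figure rounds to 661 MB and the binary figure rounds to 631 MiB, which would account for both messages.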

@@ -723,6 +723,39 @@ upload-sensevoice-plugin: build-plugin-native-sensevoice
    @curl -X POST -F "plugin=@{{plugins_target_dir}}/release/libsensevoice.so" \
        http://127.0.0.1:4545/api/v1/plugins

# Build native Parakeet TDT STT plugin
[working-directory: 'plugins/native/parakeet']
build-plugin-native-parakeet:
    @echo "Building native Parakeet TDT STT plugin..."
    @CARGO_TARGET_DIR={{plugins_target_dir}} cargo build --release

# Upload Parakeet plugin to running server
[working-directory: 'plugins/native/parakeet']
upload-parakeet-plugin: build-plugin-native-parakeet
    @echo "Uploading Parakeet plugin to server..."
    @curl -X POST -F "plugin=@{{plugins_target_dir}}/release/libparakeet.so" \
        http://127.0.0.1:4545/api/v1/plugins

# Download Parakeet TDT models
download-parakeet-models:
    @echo "Downloading Parakeet TDT models (~631MB)..."
    @mkdir -p models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8
    @HF_BASE="https://huggingface.co/streamkit/parakeet-models/resolve/main" && \
    MODEL_DIR="models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8" && \
    for f in encoder.int8.onnx decoder.int8.onnx joiner.int8.onnx tokens.txt; do \
        if [ -f "$MODEL_DIR/$f" ]; then \
            echo "✓ $f already exists"; \
        else \
            echo "Downloading $f..." && \
            curl -L -o "$MODEL_DIR/$f" "$HF_BASE/$f" || exit 1; \
        fi; \
    done && \
    echo "✓ Parakeet TDT models ready at $MODEL_DIR (English)"

# Setup Parakeet (install dependencies + download models)
setup-parakeet: install-sherpa-onnx download-parakeet-models download-silero-vad
    @echo "✓ Parakeet TDT STT setup complete!"

# Download pre-converted NLLB models from Hugging Face
download-nllb-models:
    @echo "Downloading pre-converted NLLB-200 models from Hugging Face..."
@@ -792,6 +825,9 @@ download-models: download-whisper-models download-silero-vad download-kokoro-mod
    @echo "Optional: To download Pocket TTS models (gated; requires HF_TOKEN):"
    @echo " just download-pocket-tts-models"
    @echo ""
    @echo "Optional: To download Parakeet TDT models (~660MB, CC-BY-4.0):"
    @echo " just download-parakeet-models"
    @echo ""
    @du -sh models/

# Setup VAD (install dependencies + download models)
@@ -979,7 +1015,7 @@ install-plugin name: (build-plugin-native name)
fi

# Build all native plugin examples
-build-plugins-native: build-plugin-native-gain build-plugin-native-whisper build-plugin-native-kokoro build-plugin-native-piper build-plugin-native-matcha build-plugin-native-pocket-tts build-plugin-native-sensevoice build-plugin-native-nllb build-plugin-native-vad build-plugin-native-helsinki build-plugin-native-supertonic build-plugin-native-slint build-plugin-native-aac-encoder
+build-plugins-native: build-plugin-native-gain build-plugin-native-whisper build-plugin-native-kokoro build-plugin-native-piper build-plugin-native-matcha build-plugin-native-pocket-tts build-plugin-native-sensevoice build-plugin-native-nllb build-plugin-native-vad build-plugin-native-helsinki build-plugin-native-supertonic build-plugin-native-slint build-plugin-native-aac-encoder build-plugin-native-parakeet

## Combined

@@ -1042,7 +1078,7 @@ copy-plugins-native:

# Official native plugins (shared target dir).
# For most plugins the lib stem matches the plugin id.
-for name in whisper kokoro piper matcha vad sensevoice nllb helsinki supertonic slint; do
+for name in whisper kokoro piper matcha vad sensevoice nllb helsinki supertonic slint parakeet; do
copy_plugin "$name" "$name" "$PLUGINS_TARGET"
done

53 changes: 53 additions & 0 deletions marketplace/official-plugins.json
@@ -143,6 +143,59 @@
        }
      ]
    },
    {
      "id": "parakeet",
      "name": "Parakeet TDT",
      "version": "0.1.0",
      "node_kind": "parakeet",
      "kind": "native",
      "entrypoint": "libparakeet.so",
      "artifact": "target/plugins/release/libparakeet.so",
      "description": "Fast speech-to-text using NVIDIA Parakeet TDT via sherpa-onnx",
      "license": "MPL-2.0",
      "homepage": "https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2",
      "models": [
        {
          "id": "parakeet-tdt-0.6b-v2-int8",
          "name": "Parakeet TDT 0.6B v2 (English, INT8)",
          "default": true,
          "source": "huggingface",
          "repo_id": "streamkit/parakeet-models",
          "revision": "main",
          "files": [
            "encoder.int8.onnx",
            "decoder.int8.onnx",
            "joiner.int8.onnx",
            "tokens.txt"
          ],
          "expected_size_bytes": 661190513,
          "license": "CC-BY-4.0",
          "license_url": "https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2",
          "file_checksums": {
            "encoder.int8.onnx": "a32b12d17bbbc309d0686fbbcc2987b5e9b8333a7da83fa6b089f0a2acd651ab",
            "decoder.int8.onnx": "b6bb64963457237b900e496ee9994b59294526439fbcc1fecf705b31a15c6b4e",
            "joiner.int8.onnx": "7946164367946e7f9f29a122407c3252b680dbae9a51343eb2488d057c3c43d2",
            "tokens.txt": "ec182b70dd42113aff6c5372c75cac58c952443eb22322f57bbd7f53977d497d"
          }
        },
        {
          "id": "silero-vad",
          "name": "Silero VAD (v6.2)",
          "default": true,
          "source": "huggingface",
          "repo_id": "streamkit/parakeet-models",
          "revision": "main",
          "files": [
            "silero_vad.onnx"
          ],
          "expected_size_bytes": 2327524,
          "license": "MIT",
          "license_url": "https://github.com/snakers4/silero-vad/blob/master/LICENSE",
          "sha256": "1a153a22f4509e292a94e67d6f9b85e8deb25b4988682b7e174c65279d8788e3"
        }
      ],
      "repo": "https://github.com/streamer45/streamkit"
    },
    {
      "id": "piper",
      "name": "Piper",
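
A manifest entry like the Parakeet one above pairs a `files` list with a `file_checksums` map, so a cheap consistency check is possible before any download. The sketch below is hypothetical: `is_sha256_hex` and `check_model_entry` are illustrative names, the inputs are passed as plain slices and maps rather than parsed from JSON, and no actual hashing is performed. It only confirms that every listed file has a well-formed, lowercase 64-character hex digest.

```rust
use std::collections::HashMap;

/// True if `s` looks like a lowercase hex-encoded SHA-256 digest.
fn is_sha256_hex(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

/// Return one message per file that lacks a checksum or has a malformed one.
fn check_model_entry(files: &[&str], checksums: &HashMap<&str, &str>) -> Vec<String> {
    let mut problems = Vec::new();
    for &f in files {
        match checksums.get(f) {
            None => problems.push(format!("{f}: no checksum listed")),
            Some(sum) if !is_sha256_hex(sum) => {
                problems.push(format!("{f}: malformed checksum"))
            }
            Some(_) => {}
        }
    }
    problems
}
```

An empty result means the entry is internally consistent. Verifying the downloaded bytes against the digests would still need a SHA-256 implementation, which is outside this sketch.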