asr-gateway

Multi-instance ASR gateway for eaRS clients. It runs a separate FastAPI WebSocket server per use case, so you can start and stop servers individually to free VRAM on demand.

Profiles and ports

Gateway endpoints (proxy mode):

  • Journal: :8770 (routes to EARS_SERVER_URL_JOURNAL)
  • Technical: :8771 (routes to EARS_SERVER_URL_TECHNICAL)
  • Accuracy: :8772 (optional whisper.cpp or OpenAI backend)

Connect from eaRS:

  • ears-dictation --server ws://<tailscale-ip>:8770/ws
  • ears-dictation --server ws://<tailscale-ip>:8771/ws
  • ears-dictation --server ws://<tailscale-ip>:8772/ws

Project Structure

This repository is a monorepo combining:

  • ASR Gateway (root) — Python FastAPI WebSocket proxy for dictation profiles
  • eaRS (ears/) — Rust speech-to-text engine (git subtree from eaRS)

Gateway documentation is in this README and docs/. For eaRS documentation, see ears/CLAUDE.md.

ears subtree operations

See git-subtree-ears.md for pull/push commands.

System overview

          +-------------------+             +----------------------+
Mic PCM ->| eaRS client       |--WebSocket->| asr-gateway          |
          | (ears-dictation)  |             | FastAPI /ws          |
          +-------------------+             |  - journal (8770)    |
                                            |  - technical (8771)  |
                                            |  - accuracy (8772)   |
                                            +----------+-----------+
                                                       |
                     +---------------------------------+----------------------------+
                     |                                 |                            |
                     v                                 v                            v
          eaRS server (Parakeet)          eaRS server (Parakeet)         whisper.cpp (Vulkan)
                     |                                 |                            |
                     +---------- word/final events back to client ------------------+
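The profile-to-upstream routing shown above can be sketched as follows. This is a hypothetical helper, not the gateway's actual code; the environment variable names come from this README:

```python
import os

# Hypothetical routing table: profile -> (gateway port, upstream env var).
# Illustrates the proxy-mode mapping in the diagram above.
PROFILES = {
    "journal":   ("8770", "EARS_SERVER_URL_JOURNAL"),
    "technical": ("8771", "EARS_SERVER_URL_TECHNICAL"),
    "accuracy":  ("8772", "EARS_ACCURACY_URL"),
}

def upstream_for(profile: str) -> str:
    """Resolve the upstream WebSocket URL for a dictation profile."""
    _, env_var = PROFILES[profile]
    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"{env_var} is not set for profile {profile!r}")
    return url
```

Each profile's server can then be started and stopped independently, which is what makes per-profile VRAM reclamation possible.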

Quick start (uv)

uv init
uv add "fastapi[standard]" "uvicorn[standard]" onnxruntime

Environment setup

Copy .env.example to .env and fill in your paths/URLs. The Justfile will load it automatically.

The gateway proxies to upstream eaRS servers defined by:

  • EARS_SERVER_URL_JOURNAL
  • EARS_SERVER_URL_TECHNICAL

Accuracy backend selection:

  • Local whisper.cpp: set ASR_BACKEND=whisper_vulkan
  • OpenAI cloud: set ASR_BACKEND=openai_transcribe and OPENAI_API_KEY
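A minimal `.env` sketch wiring these variables together (hosts, ports, and paths below are placeholders, not project defaults):

```
EARS_SERVER_URL_JOURNAL=ws://127.0.0.1:8765
EARS_SERVER_URL_TECHNICAL=ws://127.0.0.1:8766

# Local whisper.cpp accuracy backend:
ASR_BACKEND=whisper_vulkan
WHISPER_BIN=/opt/whisper.cpp/build/bin/whisper-cli
WHISPER_MODEL_PATH=/opt/whisper.cpp/models/ggml-large-v3.bin

# Or OpenAI cloud instead:
# ASR_BACKEND=openai_transcribe
# OPENAI_API_KEY=<your key>
```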

Environment variables

  • EARS_SERVER_URL_JOURNAL: upstream eaRS WebSocket URL for journal mode
  • EARS_SERVER_URL_TECHNICAL: upstream eaRS WebSocket URL for technical mode
  • EARS_ACCURACY_URL: whisper accuracy WebSocket URL (batch on stop)
  • EARS_ACCURACY_PROFILE: profile that triggers accuracy pass (journal|technical)
  • EARS_ACCURACY_ENABLED: enable/disable accuracy at runtime (true|false)
  • EARS_PREVIEW_AUTOCOMMIT_SECS: preview auto-commit timeout (0 disables)
  • ACCURACY_BACKEND: whisper_vulkan or openai_transcribe for the accuracy server
  • ACCURACY_MODEL_JOURNAL: OpenAI model for journal accuracy pass (default gpt-4o-transcribe)
  • ACCURACY_MODEL_TECHNICAL: OpenAI model for technical accuracy pass (default gpt-4o-transcribe)
  • WHISPER_BIN: path to whisper-cli (default whisper-cli)
  • WHISPER_MODEL_PATH: path to ggml model for whisper.cpp
  • WHISPER_SAMPLE_RATE: output sample rate for whisper.cpp (default 16000)
  • OPENAI_API_KEY: API key for OpenAI accuracy backend
  • OPENAI_BASE_URL: base URL for OpenAI API (default https://api.openai.com/v1)
  • OPENAI_TRANSCRIBE_MODEL: accuracy model (default gpt-4o-transcribe)
  • OPENAI_TRANSCRIBE_LANGUAGE: optional ISO language code
  • OPENAI_TRANSCRIBE_PROMPT: optional prompt prefix for transcription
  • OPENAI_TRANSCRIBE_TEMPERATURE: optional temperature (default 0)
  • OPENAI_TRANSCRIBE_TIMEOUT: request timeout seconds (default 30)
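To illustrate how the OpenAI accuracy backend might consume these variables, here is a hedged sketch. The `transcribe_kwargs` and `accuracy_pass` helpers are hypothetical, not the project's code; the API call uses the official `openai` Python package:

```python
import os

def transcribe_kwargs() -> dict:
    """Assemble transcription parameters from the environment variables above."""
    kwargs = {
        "model": os.environ.get("OPENAI_TRANSCRIBE_MODEL", "gpt-4o-transcribe"),
        "temperature": float(os.environ.get("OPENAI_TRANSCRIBE_TEMPERATURE", "0")),
    }
    if lang := os.environ.get("OPENAI_TRANSCRIBE_LANGUAGE"):
        kwargs["language"] = lang
    if prompt := os.environ.get("OPENAI_TRANSCRIBE_PROMPT"):
        kwargs["prompt"] = prompt
    return kwargs

def accuracy_pass(wav_path: str) -> str:
    """Batch-transcribe a finished recording via the OpenAI API."""
    from openai import OpenAI  # requires the openai package

    client = OpenAI(
        base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        timeout=float(os.environ.get("OPENAI_TRANSCRIBE_TIMEOUT", "30")),
    )
    with open(wav_path, "rb") as f:
        result = client.audio.transcriptions.create(file=f, **transcribe_kwargs())
    return result.text
```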

Justfile commands

Dictation control (single CLI):

just run-journal
just run-technical
just run-toggle
just run-shutdown

Dictation controls

  • Use scripts/toggle-dictation-profile.sh with --preview to show the overlay buffer and paste on toggle-off.
  • Accuracy can be forced per run with --accuracy-on/--accuracy-off or toggled persistently with --accuracy-toggle.
  • Binding SIGRTMIN+1 to a key (e.g. KP_Multiply) lets you toggle accuracy while the preview overlay is visible.

Start/stop to free VRAM:

just journal-start
just journal-stop
just technical-start
just technical-stop
just accuracy-start
just accuracy-stop
just stop-all

Swarm orchestration (spec-kitty)

Multi-agent work package orchestration via just swarm. Each spec-kitty feature gets a dedicated branch, isolated worktrees per WP, and automated wave scheduling.

Feature lifecycle

# 1. Spec, plan, and generate tasks (on master)
spec-kitty specify    # define user scenarios
spec-kitty plan       # design architecture
spec-kitty tasks      # break into work packages

# 2. Start the swarm (auto-creates feature branch from master)
just swarm 005        # creates branch 005-my-feature, sets up WP worktrees

# 3. Monitor and advance
just swarm 005 --status              # show board and wave plan
just swarm 005 --review WP01         # submit WP for review
just swarm 005 --done WP01           # mark WP done after review passes
just swarm 005 --reject WP02 --reason "Fix validation"  # send back with feedback

# 4. Acceptance
just swarm 005 --accept              # validate all WPs done, run readiness check
just swarm 005 --accept --accept-mode pr  # force PR mode

# 5. Merge
git push origin 005-my-feature       # push feature branch
# open PR against master, then merge

What happens automatically

  • Feature branch created from master on first just swarm run
  • meta.json target_branch updated to the feature branch
  • WP worktrees created under .worktrees/<feature>-<WP>/
  • Lane-tracking commits accumulate on the feature branch, not master
  • If you run --accept on master, it warns and falls back to local mode

Swarm commands

  • just swarm <prefix> — start next wave (creates feature branch if needed)
  • just swarm <prefix> --status — show kanban board and computed waves
  • just swarm <prefix> --review <WP> — submit WP for code review
  • just swarm <prefix> --done <WP> — mark WP as done
  • just swarm <prefix> --reject <WP> --reason "..." — reject with feedback
  • just swarm <prefix> --accept — run acceptance validation
  • just swarm <prefix> --dry-run — preview without mutations
  • just swarm <prefix> --cleanup — clean orphan contexts before starting

Feature specs live in kitty-specs/<NNN>-<slug>/. See .kittify/memory/feature-branch-workflow.md for full workflow details.

Smoke test

just sample-audio

scripts/smoke_test_ws.py \
  --url ws://127.0.0.1:8770/ws \
  --wav /tmp/jfk_24k.wav \
  --chunk-ms 200 \
  --sleep

Proxy smoke test (gateway -> eaRS):

EARS_SERVER_URL_JOURNAL=ws://127.0.0.1:8765 just journal-start
scripts/smoke_test_ws.py --url ws://127.0.0.1:8770/ws --wav /path/to/audio.wav --chunk-ms 200 --sleep

Whisper Vulkan setup (AMD GPU)

Install whisper-cli from whisper.cpp and set WHISPER_BIN to the binary path. Build with Vulkan support:

git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
cmake -B build -DGGML_VULKAN=1
cmake --build build -j --config Release

Download a model and run whisper-cli:

sh ./models/download-ggml-model.sh large-v3
./build/bin/whisper-cli -m models/ggml-large-v3.bin -f /path/to/audio.wav

Note: whisper-cli expects 16 kHz, 16-bit WAV input, so audio captured at other rates must be resampled first. The whisper.cpp repo is not part of this monorepo — build it separately and reference the binary via WHISPER_BIN.
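Since the gateway streams 24 kHz PCM while whisper.cpp wants 16 kHz, a simple linear resampler can bridge the gap. The NumPy sketch below is illustrative only; the gateway's actual resampler may differ:

```python
import numpy as np

def resample_pcm16(pcm: bytes, src_rate: int = 24000, dst_rate: int = 16000) -> bytes:
    """Linearly resample 16-bit mono PCM from src_rate to dst_rate."""
    samples = np.frombuffer(pcm, dtype=np.int16).astype(np.float32)
    n_out = int(len(samples) * dst_rate / src_rate)
    # Interpolate output sample positions onto the input sample grid.
    x_out = np.linspace(0, len(samples) - 1, num=n_out)
    x_in = np.arange(len(samples))
    out = np.interp(x_out, x_in, samples)
    return out.astype(np.int16).tobytes()
```

For production-quality output you would typically prefer a polyphase or sinc-based resampler, but linear interpolation is adequate for a quick sketch.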

WebSocket protocol

The server expects raw 24 kHz, 16-bit mono PCM bytes over the WebSocket. Send a JSON message to end the stream:

{"type":"stop"}

Notes

  • Parakeet inference lives in eaRS; this gateway is used as a profile router.
  • Whisper accuracy server uses whisper.cpp via a subprocess and processes audio on stop (batch).
  • OpenAI accuracy backend uses gpt-4o-transcribe (or whisper-1) via API and processes audio on stop (batch).
  • The gateway resamples to WHISPER_SAMPLE_RATE for whisper.cpp.

Dictation tuning notes

  • Default usage: journal for general dictation; technical for speaking to AI agents in CLI.
  • Journal default correction model: qwen2.5:7b (more literal output).
  • Technical default correction model: qwen2.5-coder:14b (code-aware cleanup).
  • Preview + accuracy shows a review menu (RAW/FINAL/ACCURACY/CANCEL) with ↑/↓ and →/Enter to paste.
