Multi-instance ASR gateway for eaRS clients. Runs separate FastAPI WebSocket servers per use-case so you can start/stop them to free VRAM on demand.
Gateway endpoints (proxy mode):
- Journal: `:8770` (routes to `EARS_SERVER_URL_JOURNAL`)
- Technical: `:8771` (routes to `EARS_SERVER_URL_TECHNICAL`)
- Accuracy: `:8772` (optional whisper.cpp or OpenAI backend)
Connect from eaRS:
```
ears-dictation --server ws://<tailscale-ip>:8770/ws
ears-dictation --server ws://<tailscale-ip>:8771/ws
ears-dictation --server ws://<tailscale-ip>:8772/ws
```
This repository is a monorepo combining:
- ASR Gateway (root) — Python FastAPI WebSocket proxy for dictation profiles
- eaRS (`ears/`) — Rust speech-to-text engine (git subtree from eaRS)
Gateway documentation is in this README and `docs/`. For eaRS documentation, see `ears/CLAUDE.md`.
See `git-subtree-ears.md` for subtree pull/push commands.
```
              +-------------------+               +---------------------+
   Mic PCM -> |    eaRS client    | --WebSocket-> |     asr-gateway     |
              | (ears-dictation)  |               |     FastAPI /ws     |
              +-------------------+               | - journal   (8770)  |
                                                  | - technical (8771)  |
                                                  | - accuracy  (8772)  |
                                                  +----------+----------+
                                                             |
                   +-----------------------------------------+--------------------+
                   |                                         |                    |
                   v                                         v                    v
       eaRS server (Parakeet)              eaRS server (Parakeet)   whisper.cpp (Vulkan)
                   |                                         |                    |
                   +--------- word/final events back to client -------------------+
```
```
uv init
uv add "fastapi[standard]" "uvicorn[standard]" onnxruntime
```

Copy `.env.example` to `.env` and fill in your paths/URLs. The Justfile will load it automatically.
The gateway proxies to upstream eaRS servers defined by `EARS_SERVER_URL_JOURNAL` and `EARS_SERVER_URL_TECHNICAL`.

Accuracy backend selection:
- Local whisper.cpp: set `ACCURACY_BACKEND=whisper_vulkan`
- OpenAI cloud: set `ACCURACY_BACKEND=openai_transcribe` and `OPENAI_API_KEY`
- `EARS_SERVER_URL_JOURNAL`: upstream eaRS WebSocket URL for journal mode
- `EARS_SERVER_URL_TECHNICAL`: upstream eaRS WebSocket URL for technical mode
- `EARS_ACCURACY_URL`: whisper accuracy WebSocket URL (batch on stop)
- `EARS_ACCURACY_PROFILE`: profile that triggers the accuracy pass (`journal`|`technical`)
- `EARS_ACCURACY_ENABLED`: enable/disable accuracy at runtime (`true`|`false`)
- `EARS_PREVIEW_AUTOCOMMIT_SECS`: preview auto-commit timeout (`0` disables)
- `ACCURACY_BACKEND`: `whisper_vulkan` or `openai_transcribe` for the accuracy server
- `ACCURACY_MODEL_JOURNAL`: OpenAI model for the journal accuracy pass (default `gpt-4o-transcribe`)
- `ACCURACY_MODEL_TECHNICAL`: OpenAI model for the technical accuracy pass (default `gpt-4o-transcribe`)
- `WHISPER_BIN`: path to `whisper-cli` (default `whisper-cli`)
- `WHISPER_MODEL_PATH`: path to the ggml model for whisper.cpp
- `WHISPER_SAMPLE_RATE`: output sample rate for whisper.cpp (default `16000`)
- `OPENAI_API_KEY`: API key for the OpenAI accuracy backend
- `OPENAI_BASE_URL`: base URL for the OpenAI API (default `https://api.openai.com/v1`)
- `OPENAI_TRANSCRIBE_MODEL`: accuracy model (default `gpt-4o-transcribe`)
- `OPENAI_TRANSCRIBE_LANGUAGE`: optional ISO language code
- `OPENAI_TRANSCRIBE_PROMPT`: optional prompt prefix for transcription
- `OPENAI_TRANSCRIBE_TEMPERATURE`: optional temperature (default `0`)
- `OPENAI_TRANSCRIBE_TIMEOUT`: request timeout in seconds (default `30`)
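A sketch of how these variables might be read at startup (illustrative only; a subset is shown, defaults match the list above, and the `whisper_vulkan` fallback for an unset `ACCURACY_BACKEND` is an assumption):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class AccuracyConfig:
    backend: str
    model_journal: str
    model_technical: str
    whisper_bin: str
    whisper_sample_rate: int
    timeout: float

def load_accuracy_config(env=os.environ) -> AccuracyConfig:
    # Defaults mirror the documented values; backend fallback is assumed.
    return AccuracyConfig(
        backend=env.get("ACCURACY_BACKEND", "whisper_vulkan"),
        model_journal=env.get("ACCURACY_MODEL_JOURNAL", "gpt-4o-transcribe"),
        model_technical=env.get("ACCURACY_MODEL_TECHNICAL", "gpt-4o-transcribe"),
        whisper_bin=env.get("WHISPER_BIN", "whisper-cli"),
        whisper_sample_rate=int(env.get("WHISPER_SAMPLE_RATE", "16000")),
        timeout=float(env.get("OPENAI_TRANSCRIBE_TIMEOUT", "30")),
    )
```

Passing `env` explicitly keeps the loader testable without mutating the process environment.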
Dictation control (single CLI):
```
just run-journal
just run-technical
just run-toggle
just run-shutdown
```

- Use `scripts/toggle-dictation-profile.sh` with `--preview` to show the overlay buffer and paste on toggle-off.
- Accuracy can be forced per run with `--accuracy-on`/`--accuracy-off` or toggled persistently with `--accuracy-toggle`.
- If you bind `SIGRTMIN+1` (e.g. KP_Multiply), it toggles accuracy while the preview overlay is visible.
Start/stop to free VRAM:
```
just journal-start
just journal-stop
just technical-start
just technical-stop
just accuracy-start
just accuracy-stop
just stop-all
```

Multi-agent work package orchestration via `just swarm`. Each spec-kitty feature gets a dedicated branch, isolated worktrees per WP, and automated wave scheduling.
```
# 1. Spec, plan, and generate tasks (on master)
spec-kitty specify   # define user scenarios
spec-kitty plan      # design architecture
spec-kitty tasks     # break into work packages

# 2. Start the swarm (auto-creates feature branch from master)
just swarm 005                 # creates branch 005-my-feature, sets up WP worktrees

# 3. Monitor and advance
just swarm 005 --status        # show board and wave plan
just swarm 005 --review WP01   # submit WP for review
just swarm 005 --done WP01     # mark WP done after review passes
just swarm 005 --reject WP02 --reason "Fix validation"   # send back with feedback

# 4. Acceptance
just swarm 005 --accept                    # validate all WPs done, run readiness check
just swarm 005 --accept --accept-mode pr   # force PR mode

# 5. Merge
git push origin 005-my-feature   # push feature branch
# open PR against master, then merge
```

- Feature branch created from master on first `just swarm` run
- `meta.json` `target_branch` updated to the feature branch
- WP worktrees created under `.worktrees/<feature>-<WP>/`
- Lane-tracking commits accumulate on the feature branch, not master
- If you run `--accept` on master, it warns and falls back to local mode
- `just swarm <prefix>` — start next wave (creates feature branch if needed)
- `just swarm <prefix> --status` — show kanban board and computed waves
- `just swarm <prefix> --review <WP>` — submit WP for code review
- `just swarm <prefix> --done <WP>` — mark WP as done
- `just swarm <prefix> --reject <WP> --reason "..."` — reject with feedback
- `just swarm <prefix> --accept` — run acceptance validation
- `just swarm <prefix> --dry-run` — preview without mutations
- `just swarm <prefix> --cleanup` — clean orphan contexts before starting
Feature specs live in `kitty-specs/<NNN>-<slug>/`. See `.kittify/memory/feature-branch-workflow.md` for full workflow details.
```
just sample-audio
scripts/smoke_test_ws.py \
  --url ws://127.0.0.1:8770/ws \
  --wav /tmp/jfk_24k.wav \
  --chunk-ms 200 \
  --sleep
```

Proxy smoke test (gateway -> eaRS):

```
EARS_SERVER_URL_JOURNAL=ws://127.0.0.1:8765 just journal-start
scripts/smoke_test_ws.py --url ws://127.0.0.1:8770/ws --wav /path/to/audio.wav --chunk-ms 200 --sleep
```

Install `whisper-cli` from whisper.cpp and set `WHISPER_BIN` to the binary path. Build with Vulkan support:
```
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
cmake -B build -DGGML_VULKAN=1
cmake --build build -j --config Release
```

Download a model and run `whisper-cli`:

```
sh ./models/download-ggml-model.sh large-v3
./build/bin/whisper-cli -m models/ggml-large-v3.bin -f /path/to/audio.wav
```

Note: `whisper-cli` expects 16-bit WAV input. You may need to resample to 16 kHz for best results. The whisper.cpp repo is not part of this monorepo — build it separately and reference the binary via `WHISPER_BIN`.
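Since clients stream 24 kHz PCM and whisper.cpp wants 16 kHz, the gateway's resample step (governed by `WHISPER_SAMPLE_RATE`) can be approximated with linear interpolation. This is a sketch, not the gateway's actual resampler, which may use a higher-quality filter:

```python
import struct

def resample_pcm16(data: bytes, src_rate: int = 24000, dst_rate: int = 16000) -> bytes:
    """Linear-interpolation resample of 16-bit little-endian mono PCM."""
    samples = struct.unpack("<%dh" % (len(data) // 2), data)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate   # fractional position in the source
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(int(a + (b - a) * frac))
    return struct.pack("<%dh" % n_out, *out)
```

Linear interpolation is good enough for speech recognition input; for archival audio a windowed-sinc resampler would be preferable.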
The server expects raw 24 kHz 16-bit mono PCM bytes over WebSocket. Send a JSON message to end the stream:

```
{"type":"stop"}
```

- Parakeet inference lives in eaRS; this gateway is used as a profile router.
- The whisper accuracy server uses whisper.cpp via a subprocess and processes audio on stop (batch).
- The OpenAI accuracy backend uses `gpt-4o-transcribe` (or `whisper-1`) via API and processes audio on stop (batch).
- The gateway resamples to `WHISPER_SAMPLE_RATE` for whisper.cpp.
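On the client side, the stream protocol (binary 24 kHz PCM frames, then a JSON stop message) reduces to chunking plus one text frame. A sketch of the framing math, using the 200 ms chunk size from the smoke test (the helper name is hypothetical):

```python
import json

SAMPLE_RATE = 24000      # Hz, per the protocol above
BYTES_PER_SAMPLE = 2     # 16-bit mono

def pcm_frames(pcm: bytes, chunk_ms: int = 200):
    """Yield fixed-duration binary PCM frames, then the JSON stop message."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for off in range(0, len(pcm), chunk_bytes):
        yield pcm[off:off + chunk_bytes]    # sent as a binary WebSocket frame
    yield json.dumps({"type": "stop"})      # sent as a text frame
```

At 24 kHz mono 16-bit, a 200 ms frame is 24000 × 2 × 0.2 = 9600 bytes, which matches the pacing the smoke-test script uses with `--chunk-ms 200`.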
- Default usage: journal for general dictation; technical for speaking to AI agents in the CLI.
- Journal default correction model: `qwen2.5:7b` (more literal output).
- Technical default correction model: `qwen2.5-coder:14b` (code-aware cleanup).
- Preview + accuracy shows a review menu (RAW/FINAL/ACCURACY/CANCEL) with ↑/↓ to select and →/Enter to paste.