Voice-driven AI code editing system powered by a local Go core (`vocode-cored`) and a VS Code extension.
Vocode lets you speak code changes and have them applied intelligently to your project using structured edits instead of raw text replacement.
Vocode is composed of three main parts:

- VS Code Extension (TypeScript)
  - Captures voice + user intent
  - Displays UI (transcripts, diffs, status)
  - Sends requests to `vocode-cored` (JSON-RPC)
- Core backend (Go, `apps/core`)
  - Runs locally
  - Handles:
    - agent logic
    - code edits (AST/diff-based)
    - symbol resolution (host-provided LSP document symbols)
    - command execution
    - transcript agent-loop orchestration
- Voice Sidecar (Go)
  - Runs locally as a separate process
  - Handles:
    - native microphone capture
    - STT provider integration
    - transcript event emission to the extension

For now, these communicate over stdio (JSON-RPC); WebSocket support may be added later.
```
apps/
  core/              # Go core engine (vocode-cored)
  voice/             # Go voice sidecar (mic + STT)
  vscode-extension/  # VS Code extension (UI + client)
packages/
  protocol/          # Shared schemas (Go + TS)
  prompts/           # LLM prompts
docs/
  architecture.md
  editing-model.md
  indexing.md
  protocol.md
scripts/
  dev/               # Build + dev scripts
  codegen/           # Protocol/code generation (future)
config/
  vocode.example.json
```
- VS Code workflow: open Vocode → Settings (sidebar) and save your ElevenLabs API key (secret storage). Other knobs live under Settings → Vocode; defaults are defined in `apps/vscode-extension/package.json`. There is no committed `.env`: the extension does not load one for spawned core / voice processes.
- Contributors / shell parity: see `docs/vscode-settings-env.md` for how each `vocode.*` setting maps to environment variables when spawning processes. For `go run` or manual binaries, export those names yourself and align values with `package.json` defaults where needed.
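As an illustration only, a converter following one plausible naming convention might look like this (`settingToEnv` is hypothetical and not part of the codebase; the authoritative mapping is the table in `docs/vscode-settings-env.md`):

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// settingToEnv shows one plausible convention for deriving an env var
// name from a vocode.* setting key: strip the prefix, then convert
// camelCase to SCREAMING_SNAKE_CASE under a VOCODE_ namespace.
func settingToEnv(key string) string {
	key = strings.TrimPrefix(key, "vocode.")
	var b strings.Builder
	b.WriteString("VOCODE_")
	for _, r := range key {
		if unicode.IsUpper(r) {
			b.WriteByte('_')
		}
		b.WriteRune(unicode.ToUpper(r))
	}
	return b.String()
}

func main() {
	fmt.Println(settingToEnv("vocode.sessionIdleResetMs")) // → VOCODE_SESSION_IDLE_RESET_MS
}
```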
- Install dependencies:

  ```
  corepack enable
  pnpm install
  ```

  The voice sidecar (`@vocode/voice`) uses cgo + PortAudio, so on Windows you must install native deps before builds. From the repo root:

  ```
  pnpm setup-portaudio:win
  ```

  If your MSYS2 install is not at `C:\tools\msys64`, pass your MSYS2 root:

  ```
  pnpm setup-portaudio:win "D:\msys64"
  ```

  On Ubuntu/Debian, install PortAudio dev headers + pkg-config:

  ```
  sudo apt-get update
  sudo apt-get install -y pkg-config portaudio19-dev
  ```

- Generate protocol types:

  ```
  pnpm codegen
  ```

The voice sidecar uses ElevenLabs streaming STT with local VAD gating.
Recommended baseline (VS Code): use extension defaults, or set `vocode.elevenLabsSttModelId` / other `vocode.*` keys in Settings.

Tuning guide (env var names when running from a terminal, or for reading core code):

- higher `VOCODE_VOICE_VAD_END_MS` → fewer premature utterance commits
- higher `VOCODE_VOICE_VAD_PREROLL_MS` → less start-of-speech clipping
- lower `VOCODE_VOICE_STREAM_MIN_CHUNK_MS` → lower latency while speaking
- higher `VOCODE_VOICE_STREAM_MAX_CHUNK_MS` → fewer websocket chunk sends
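The chunking trade-off can be pictured with a small sketch (illustrative only, not the sidecar's actual code; `chunkPlan` and its parameters are invented for this example):

```go
package main

import "fmt"

// chunkPlan sketches how the min/max chunk knobs trade latency against
// send count: while speech is active, audio is flushed once at least
// minChunkMs has accumulated; during silence it is held up to maxChunkMs.
func chunkPlan(frameMs, totalMs, minChunkMs, maxChunkMs int, speaking bool) []int {
	var chunks []int
	pending := 0
	for t := 0; t < totalMs; t += frameMs {
		pending += frameMs
		threshold := maxChunkMs
		if speaking {
			threshold = minChunkMs // flush sooner while speech is active
		}
		if pending >= threshold {
			chunks = append(chunks, pending)
			pending = 0
		}
	}
	if pending > 0 {
		chunks = append(chunks, pending) // flush the remainder
	}
	return chunks
}

func main() {
	// One second of 20 ms frames, min chunk 100 ms, max chunk 400 ms.
	fmt.Println(len(chunkPlan(20, 1000, 100, 400, true)))  // speaking: flush at min → 10 sends
	fmt.Println(len(chunkPlan(20, 1000, 100, 400, false))) // silence: flush at max → 3 sends
}
```

Lowering the min chunk gives more, smaller sends (lower latency); raising the max chunk gives fewer websocket sends.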
Core transcript queueing:

- the VS Code extension forwards committed transcripts to `vocode-cored`
- the core processes transcripts in FIFO order and can coalesce multiple committed transcripts that arrive within `VOCODE_DAEMON_VOICE_TRANSCRIPT_COALESCE_MS`
- queue + merge bounds are configurable via `VOCODE_DAEMON_VOICE_TRANSCRIPT_QUEUE_SIZE`, `VOCODE_DAEMON_VOICE_TRANSCRIPT_MAX_MERGE_JOBS`, and `VOCODE_DAEMON_VOICE_TRANSCRIPT_MAX_MERGE_CHARS`
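The coalescing rules above can be sketched roughly like this (illustrative; `coalesce` is not the real implementation, which lives in the core's transcript queue):

```go
package main

import "fmt"

type transcript struct {
	text string
	atMs int // arrival time in ms
}

// coalesce merges committed transcripts that arrive within windowMs of
// the previous one, bounded by maxJobs transcripts per batch and
// maxChars of merged text, preserving FIFO order.
func coalesce(in []transcript, windowMs, maxJobs, maxChars int) []string {
	var out []string
	var cur string
	jobs := 0
	lastAt := -1 << 30
	flush := func() {
		if cur != "" {
			out = append(out, cur)
			cur, jobs = "", 0
		}
	}
	for _, t := range in {
		within := t.atMs-lastAt <= windowMs
		fits := jobs < maxJobs && len(cur)+1+len(t.text) <= maxChars
		if cur == "" || !within || !fits {
			flush()
			cur, jobs = t.text, 1
		} else {
			cur += " " + t.text
			jobs++
		}
		lastAt = t.atMs
	}
	flush()
	return out
}

func main() {
	batches := coalesce([]transcript{
		{"delete this line", 0},
		{"and the next one", 120}, // within the window → merged
		{"now rename foo", 2000},  // too late → new batch
	}, 500, 4, 200)
	fmt.Println(len(batches)) // → 2
}
```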
`voice.transcript` runs a narrow-model pipeline in `apps/core/internal/transcript/` (pipeline, searchapply, …). It is not an iterative `intents[]` loop.

Per committed utterance: at most one transcript pipeline pass and one `host.applyDirectives` batch. On apply failure (e.g. `stale_range`), the core returns `success: false` and the user re-speaks.
`voice.transcript` returns `VoiceTranscriptCompletion`:

- `success` and optional `summary`
- `transcriptOutcome` (e.g. `search`, `selection`, `selection_control`, `file_selection`, `file_selection_control`, `clarify`, `clarify_control`, `needs_workspace_folder`, `irrelevant`, `answer`, `completed`)
- `uiDisposition` (`shown` | `skipped` | `hidden`)
- optional `searchResults` + `activeSearchIndex`, `answerText`, etc.
`transcriptOutcome` + `uiDisposition` quick guide:

- `search` / `selection` / `selection_control` / `file_selection` / `file_selection_control` → usually `hidden` (panel flows)
- `clarify` / `clarify_control` → `hidden`
- `irrelevant` → `skipped` (or `hidden` during an active match-list session)
- `answer` → `hidden` (Chat UI)
- successful edit completion → `shown`
When `success` is false, treat directives as invalid. Session tuning uses `vocode.sessionIdleResetMs` and `daemonConfig` on the RPC (`maxGatheredBytes`, `maxGatheredExcerpts`).
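The quick guide reads naturally as a lookup table; a rough sketch (the `default` branch and the exact "usually hidden" handling are simplifications of the real panel-flow logic):

```go
package main

import "fmt"

// dispositionFor encodes the outcome → uiDisposition quick guide.
// Panel flows can vary the "usually hidden" cases, and irrelevant
// becomes hidden during an active match-list session.
func dispositionFor(outcome string, activeMatchList bool) string {
	switch outcome {
	case "search", "selection", "selection_control",
		"file_selection", "file_selection_control":
		return "hidden" // usually hidden (panel flows)
	case "clarify", "clarify_control":
		return "hidden"
	case "irrelevant":
		if activeMatchList {
			return "hidden"
		}
		return "skipped"
	case "answer":
		return "hidden" // rendered in the Chat UI instead
	case "completed":
		return "shown" // successful edit completion
	default:
		return "skipped" // fallback chosen for this sketch only
	}
}

func main() {
	fmt.Println(dispositionFor("completed", false)) // → shown
	fmt.Println(dispositionFor("irrelevant", true)) // → hidden
}
```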
- Build the core backend (`vocode-cored`):

  ```
  pnpm --filter @vocode/core build
  ```

  This creates `apps/vscode-extension/bin/<platform-arch>/vocode-cored(.exe)`.
- Run the extension: press `F5`.

  This launches a VS Code Extension Development Host. You should see logs like:

  ```
  Vocode extension activated
  [vocode] using dev binary path (log label may still say daemon): ...
  [vocode-cored stderr] vocode-cored starting...
  ```
```
pnpm build
pnpm lint
pnpm lint:fix
go test ./...
```
```
VS Code Extension
├── commands/
├── daemon/            (spawn + JSON-RPC client for vocode-cored; historical folder name)
├── directives/        (host apply for protocol directives)
├── voice-transcript/  (voice.transcript RPC + apply batch)
├── ui/
│   └── panel/         (main webview + store)
└── voice/             (sidecar client + spawn)
        ↓ stdio (JSON-RPC)
Go core (apps/core)
├── rpc/
├── agent/
├── flows/       (route classification + per-flow handlers)
├── workspace/
├── search/      (e.g. ripgrep-backed search)
└── transcript/  (service, pipeline, session, searchapply, ...)

Voice Sidecar
├── app/  (stdio protocol loop)
├── mic/  (native capture)
└── stt/  (provider adapters)
```
We never blindly rewrite files.
All edits are:
- orchestrated in the core
- anchored
- validated
- diffed before apply
The current implementation intentionally supports a small deterministic slice instead of pretending the agent covers all edit styles.
All intelligence lives in `vocode-cored`.
The extension is just:
- input/output
- UI
- transport
- No cloud dependency required
- Works offline (future: whisper.cpp)
- Fast + private
- voice → streaming STT
- edits → incremental intent iteration (currently rule-based for a small safe slice)
- UI → live feedback
- ✅ Extension boots
- ✅ Core (`vocode-cored`) spawns
- ✅ Cross-platform core build
- ✅ JSON-RPC over stdio (`vocode-cored` + extension)
- 🚧 Rich edit engine wiring beyond the initial safe slice
- 🚧 Voice pipeline
`vocode-cored` is built per platform into the VS Code extension package (same path the packaged VSIX uses):

```
apps/vscode-extension/bin/
  win32-x64/vocode-cored.exe
  darwin-arm64/vocode-cored
  linux-x64/vocode-cored
```

The extension resolves `extension/bin/<platform-arch>/` first, with a fallback to legacy `apps/core/bin/` if present.
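Deriving the `<platform-arch>` directory name amounts to translating Go's `GOOS`/`GOARCH` into Node-style names (`win32`, `x64`, …); a sketch covering only the three published targets:

```go
package main

import (
	"fmt"
	"runtime"
)

// binarySubdir maps Go's GOOS/GOARCH to the Node-style <platform-arch>
// directory names used under apps/vscode-extension/bin/. The mapping
// shown is illustrative and covers only the published targets.
func binarySubdir(goos, goarch string) string {
	osName := map[string]string{"windows": "win32", "darwin": "darwin", "linux": "linux"}[goos]
	archName := map[string]string{"amd64": "x64", "arm64": "arm64"}[goarch]
	return osName + "-" + archName
}

func main() {
	fmt.Println(binarySubdir("windows", "amd64")) // → win32-x64
	fmt.Println(binarySubdir(runtime.GOOS, runtime.GOARCH))
}
```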
Inside the Extension Development Host, open the Command Palette and run:

```
Vocode: Start Voice
Vocode: Stop Voice
Vocode: Apply Edit
Vocode: Run Command
```
Supported today: deterministic single-file edits for `insert statement "..." inside current function`, `replace block after "..." before "..." with "..."`, and `append import "..." if missing`. The core returns explicit success/failure/noop edit outcomes so the extension can display intent-preserving UX without another agent turn.
- JSON-RPC over stdio (in progress)
- polish RPC client surface (`src/daemon/*` naming vs `vocode-cored`)
- workspace sync
- edit intents → applier
- diff UI panel
- streaming speech input
See `CONTRIBUTING.md` for more information.
- Install deps: `pnpm install`
- Generate protocol types: `pnpm codegen`
- Build core: `pnpm --filter @vocode/core build`
- Press `F5` to run the extension
- Make changes
- Run: `pnpm lint:fix`
- Do not commit:
- node_modules/
- .turbo/
- dist/
- bin/
- Core logs go to stderr
- Core stdout is reserved for JSON-RPC
TBD
Speak code. Watch it evolve. Stay in the flow.