Vocode

Voice-driven AI code editing system powered by a local Go core (vocode-cored) and a VS Code extension.

Vocode lets you speak code changes and have them applied to your project as structured edits instead of raw text replacements.

🧠 What is Vocode?

Vocode is composed of three main parts:

VS Code Extension (TypeScript)

Captures voice + user intent
Displays UI (transcripts, diffs, status)
Sends requests to vocode-cored (JSON-RPC)

Core backend (Go, apps/core)

Runs locally
Handles:
- agent logic
- code edits (AST/diff-based)
- symbol resolution (host-provided LSP document symbols)
- command execution
- transcript agent-loop orchestration

Voice Sidecar (Go)

Runs locally as a separate process
Handles:
- native microphone capture
- STT provider integration
- transcript event emission to the extension

For now, these components communicate over stdio using JSON-RPC. WebSocket may be supported in the future.
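To make "JSON-RPC over stdio" concrete, here is a minimal Go sketch that encodes one JSON-RPC 2.0 request and writes it to stdout as a single line. Newline-delimited framing and the `params` shape are assumptions for illustration only; the real framing and schemas are defined by the project (see docs/protocol.md and packages/protocol). Only the method name `voice.transcript` comes from this README.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// rpcRequest is a generic JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// encodeRequest serializes one request as a single line of JSON.
// Newline-delimited framing is an assumption of this sketch.
func encodeRequest(id int, method string, params any) ([]byte, error) {
	b, err := json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

func main() {
	// Hypothetical params: the real voice.transcript schema lives in packages/protocol.
	line, err := encodeRequest(1, "voice.transcript", map[string]string{"text": "rename foo to bar"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	os.Stdout.Write(line) // stdout carries protocol messages only
}
```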

🏗️ Repo Structure

apps/
  core/ # Go core engine (vocode-cored)
  voice/ # Go voice sidecar (mic + STT)
  vscode-extension/ # VS Code extension (UI + client)

packages/
  protocol/ # Shared schemas (Go + TS)
  prompts/ # LLM prompts

docs/
  architecture.md
  editing-model.md
  indexing.md
  protocol.md

scripts/
  dev/ # Build + dev scripts
  codegen/ # Protocol/code generation (future)

config/
  vocode.example.json

🚀 Getting Started

VS Code workflow: open Vocode → Settings (sidebar) and save your ElevenLabs API key (kept in VS Code secret storage). Other options live under Settings → Vocode; their defaults are defined in apps/vscode-extension/package.json. There is no committed .env file, and the extension does not load one for the core or voice processes it spawns.

Contributors / shell parity: see docs/vscode-settings-env.md for how each vocode.* setting maps to environment variables when spawning processes. For go run or manual binaries, export those names yourself and align values with package.json defaults where needed.

Install dependencies

corepack enable
pnpm install

Native voice dependencies (Windows only)

The voice sidecar (@vocode/voice) uses cgo + PortAudio, so on Windows you must install the native dependencies before building.

From the repo root:

pnpm setup-portaudio:win

If your MSYS2 install is not at C:\tools\msys64, pass your MSYS2 root:

pnpm setup-portaudio:win "D:\msys64"

Native voice dependencies (Linux only)

On Ubuntu/Debian, install PortAudio dev headers + pkg-config:

sudo apt-get update
sudo apt-get install -y pkg-config portaudio19-dev

Generate protocol types

pnpm codegen

Voice STT rollout/tuning

The voice sidecar uses ElevenLabs streaming STT with local VAD gating.

Recommended baseline (VS Code): use extension defaults, or set vocode.elevenLabsSttModelId / other vocode.* keys in Settings.

Tuning guide (environment variable names, for running from a terminal or reading the code):

higher VOCODE_VOICE_VAD_END_MS -> fewer premature utterance commits
higher VOCODE_VOICE_VAD_PREROLL_MS -> less start-of-speech clipping
lower VOCODE_VOICE_STREAM_MIN_CHUNK_MS -> lower latency while speaking
higher VOCODE_VOICE_STREAM_MAX_CHUNK_MS -> fewer WebSocket chunk sends

Core transcript queueing:

the VS Code extension forwards committed transcripts to vocode-cored
the core processes transcripts in FIFO order and can coalesce multiple committed transcripts that arrive within VOCODE_DAEMON_VOICE_TRANSCRIPT_COALESCE_MS
queue + merge bounds are configurable via VOCODE_DAEMON_VOICE_TRANSCRIPT_QUEUE_SIZE, VOCODE_DAEMON_VOICE_TRANSCRIPT_MAX_MERGE_JOBS, and VOCODE_DAEMON_VOICE_TRANSCRIPT_MAX_MERGE_CHARS

Voice transcript (single-shot, core)

voice.transcript runs a narrow-model pipeline in apps/core/internal/transcript/ (pipeline, searchapply, …). It is not an iterative intents[] loop.

Per committed utterance there is at most one transcript pipeline pass and at most one host.applyDirectives batch. On apply failure (e.g. stale_range), the core returns success: false, and the user must repeat the request.

voice.transcript returns VoiceTranscriptCompletion:

success and optional summary
transcriptOutcome (e.g. search, selection, selection_control, file_selection, file_selection_control, clarify, clarify_control, needs_workspace_folder, irrelevant, answer, completed)
uiDisposition (shown | skipped | hidden)
optional searchResults + activeSearchIndex, answerText, etc.

transcriptOutcome + uiDisposition quick guide:

search / selection / selection_control / file_selection / file_selection_control → usually hidden (panel flows)
clarify / clarify_control → hidden
irrelevant → skipped (or hidden during an active match-list session)
answer → hidden (Chat UI)
successful edit completion → shown
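The quick guide above can be read as a lookup table. The Go sketch below encodes it as a hypothetical `defaultDisposition` helper; the core decides `uiDisposition` per response, so this only restates the typical cases. Mapping `completed` to `shown` assumes that outcome is the "successful edit completion" case, and outcomes the guide does not cover (such as `needs_workspace_folder`) return `ok == false`.

```go
package main

import "fmt"

// defaultDisposition restates the quick guide as code. Hypothetical helper;
// the core's actual choice may differ for the "usually" cases.
func defaultDisposition(outcome string, matchListActive bool) (disposition string, ok bool) {
	switch outcome {
	case "search", "selection", "selection_control", "file_selection", "file_selection_control":
		return "hidden", true // panel flows
	case "clarify", "clarify_control":
		return "hidden", true
	case "irrelevant":
		if matchListActive {
			return "hidden", true // during an active match-list session
		}
		return "skipped", true
	case "answer":
		return "hidden", true // rendered in the Chat UI
	case "completed":
		return "shown", true // assumed to be the successful-edit outcome
	default:
		return "", false // not covered by the guide
	}
}

func main() {
	for _, o := range []string{"search", "irrelevant", "answer", "completed", "needs_workspace_folder"} {
		d, ok := defaultDisposition(o, false)
		fmt.Println(o, d, ok)
	}
}
```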

Transcript troubleshooting

When success is false, treat any returned directives as invalid and do not apply them. Session behavior is tuned with the vocode.sessionIdleResetMs setting and with daemonConfig on the RPC (maxGatheredBytes, maxGatheredExcerpts).

Build the core backend (vocode-cored)

pnpm --filter @vocode/core build

This creates:

apps/vscode-extension/bin/<platform-arch>/vocode-cored(.exe)

Run the extension

Press:

F5

This launches a VS Code Extension Development Host.

You should see logs like:

Vocode extension activated
[vocode] using dev binary path (log label may still say daemon): ...
[vocode-cored stderr] vocode-cored starting...

⚙️ Development Workflow

Build everything

pnpm build

Run linting

pnpm lint

Auto-fix formatting

pnpm lint:fix

Run Go tests

go test ./...

🧩 Architecture Overview

VS Code Extension
├── commands/
├── daemon/ (spawn + JSON-RPC client for vocode-cored; historical folder name)
├── directives/ (host apply for protocol directives)
├── voice-transcript/ (voice.transcript RPC + apply batch)
├── ui/
│   └── panel/ (main webview + store)
└── voice/ (sidecar client + spawn)

        ↓ stdio (JSON-RPC)

Go core (`apps/core`)
├── rpc/
├── agent/
├── flows/ (route classification + per-flow handlers)
├── workspace/
├── search/ (e.g. ripgrep-backed search)
└── transcript/ (service, pipeline, session, searchapply, ...)

Voice Sidecar
├── app/ (stdio protocol loop)
├── mic/ (native capture)
└── stt/ (provider adapters)

🔑 Key Design Principles

1. Structured edits only

We never blindly rewrite files.

All edits are:

orchestrated in the core
anchored
validated
diffed before apply

The current implementation intentionally supports a small, deterministic set of edit forms rather than pretending the agent covers every edit style.

2. Core-first architecture

All intelligence lives in vocode-cored.

The extension is just:

input/output
UI
transport

3. Local-first (the long-term goal; the hackathon build uses the ElevenLabs cloud service for STT)

No cloud dependency required (target state)
Works offline (future: whisper.cpp)
Fast + private

4. Streaming everything

voice → streaming STT
edits → incremental intent iteration (currently rule-based for a small safe slice)
UI → live feedback

🛠️ Current Status

✅ Extension boots
✅ Core (vocode-cored) spawns
✅ Cross-platform core build
✅ JSON-RPC over stdio (vocode-cored + extension)
🚧 Rich edit engine wiring beyond the initial safe slice
🚧 Voice pipeline

📦 Core binary build

vocode-cored is built per platform into the VS Code extension package (same path the packaged VSIX uses):

apps/vscode-extension/bin/
  win32-x64/vocode-cored.exe
  darwin-arm64/vocode-cored
  linux-x64/vocode-cored

The extension looks for the binary in its own bin/<platform-arch>/ directory first and falls back to the legacy apps/core/bin/ location if it exists.

🧪 Testing the Extension

Inside the Extension Development Host:

Open Command Palette:

Vocode: Start Voice
Vocode: Stop Voice
Vocode: Apply Edit
Vocode: Run Command

Supported today: deterministic single-file edits of three forms: insert statement "..." inside current function; replace block after "..." before "..." with "..."; and append import "..." if missing. The core returns an explicit success, failure, or noop outcome for each edit, so the extension can report what happened in terms of the user's request without another agent turn.

🧱 Roadmap (Short-Term)

JSON-RPC over stdio (in progress)
polish RPC client surface (src/daemon/* naming vs vocode-cored)
workspace sync
edit intents → applier
diff UI panel
streaming speech input

🧑‍💻 Contributing

See CONTRIBUTING.md for more information.

1. Install deps: pnpm install
2. Generate protocol types: pnpm codegen
3. Build core: pnpm --filter @vocode/core build
4. Press F5 to run the extension
5. Make changes
6. Run pnpm lint:fix

⚠️ Notes

Do not commit:
- node_modules/
- .turbo/
- dist/
- bin/

Core logs go to stderr, because core stdout is reserved for JSON-RPC messages.

📄 License

TBD

🧠 Vision

Speak code. Watch it evolve. Stay in the flow.
