# voice-server

A WebSocket server library for building voice AI agents. It handles the full speech pipeline: receive audio from a client, transcribe it (STT), pass the text to your agent, synthesize the response (TTS), and stream the audio back.

## How it works

```
Client → [binary audio chunks] → WebSocket → STT → AgentHandler → TTS → [WAV audio] → Client
```

1. The client streams raw audio as binary WebSocket frames, then sends `{ type: "audio_end" }`.
2. The server transcribes the audio via a Speaches-compatible STT API.
3. The transcript and conversation history are passed to your `AgentHandler`.
4. The agent's text response is synthesized to WAV audio via a Speaches TTS API.
5. The audio is sent back to the client as base64-encoded chunks.
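Step 1's client-side wire format can be sketched as a pure function that produces the frames a client would send for one utterance (illustrative only — how you chunk the audio is up to the client):

```ts
// Frames a client sends for a single utterance: binary frames carrying raw
// audio, followed by the JSON control message that triggers processing.
function framesForUtterance(audioChunks: Buffer[]): (Buffer | string)[] {
  const frames: (Buffer | string)[] = [...audioChunks];
  // The final text frame ends the turn and kicks off STT → agent → TTS.
  frames.push(JSON.stringify({ type: "audio_end" }));
  return frames;
}
```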

## Installation

```sh
pnpm add @myeungdev/voice-server
```

## Usage

```ts
import { startServer } from "@myeungdev/voice-server";
import type { AgentHandler } from "@myeungdev/voice-server";

const handler: AgentHandler = async (transcript, history) => {
  // Call your LLM, use history for context, return a response
  const text = `You said: ${transcript}`;
  return { text, updatedHistory: history };
};

startServer(3000, handler);
```

## API

### startServer(port, handler, options?)

Starts a WebSocket server on the given port.

| Parameter | Type | Description |
| --- | --- | --- |
| `port` | `number` | Port to listen on |
| `handler` | `AgentHandler` | Default handler called for each utterance |
| `options` | `ServerOptions` | Optional lifecycle hooks (see below) |

### synthesize(text)

Directly synthesize text to a WAV `Buffer` using the configured TTS service.

### AgentHandler

```ts
type AgentHandler = (
  transcript: string,
  history: BaseMessage[], // LangChain message history
) => Promise<{ text: string; updatedHistory: BaseMessage[] }>;
```

### ServerOptions

```ts
interface ServerOptions {
  // Called on new connection. Return an AgentHandler to override the default for this session.
  onConnect?: (ws: WebSocket, session: Session) => Promise<AgentHandler | void>;
  // Called when a client disconnects.
  onDisconnect?: (session: Session) => Promise<void>;
}
```
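A common use of `onConnect` is to give each connection its own stateful handler. The sketch below shows the idea; the type aliases are local stand-ins mirroring the documented shapes (in a real project, import `AgentHandler` from `@myeungdev/voice-server` and `BaseMessage` from LangChain instead):

```ts
// Stand-ins for the documented types (assumptions, not the library's exports).
type BaseMessage = { role: "human" | "ai"; content: string };
type AgentHandler = (
  transcript: string,
  history: BaseMessage[],
) => Promise<{ text: string; updatedHistory: BaseMessage[] }>;

// Build a fresh handler per session so each connection keeps its own state
// (here, a simple turn counter).
function makeSessionHandler(): AgentHandler {
  let turn = 0;
  return async (transcript, history) => {
    turn += 1;
    const text = `Turn ${turn}: you said "${transcript}"`;
    // Append the exchange to the history passed back to the server.
    const updatedHistory: BaseMessage[] = [
      ...history,
      { role: "human", content: transcript },
      { role: "ai", content: text },
    ];
    return { text, updatedHistory };
  };
}

// Hypothetical wiring, matching the ServerOptions shape above:
// startServer(3000, defaultHandler, {
//   onConnect: async () => makeSessionHandler(),
// });
```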

## WebSocket Protocol

### Client → Server

| Message | Description |
| --- | --- |
| Binary frame | Raw audio data (WAV) |
| `{ type: "audio_end" }` | Signals end of utterance, triggers processing |

### Server → Client

| Message | Description |
| --- | --- |
| `{ type: "ready" }` | Server is ready to receive audio |
| `{ type: "processing" }` | Utterance received, transcription started |
| `{ type: "transcription", text }` | STT result |
| `{ type: "response_text", text }` | Agent text response |
| `{ type: "audio_start" }` | TTS audio stream begins |
| `{ type: "audio_chunk", data }` | Base64-encoded WAV chunk |
| `{ type: "audio_end" }` | TTS audio stream complete |
| `{ type: "error", message }` | Error during any stage |
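Client-side handling of these messages can be sketched as a small collector that accumulates base64 `audio_chunk` payloads and decodes them into one WAV `Buffer` when `audio_end` arrives. The message shapes are taken from the table above; what you do with the finished WAV (playback, save to disk) is left abstract:

```ts
// Server → client message shapes, per the protocol table.
type ServerMessage =
  | { type: "ready" }
  | { type: "processing" }
  | { type: "transcription"; text: string }
  | { type: "response_text"; text: string }
  | { type: "audio_start" }
  | { type: "audio_chunk"; data: string }
  | { type: "audio_end" }
  | { type: "error"; message: string };

// Returns a message handler that reassembles the TTS audio stream.
function makeAudioCollector(onWav: (wav: Buffer) => void) {
  let chunks: Buffer[] = [];
  return (msg: ServerMessage) => {
    switch (msg.type) {
      case "audio_start":
        chunks = []; // a new TTS stream begins
        break;
      case "audio_chunk":
        chunks.push(Buffer.from(msg.data, "base64")); // decode base64 payload
        break;
      case "audio_end":
        onWav(Buffer.concat(chunks)); // full WAV response is ready
        break;
      case "error":
        throw new Error(msg.message);
    }
  };
}
```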

## Configuration

All configuration is via environment variables.

| Variable | Default | Description |
| --- | --- | --- |
| `SPEACHES_BASE_URL` | `http://speaches:8000` | Base URL for both STT and TTS services |
| `STT_MODEL` | `Systran/faster-whisper-base.en` | Whisper model for transcription |
| `TTS_MODEL` | `speaches-ai/Kokoro-82M-v1.0-ONNX-int8` | Kokoro TTS model |
| `TTS_VOICE` | `af_heart` | TTS voice ID |
| `TTS_SPEED` | `1.0` | TTS playback speed multiplier |
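For example, to point the server at a Speaches instance on localhost and speed up playback (values are illustrative, not recommendations):

```sh
# Override the defaults before starting the server.
export SPEACHES_BASE_URL="http://localhost:8000"
export TTS_VOICE="af_heart"
export TTS_SPEED="1.25"
```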
