A generic test harness for voice agent workflows. Test agents from Retell, VAPI, LiveKit, Bland, Telnyx, and custom sources using a unified execution and evaluation model.
uv tool install voicetestOr add to a project (use uv run voicetest to run):
uv add voicetestOr with pip:
pip install voicetestTry voicetest with a sample healthcare receptionist agent and tests:
# Set up an API key (free, no credit card at https://console.groq.com)
export GROQ_API_KEY=gsk_...
# Load demo and start interactive shell
voicetest demo
# Or load demo and start web UI
voicetest demo --serveTip: If you have Claude Code installed, you can skip API key setup entirely and use
claudecode/sonnetas your model. See Claude Code Passthrough for details.
The demo includes a healthcare receptionist agent with 8 test cases covering appointment scheduling, identity verification, and more.
# Launch interactive shell (default)
uv run voicetest
# In the shell:
> agent tests/fixtures/retell/sample_config.json
> tests tests/fixtures/retell/sample_tests.json
> set agent_model ollama_chat/qwen2.5:0.5b
> run# List available importers
voicetest importers
# Run tests against an agent definition
voicetest run --agent agent.json --tests tests.json --all
# Export agent to different formats
voicetest export --agent agent.json --format mermaid # Diagram
voicetest export --agent agent.json --format livekit # Python code
voicetest export --agent agent.json --format retell-llm # Retell LLM JSON
voicetest export --agent agent.json --format retell-cf # Retell Conversation Flow JSON
voicetest export --agent agent.json --format vapi-assistant # VAPI Assistant JSON
voicetest export --agent agent.json --format vapi-squad # VAPI Squad JSON
voicetest export --agent agent.json --format bland # Bland AI JSON
voicetest export --agent agent.json --format telnyx # Telnyx AI JSON
voicetest export --agent agent.json --format voicetest # Voicetest JSON (.vt.json)
# Launch full TUI
voicetest tui --agent agent.json --tests tests.json
# Start REST API server with Web UI
voicetest serve
# Start infrastructure (LiveKit, Whisper, Kokoro) + backend for live calls
voicetest up
# Stop infrastructure services
voicetest downAn agent is represented as an AgentGraph: a directed graph of nodes connected by transitions. Each node has a prompt, a type, and outgoing edges that control conversation flow. The graph has a single entry_node_id where every conversation starts.
| Type | LLM Call | Speech | Routing |
|---|---|---|---|
| Conversation | Yes | Yes | LLM picks a transition via prompt match, or falls back to an always edge |
| Logic | No | No | Evaluates equations top-to-bottom; first match wins |
| Extract | Yes (extraction) | No | LLM extracts variables from the conversation, then equations route |
Conversation nodes are the standard building block — they generate a spoken response and use LLM judgment (or an always edge) to choose the next node.
Logic nodes (also called branch nodes) have no prompt and produce no speech. All their transitions use equation or always conditions, evaluated deterministically without an LLM call.
Extract nodes combine LLM extraction with deterministic routing. They define variables_to_extract (each with a name, description, type, and optional choices). The engine calls the LLM once to extract all variables from the conversation history, stores them as dynamic variables, then evaluates equation transitions using the extracted values.
Prompts can reference dynamic variables using {{variable_name}} syntax. Variables come from two sources:
- Test case
dynamic_variables: Set before the conversation starts (e.g.,{{caller_name}},{{account_id}}) - Extract node output: Populated during the conversation when an extract node fires
Expansion order: snippet references {%name%} are resolved first, then {{variable}} placeholders are substituted into the result. Unknown variables are left as-is.
Equation conditions on transitions support these operators:
| Operator | Example | Notes |
|---|---|---|
== |
status == "active" |
String equality |
!= |
tier != "free" |
String inequality |
> >= < <= |
age >= 18 |
Numeric coercion; non-numeric values return false |
contains |
notes contains "urgent" |
Substring match |
not_contains |
reply not_contains "err" |
Substring absence |
exists |
email exists |
Variable is set |
not_exist |
phone not_exist |
Variable is absent |
Multiple clauses combine with logical_operator: "and" (default, all must match) or "or" (any must match).
Test cases define simulated conversations to run against an agent:
[
{
"name": "Customer billing inquiry",
"user_prompt": "## Identity\nYour name is Jane.\n\n## Goal\nGet help with a charge on your bill.",
"metrics": ["Agent greeted the customer and addressed the billing concern"],
"dynamic_variables": {"caller_name": "Jane", "account_id": "12345"},
"tool_mocks": [],
"type": "simulation"
}
]type: "simulation"— The engine simulates both agent and user, running a full multi-turn conversationmetrics— LLM judges evaluate each metric against the transcript and produce a 0–1 scoredynamic_variables— Key-value pairs injected into{{var}}placeholders before the conversation starts
# Run tests against an agent definition
voicetest run --agent agent.json --tests tests.json --all
# Chat with an agent interactively
voicetest chat -a agent.json --model openai/gpt-4o --var name=Jane --var account=12345
# Evaluate a transcript against metrics (no simulation)
voicetest evaluate -t transcript.json -m "Agent was polite" -m "Agent resolved the issue"
# Diagnose test failures and suggest fixes
voicetest diagnose -a agent.json -t tests.json
voicetest diagnose -a agent.json -t tests.json --auto-fix --save fixed_agent.json
# Decompose an agent into sub-agents
voicetest decompose -a agent.json -o output/ [--num-agents N] [--model ID]| Command | Description |
|---|---|
voicetest agent list |
List agents in the database |
voicetest agent create -a agent.json --name "My Agent" |
Create an agent from a definition file |
voicetest agent get <agent-id> |
Get agent details |
voicetest agent update <agent-id> --name "Renamed" --model openai/gpt-4o |
Update agent properties |
voicetest agent delete <agent-id> |
Delete an agent |
voicetest agent graph <agent-id> |
Display agent graph structure |
| Command | Description |
|---|---|
voicetest test list <agent-id> |
List test cases for an agent |
voicetest test create <agent-id> -f test.json |
Create a test case from JSON |
voicetest test link <agent-id> tests.json |
Link external test file |
voicetest test unlink <agent-id> tests.json |
Unlink external test file |
voicetest test export <agent-id> |
Export test cases |
| Command | Description |
|---|---|
voicetest runs list <agent-id> |
List past test runs |
voicetest runs get <run-id> |
View run details with results |
voicetest runs delete <run-id> |
Delete a run |
# Analyze agent prompts for repeated text
voicetest snippet analyze --agent agent.json
# List defined snippets
voicetest snippet list --agent agent.json
# Create or update a snippet
voicetest snippet set --agent agent.json greeting "Hello, how can I help?"
# Apply snippets to prompts
voicetest snippet apply --agent agent.json --snippets '[{"name": "greeting", "text": "Hello!"}]'voicetest export --agent agent.json --format mermaid # Diagram
voicetest export --agent agent.json --format livekit # Python code
voicetest export --agent agent.json --format retell-llm # Retell LLM JSON
voicetest export --agent agent.json --format retell-cf # Retell Conversation Flow JSON
voicetest export --agent agent.json --format vapi-assistant # VAPI Assistant JSON
voicetest export --agent agent.json --format vapi-squad # VAPI Squad JSON
voicetest export --agent agent.json --format bland # Bland AI JSON
voicetest export --agent agent.json --format telnyx # Telnyx AI JSON
voicetest export --agent agent.json --format voicetest # Voicetest JSON (.vt.json)# Show current settings
voicetest settings
# Set a configuration value
voicetest settings --set models.agent=openai/gpt-4o
# List available platforms with configuration status
voicetest platforms
# Configure platform credentials
voicetest platform configure retell --api-key sk-xxx
# List agents on a remote platform
voicetest platform list-agents retell
# Import an agent from a platform
voicetest platform import retell <agent-id> -o imported.json
# Push a local agent to a platform
voicetest platform push retell -a agent.jsonAll commands support --json for machine-parseable output (progress goes to stderr):
voicetest --json agent list
voicetest --json run -a agent.json -t tests.json --all
voicetest --json snippet analyze --agent agent.jsonStart the server and open http://localhost:8000 in your browser:
voicetest serveThe web UI provides:
- Agent import and graph visualization
- Export agents to multiple formats (Mermaid, LiveKit, Retell, VAPI, Bland, Telnyx)
- Platform integration: import, push, and sync agents with Retell, VAPI, LiveKit, Telnyx
- Test case management with persistence
- Global metrics configuration (compliance checks that run on all tests)
- Test execution with real-time streaming transcripts
- Cancel in-progress tests
- Run history with detailed results and transcript inspection
- Audio evaluation with word-level diff of original vs. heard text
- Settings configuration (models, max turns, streaming, audio eval)
Data is persisted to .voicetest/data.duckdb (configurable via VOICETEST_DB_PATH).
The REST API is available at http://localhost:8000/api. Full API documentation is at voicetest.dev/api.
For live voice calls, you need infrastructure services (LiveKit, Whisper STT, Kokoro TTS). The up command starts them via Docker and then launches the backend:
# Start infrastructure + backend server
voicetest up
# Or start infrastructure only (e.g., to run the backend separately)
voicetest up --detach
# Stop infrastructure when done
voicetest downThis requires Docker with the compose plugin. The infrastructure services are:
| Service | URL | Description |
|---|---|---|
livekit |
ws://localhost:7880 | LiveKit server for real-time voice calls |
whisper |
http://localhost:8001 | Faster Whisper STT server |
kokoro |
http://localhost:8002 | Kokoro TTS server |
If you only need simulated tests (no live voice), voicetest serve is sufficient and does not require Docker.
voicetest can convert between agent formats via its unified AgentGraph representation:
Retell CF ─────┐ ┌───▶ Retell LLM
│ │
Retell LLM ────┼ ├───▶ Retell CF
│ │
VAPI ──────────┼ ├───▶ VAPI
│ │
Bland ─────────┼───▶ AgentGraph ──┼───▶ Bland
│ │
Telnyx ────────┤ ├───▶ Telnyx
│ │
LiveKit ───────┤ ├───▶ LiveKit
│ │
XLSForm ───────┤ ├───▶ Mermaid
│ │
Custom ────────┘ └───▶ Voicetest JSON
Import from any supported format, then export to any other:
# Convert Retell Conversation Flow to Retell LLM format
voicetest export --agent retell-cf-agent.json --format retell-llm > retell-llm-agent.json
# Convert VAPI assistant to Retell LLM format
voicetest export --agent vapi-assistant.json --format retell-llm > retell-agent.json
# Convert Retell LLM to VAPI format
voicetest export --agent retell-llm-agent.json --format vapi-assistant > vapi-agent.jsonvoicetest can connect directly to voice platforms to import and push agent configurations.
| Platform | Import | Push | Sync | API Key Env Var |
|---|---|---|---|---|
| Retell | ✓ | ✓ | ✓ | RETELL_API_KEY |
| VAPI | ✓ | ✓ | ✓ | VAPI_API_KEY |
| Bland | ✓ | ✓ | BLAND_API_KEY |
|
| Telnyx | ✓ | ✓ | ✓ | TELNYX_API_KEY |
| LiveKit | ✓ | ✓ | ✓ | LIVEKIT_API_KEY + LIVEKIT_API_SECRET |
In the Web UI, go to the "Platforms" tab to configure credentials, browse remote agents, import, push, and sync.
Agent prompts often repeat text across nodes — sign-off phrases, compliance disclaimers, tone instructions, etc. Snippets are named, reusable text blocks defined at the agent level and referenced in prompts via {%snippet_name%}.
| Snippet Name | Text |
|---|---|
sign_off |
"Thank you for calling. Is there anything else I can help with?" |
hipaa_warn |
"I need to verify your identity before sharing medical info." |
Use {%name%} (with percent signs) in any node prompt or general prompt:
Welcome the caller and introduce yourself.
{%hipaa_warn%}
When the conversation ends:
{%sign_off%}
Snippets are expanded before dynamic variables ({{var}}), so you can combine both:
Hello {{caller_name}}, {%greeting%}
Click Analyze DRY to scan all prompts for repeated or near-identical text. Exact matches can be auto-extracted into snippets; fuzzy matches (above 80%) are flagged for review.
When exporting, choose Raw (.vt.json) to preserve {%snippet%} references, or Expanded to resolve them to plain text for platform deployment.
Global metrics are compliance-style checks that run on every test for an agent. Configure them in the "Metrics" tab in the Web UI.
Each agent has:
- Pass threshold: Default score (0-1) required for metrics to pass (default: 0.7)
- Global metrics: List of criteria evaluated on every test run
Each global metric has:
- Name: Display name (e.g., "HIPAA Compliance")
- Criteria: What the LLM judge evaluates (e.g., "Agent must verify patient identity before sharing medical information")
- Threshold override: Optional per-metric threshold (uses agent default if not set)
- Enabled: Toggle to skip without deleting
Example use cases:
- HIPAA compliance checks for healthcare agents
- PCI-DSS validation for payment processing
- Brand voice consistency across all conversations
- Safety guardrails and content policy adherence
When a test fails, voicetest can diagnose the root cause and suggest concrete prompt changes to fix it.
- Diagnose — Click "Diagnose" on a failed result. The LLM analyzes the graph, transcript, and failed metrics to identify fault locations and root cause.
- Review & Edit — Proposed changes are shown as editable textareas. Modify the suggested text before applying.
- Apply & Test — Click "Apply & Test" to apply changes to a copy of the graph and rerun the test. A score comparison table shows original vs. new scores with deltas.
- Iterate — If not all metrics pass, click "Try Again" to revise the fix based on the latest results.
- Save — Click "Save Changes" to persist the fix to the agent graph.
Auto-Fix Mode runs an automated diagnose-apply-revise loop. Configure stop condition ("On improvement" or "When all pass") and max iterations (1–10, default 3).
Text-only evaluation has a blind spot: when an agent produces "415-555-1234", an LLM judge sees correct digits and passes. But TTS might speak it as "four hundred fifteen, five hundred fifty-five..." — which a caller can't use. Audio evaluation catches these issues by round-tripping agent messages through TTS→STT and judging what would actually be heard.
Conversation runs normally (text-only)
↓
Judges evaluate raw text → metric_results
↓
Agent messages → TTS → audio → STT → "heard" text
↓
Judges evaluate heard text → audio_metric_results
Both sets of results are stored. The original message text is preserved alongside what was heard, with a word-level diff shown in the UI.
Enable audio_eval in settings or toggle "Audio evaluation" in the Web UI. On-demand: click "Run audio eval" on any completed result.
Audio evaluation requires the TTS and STT services from voicetest up:
| Service | URL | Description |
|---|---|---|
whisper |
http://localhost:8001 | Faster Whisper STT |
kokoro |
http://localhost:8002 | Kokoro TTS |
Split a large agent into smaller, focused sub-agents:
voicetest decompose -a agent.json -o output/ [--num-agents N] [--model ID]Three-phase LLM pipeline: analyze the graph → refine the decomposition plan → build sub-agent JSON files. Produces one .json file per sub-agent plus a manifest.json with handoff rules and sub-agent registry.
Options:
--num-agents N— Target number of sub-agents (default: let the LLM decide)--model ID— LLM model override (defaults to judge model from settings)
DSPy LLM responses are cached to avoid redundant API calls. Default backend is local disk.
For shared caching across CI runners or team members, use the S3 backend:
# .voicetest/settings.toml
[cache]
cache_backend = "s3"
s3_bucket = "my-bucket"
s3_prefix = "dspy-cache/"
s3_region = "us-east-1"Disable caching for a run with no_cache = true in run options or --no-cache on the CLI.
Configure different models for each role using LiteLLM format:
from voicetest.models.test_case import RunOptions
options = RunOptions(
agent_model="openai/gpt-4o-mini",
simulator_model="gemini/gemini-1.5-flash",
judge_model="anthropic/claude-3-haiku-20240307",
max_turns=20,
)Or use Ollama for local execution:
options = RunOptions(
agent_model="ollama_chat/qwen2.5:0.5b",
simulator_model="ollama_chat/qwen2.5:0.5b",
judge_model="ollama_chat/qwen2.5:0.5b",
)In the shell:
> set agent_model gemini/gemini-1.5-flash
> set simulator_model ollama_chat/qwen2.5:0.5b
| Option | Default | Description |
|---|---|---|
max_turns |
50 |
Maximum conversation turns |
turn_timeout_seconds |
60.0 |
Per-turn timeout (user sim + agent response) |
no_cache |
false |
Bypass LLM response cache |
audio_eval |
false |
TTS→STT round-trip evaluation |
flow_judge |
false |
Validate conversation flow |
streaming |
false |
Stream tokens as LLM generates |
test_model_precedence |
false |
Test-level model overrides global model |
pattern_engine |
fnmatch |
Pattern matching engine: fnmatch or re2 |
Settings are stored in .voicetest/settings.toml:
[models]
agent = "groq/llama-3.1-8b-instant"
simulator = "groq/llama-3.1-8b-instant"
judge = "groq/llama-3.1-8b-instant"
[run]
max_turns = 20
audio_eval = false
streaming = false
[audio]
tts_url = "http://localhost:8002/v1"
stt_url = "http://localhost:8001/v1"
[cache]
cache_backend = "disk"If you have Claude Code installed, you can use it as your LLM backend without configuring API keys:
# .voicetest/settings.toml
[models]
agent = "claudecode/sonnet"
simulator = "claudecode/haiku"
judge = "claudecode/sonnet"Available model strings:
claudecode/sonnet— Claude Sonnetclaudecode/opus— Claude Opusclaudecode/haiku— Claude Haiku
This invokes the claude CLI via subprocess, using your existing Claude Code authentication.
voicetest ships with a Claude Code plugin for agent-assisted voice testing. Slash commands and auto-activating skills help Claude Code discover importers/exporters, run tests, export agents, and convert between formats.
For repo contributors (automatic):
Skills and commands load automatically from .claude/ (symlinked to claude-plugin/).
Install as a marketplace plugin:
/plugin marketplace add voicetestdev/voicetest
/plugin install voicetest@voicetest-plugins
For pip-installed users:
cd your-project
voicetest init-claudeAvailable slash commands:
/voicetest-run— Run tests against an agent/voicetest-export— Export agent to a different format/voicetest-convert— Convert between platform formats/voicetest-info— List importers, exporters, platforms, and settings
Plugin path (for manual plugin loading):
claude --plugin-dir $(voicetest claude-plugin-path)Run voice agent tests in CI to catch regressions before they reach production. Key benefits:
- Bring your own LLM keys — Use OpenRouter, OpenAI, etc. directly instead of paying per-minute through Retell/VAPI's built-in LLM interfaces
- Test on prompt changes — Automatically validate agent behavior when prompts or configs change
- Track quality over time — Ensure consistent agent performance across releases
Example GitHub Actions workflow (see docs/examples/ci-workflow.yml):
name: Voice Agent Tests
on:
push:
paths: ["agents/**"] # Trigger on agent config or test changes
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v5
- run: uv tool install voicetest
- run: voicetest run --agent agents/receptionist.json --tests agents/tests.json --all
env:
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}The ConversationEngine.advance() method traverses the agent graph from the current node until it produces speech or settles:
- Call
_process_node()on the current node - If the node produced a response (conversation node), return it
- If the node is silent (logic or extract node) and transitioned, follow the edge and repeat
- Maximum 20 hops per
advance()call to prevent infinite loops
Silent nodes auto-fire: logic nodes evaluate equations deterministically, extract nodes call the LLM once for variable extraction then evaluate equations. Tool messages record transitions and extractions in the transcript.
The project uses Punq for dependency injection. Key singletons:
Engine,sessionmaker,Session— SQLAlchemy database layer (DuckDB-backed)ImporterRegistry,ExporterRegistry,PlatformRegistry— Registries
Repositories are transient but share the singleton session:
AgentRepository,TestCaseRepository,RunRepository,CallRepository
Get instances via voicetest.container:
from voicetest.container import get_session, get_importer_registryWhen to use DI:
- Use
get_*helpers for app code (REST handlers, CLI commands) - Use
container.resolve(Type)when you need the container directly - For tests, use
reset_container()to get fresh state - Don't instantiate repositories directly; let Punq inject the session
When defining DSPy signatures, type the fields accurately:
class MySignature(dspy.Signature):
"""Docstring becomes the prompt context."""
input_text: str = dspy.InputField(desc="What this input contains")
count: int = dspy.InputField(desc="Numeric input")
result: str = dspy.OutputField(desc="What the LLM should produce")
score: float = dspy.OutputField(desc="Numeric score from 0.0 to 1.0")
items: list[str] = dspy.OutputField(desc="List of extracted items")
valid: bool = dspy.OutputField(desc="True/False judgment")The type annotations (str, int, float, bool, list[str]) guide the LLM's output format. The desc should clarify semantics, not just repeat the type.
Data persists to .voicetest/data.duckdb (or VOICETEST_DB_PATH env var).
Tests are scoped to agents via agent_id foreign key in test_cases table.
When importing Retell LLM format agents, terminal tools (end_call, transfer_call) are converted to proper CF node types during export rather than remaining as tools in the tools array:
end_calltools becometype=endnodes in the Conversation Flowtransfer_calltools becometype=transfer_callnodes withtransfer_destinationandtransfer_option- Tool metadata carries
transfer_destinationandtransfer_optionthrough the import/export pipeline - The agent envelope (voice_id, language, etc.) is preserved from LLM format through CF export for Retell UI re-import
All code must pass pre-commit checks before committing:
uv run pre-commit run --all-filesPre-commit Hooks:
- Python:
ruff(lint + auto-fix) andruff-format - TypeScript/Svelte:
svelte-checkandtsc --noEmit - Markdown:
mdformatwith GFM support - General: trailing-whitespace, end-of-file-fixer, check-yaml, check-json
Python Style (pyproject.toml):
- Line length: 100, target Python 3.12
- Imports must be at top of file (PLC0415 enforced)
- 2 blank lines after imports
- No relative parent imports (
from .. importbanned) - No unused imports/variables
TypeScript/Svelte Style:
- Strict mode enabled
- No unused locals or parameters
- No fallthrough in switch statements
Shared fixtures live in tests/fixtures/:
graphs/simple_graph.json— Basic agent graph for testingretell/— Retell format samples (including extract variable configs)vapi/— VAPI format sampleslivekit/— LiveKit format samples
Use fixtures via pytest:
def test_something(sample_graph_dict, fixtures_dir):
# sample_graph_dict is the parsed JSON
# fixtures_dir is Path to tests/fixtures/The easiest way to get a full development environment running is with Docker Compose:
# Clone and start all services
git clone https://github.com/voicetestdev/voicetest
cd voicetest
docker compose -f docker-compose.dev.yml upThe dev compose file includes the base infrastructure from voicetest/compose/docker-compose.yml (the same file bundled with the package for voicetest up) and adds backend + frontend services on top. This starts five services:
| Service | URL | Description |
|---|---|---|
livekit |
ws://localhost:7880 | LiveKit server for real-time voice calls |
whisper |
http://localhost:8001 | Faster Whisper STT server |
kokoro |
http://localhost:8002 | Kokoro TTS server |
backend |
http://localhost:8000 | FastAPI backend with hot reload |
frontend |
http://localhost:5173 | Vite dev server with hot reload |
Open http://localhost:5173 to access the web UI. Changes to Python or TypeScript files trigger automatic reloads.
Claude Code Authentication: The dev image includes Claude Code CLI. To authenticate for claudecode/* model passthrough:
docker compose -f docker-compose.dev.yml exec backend claude loginCredentials persist in the claude-auth Docker volume across container restarts.
Linked Agents: The compose file mounts your home directory ($HOME) read-only so linked agents with absolute paths work inside the container. On macOS, you may need to grant Docker Desktop access to your home directory in Settings → Resources → File Sharing.
To stop all services:
docker compose -f docker-compose.dev.yml downIf you prefer running services manually (e.g., for debugging):
# Clone and install
git clone https://github.com/voicetestdev/voicetest
cd voicetest
uv sync
# Run unit tests
uv run pytest tests/unit
# Run integration tests (requires Ollama with qwen2.5:0.5b)
uv run pytest tests/integration
# Lint
uv run ruff check voicetest/ tests/LiveKit integration tests require the lk CLI tool for agent deployment and listing operations. Install it from https://docs.livekit.io/home/cli/cli-setup/
# macOS
brew install livekit-cli
# Linux
curl -sSL https://get.livekit.io/cli | bashTests that require the CLI will skip automatically if it's not installed.
The web UI is built with Bun + Svelte + Vite. The recommended approach is to use Docker Compose (see above), which handles all services automatically.
For manual frontend development, uses mise for version management:
# Terminal 1 - Frontend dev server with hot reload
cd web
mise exec -- bun install
mise exec -- bun run dev # http://localhost:5173
# Terminal 2 - Backend API
uv run voicetest serve --reload # http://localhost:8000
# Terminal 3 - LiveKit server (for live voice calls)
docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp livekit/livekit-server --devThe Vite dev server proxies /api/* to the FastAPI backend.
# Run frontend tests
cd web && npx vitest run
# Build for production
cd web && mise exec -- bun run buildSvelte 5 reactivity guidelines are documented in web/README.md.
voicetest/
├── voicetest/ # Python package
│ ├── cli.py # CLI (40+ commands)
│ ├── rest.py # REST API server + WebSocket + SPA serving
│ ├── container.py # Dependency injection (Punq)
│ ├── cache.py # LLM response cache (disk + S3 backends)
│ ├── templating.py # Snippet and variable expansion
│ ├── services/ # Service layer (agents, diagnosis, evaluation, runs, snippets, etc.)
│ ├── compose/ # Bundled Docker Compose for infrastructure services
│ ├── models/ # Pydantic models (agent, test_case, results, decompose, etc.)
│ ├── importers/ # Source importers (retell, vapi, bland, telnyx, livekit, xlsform, custom)
│ ├── exporters/ # Format exporters (mermaid, livekit, retell, vapi, bland, telnyx, voicetest_ir)
│ ├── platforms/ # Platform SDK clients (retell, vapi, bland, telnyx, livekit)
│ ├── engine/ # Execution engine
│ │ ├── conversation.py # ConversationEngine: advance(), graph traversal
│ │ ├── equations.py # Deterministic equation evaluation
│ │ ├── modules.py # DSPy modules for state execution
│ │ ├── session.py # ConversationRunner for simulated tests
│ │ └── livekit_llm.py # LiveKit real-time integration
│ ├── simulator/ # User simulation
│ ├── judges/ # Evaluation judges (metric, rule, diagnosis)
│ ├── storage/ # DuckDB persistence layer
│ └── tui/ # TUI and shell
├── claude-plugin/ # Claude Code plugin (commands + skills)
│ ├── commands/ # Slash commands (/voicetest-run, etc.)
│ └── skills/ # Auto-activating skill + references
├── web/ # Frontend (Bun + Svelte + Vite)
│ ├── src/
│ │ ├── components/ # Svelte components
│ │ └── lib/ # API client, stores, types
│ └── dist/ # Built assets (bundled in package)
├── tests/
│ ├── unit/ # Unit tests
│ └── integration/ # Integration tests (Ollama)
└── docs/
Questions, feedback, or partnerships: hello@voicetest.dev
Apache 2.0


