- Overview
- Architecture at a glance
- Call flow
- Project layout
- Quick start
- Monitoring & observability
- Configuration reference
- Integration guides
- Operations & troubleshooting
- Development workflow
- CLI utilities & helper scripts
- Ports & deployment topology
- License
OpenAI SIP Voice Agent registers as a SIP endpoint via PJSIP, bridges audio between your PBX and OpenAI’s realtime or legacy voice APIs, and streams responses back to callers without leaving your telephony domain.
A built-in FastAPI monitor and React/Tailwind dashboard authenticate administrators, stream live call status, logs, and metrics over WebSockets, and surface runtime health while configuration stays in the .env file.
Structured JSON logging with correlation IDs, safe reload tooling, and a strongly typed environment schema make the agent production-friendly from day one.
- sip-agent – Asyncio PJSIP client that answers calls, bridges RTP frames into OpenAI’s realtime or legacy WebSocket APIs, and tracks token usage for observability.
- Monitor service – FastAPI app that validates `.env` updates, streams status, exposes metrics, and coordinates safe reloads once active calls drain.
- Dashboard – React + Tailwind UI served by Nginx, connected to the monitor via an authenticated WebSocket for status and logs while configuration lives in `.env`.
- Observability layer – Structured logging with correlation IDs and an in-memory metrics collector that publishes JSON snapshots on `/metrics`.
- Container packaging – Multi-stage Docker build compiles the dashboard, installs PJSIP bindings, and is orchestrated via `docker-compose` for both development and production deployments.
PBX / SIP phones ──(SIP/RTP)──► sip-agent (PJSIP + asyncio)
│
├── WebSocket ─► OpenAI realtime / legacy APIs
│
└── FastAPI monitor ──(REST + WebSocket)──► React dash (Nginx)
- Startup & configuration – The monitor boots first; if environment validation fails, the agent exits with detailed errors. Setting `ENABLE_SIP=false` leaves the dashboard running without touching the PBX.
- Registration – PJSIP is initialised with jitter-buffer preferences, STUN/TURN/SRTP knobs, and codec priorities, then authenticates to your registrar with digest credentials.
- Call handling – Incoming and outbound calls create an `AudioCallback` bridge that feeds PCM frames into asyncio queues while retaining correlation IDs for monitoring.
- Realtime streaming – When `OPENAI_MODE=realtime`, the agent opens a WebSocket to OpenAI, issues a `session.update` with model/voice/audio parameters, base64-encodes audio deltas, and commits the buffer once a turn completes.
- Legacy streaming – In legacy mode the agent forwards raw PCM frames to `/v1/audio/speech` and queues returned audio directly to the SIP leg.
- Metrics & logs – The monitor records call lifecycle events, retry counters, and aggregated token usage while structured logs carry correlation IDs end-to-end.
- Dashboard updates – Authenticated WebSocket subscribers receive status snapshots, call history, metrics, and log lines with automatic reconnect/backoff logic in the browser hook.
- Edge serving – Nginx serves the built dashboard assets and proxies `/api`, `/ws`, `/metrics`, `/healthz`, `/login`, and `/logout` back to the monitor for a cohesive admin surface.
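
The call-handling and realtime-streaming steps above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: `pump`, `append_event`, `commit_event`, and the `None` end-of-turn sentinel are hypothetical names and conventions, and the event shapes follow the publicly documented `input_audio_buffer.append` / `input_audio_buffer.commit` client events. The `session.update` handshake is omitted.

```python
import asyncio
import base64
import json


def append_event(pcm: bytes) -> str:
    """Wrap one PCM frame in an `input_audio_buffer.append` event (base64 payload)."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })


def commit_event() -> str:
    """Build the event that tells the API the caller's turn is complete."""
    return json.dumps({"type": "input_audio_buffer.commit"})


async def pump(frames: asyncio.Queue, send) -> int:
    """Drain PCM frames from the queue until a None sentinel, then commit.

    `send` is any coroutine function that accepts a text message, such as
    the send method of a WebSocket client (assumed wiring).
    """
    sent = 0
    while True:
        pcm = await frames.get()
        if pcm is None:  # hypothetical end-of-turn marker
            break
        await send(append_event(pcm))
        sent += 1
    await send(commit_event())
    return sent


async def demo() -> list:
    """Feed three silent frames through the pump and collect the events."""
    frames: asyncio.Queue = asyncio.Queue()
    out = []

    async def fake_send(message: str) -> None:
        out.append(json.loads(message))

    # 20 ms of 16-bit mono audio at 8 kHz is 160 samples, i.e. 320 bytes.
    for _ in range(3):
        frames.put_nowait(b"\x00" * 320)
    frames.put_nowait(None)
    await pump(frames, fake_send)
    return out


if __name__ == "__main__":
    print([event["type"] for event in asyncio.run(demo())])
```

Keeping the queue between the SIP audio callback and the WebSocket sender means a slow network write does not block the media thread; the real bridge may apply different framing or turn detection.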
- `app/` – Backend agent, configuration loader, monitor API, and static dashboard assets.
- `web/` – React/Vite/Tailwind dashboard with hooks, components, tests, and build tooling.
- `scripts/` – Utilities such as `install_pjsua2.py` that compile pjproject and Python bindings.
- `deploy/` – Deployment assets including the Nginx reverse-proxy configuration used in container builds.
- `tests/` – Pytest suite covering audio pipelines, monitor routes, configuration validation, and timers.
- `Makefile` – One-liner targets for installing dependencies, linting, typing, testing, formatting, and environment validation.
- Asterisk-based PBX (e.g., FreePBX or VICIdial) with an extension dedicated to the agent.
- Docker and Docker Compose.
- OpenAI API key with access to the `gpt-realtime` model for realtime mode.
Clone the repo and copy the sample configuration:
git clone https://github.com/vaheed/sip-ai-agent.git
cd sip-ai-agent
cp env.example .env

Populate SIP credentials, OpenAI keys, and session preferences:

# PBX
SIP_DOMAIN=pbx.example.com
SIP_USER=1001
SIP_PASS=secret
# OpenAI
OPENAI_API_KEY=sk-...
AGENT_ID=pmpt_...
# Feature toggles & realtime session
ENABLE_SIP=true
ENABLE_AUDIO=true
OPENAI_MODE=realtime
OPENAI_MODEL=gpt-realtime
OPENAI_VOICE=alloy
OPENAI_TEMPERATURE=0.3
SYSTEM_PROMPT=You are a helpful voice assistant.
# Monitor authentication
MONITOR_ADMIN_USERNAME=admin
MONITOR_ADMIN_PASSWORD=admin
MONITOR_SESSION_SECRET=change-me
Core settings are validated on startup through a Pydantic schema, so missing or malformed values surface immediately in logs and the dashboard.
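
To illustrate the kind of field-level checks described here, the sketch below validates a few of the documented keys using only the standard library. It is a stand-in, not the project's Pydantic schema: `parse_env`, `validate`, and the error wording are hypothetical, and only the required keys, the two `OPENAI_MODE` values, and the 0–2 temperature range come from this document.

```python
REQUIRED = ("SIP_DOMAIN", "SIP_USER", "SIP_PASS", "OPENAI_API_KEY", "AGENT_ID")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def validate(values: dict) -> list:
    """Return 'FIELD: problem' strings; an empty list means the input passed."""
    errors = []
    for key in REQUIRED:
        if not values.get(key):
            errors.append(f"{key}: value is required")
    mode = values.get("OPENAI_MODE", "legacy")
    if mode not in ("legacy", "realtime"):
        errors.append(f"OPENAI_MODE: expected 'legacy' or 'realtime', got {mode!r}")
    raw_temp = values.get("OPENAI_TEMPERATURE", "0.3")
    try:
        if not 0.0 <= float(raw_temp) <= 2.0:
            errors.append("OPENAI_TEMPERATURE: must be between 0 and 2")
    except ValueError:
        errors.append(f"OPENAI_TEMPERATURE: not a number: {raw_temp!r}")
    return errors


if __name__ == "__main__":
    sample = "SIP_DOMAIN=pbx.example.com\nSIP_USER=1001\nOPENAI_TEMPERATURE=3\n"
    for problem in validate(parse_env(sample)):
        print(problem)
```

Collecting every problem before reporting, rather than stopping at the first, matches the "detailed field-level errors" behaviour the CLI is described as having.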
Check any .env file and regenerate env.example without touching your secrets:
python -m app.config validate --path .env
python -m app.config sample --write --path env.example
make env-validate # wraps the same validator
make env-sample

The CLI accepts `--include-os` to merge process environment values during validation and prints detailed field-level errors on failure.
Build and run both containers locally:
docker compose up --build

Or pull the published images:

docker compose -f docker-compose.production.yml up -d

`sip-agent` exposes UDP 5060 for SIP and 16000–16100 for RTP, while `web` serves the dashboard on TCP 8080 and proxies control-plane endpoints back to the agent.
For direct execution or CI, install dependencies and compile PJSIP bindings:
pip install -r requirements.txt
python scripts/install_pjsua2.py
# or
make install

The helper downloads pjproject 2.12, builds shared libraries, and installs the `pjsua2` module, mirroring the container build sequence.
Browse to http://<docker-host>:8080 and log in with MONITOR_ADMIN_USERNAME / MONITOR_ADMIN_PASSWORD (defaults to admin / admin). Update SIP, OpenAI, and feature settings directly in the .env file and restart the containers to apply changes.
The dashboard summarises SIP registration state, active calls, token usage, realtime channel health, and live logs, with dark/light theme support and auto-refreshing WebSocket data feeds.
The browser hook handles authentication failures, reconnect backoff, and incremental log streaming so operators can keep a single page open during deployments.
Configuration validation runs on startup and via the CLI helpers so invalid .env values surface immediately in logs and the dashboard health summaries.
Structured JSON logs (with correlation IDs) and the metrics endpoint expose retry counters, call durations, token usage, and realtime WebSocket health. Pull /metrics in JSON for dashboards or quick curl | jq inspection during incidents.
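
As an alternative to `curl | jq`, a short script can pull the snapshot and list its numeric values. This is a sketch under assumptions: the snapshot's key names and nesting are not documented here, so `flatten` simply walks whatever JSON object comes back, and `fetch_metrics` assumes the endpoint is reachable without extra authentication from where the script runs.

```python
import json
from urllib.request import urlopen


def fetch_metrics(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/metrics and decode the JSON body."""
    with urlopen(f"{base_url}/metrics", timeout=timeout) as response:
        return json.load(response)


def flatten(snapshot, prefix=""):
    """Yield (dotted_key, value) pairs for every numeric leaf, in key order."""
    if isinstance(snapshot, dict):
        for key in sorted(snapshot):
            yield from flatten(snapshot[key], f"{prefix}{key}.")
    elif isinstance(snapshot, (int, float)) and not isinstance(snapshot, bool):
        yield prefix.rstrip("."), snapshot


if __name__ == "__main__":
    # Replace the host with your Docker host; 8080 is the dashboard port.
    for name, value in flatten(fetch_metrics("http://localhost:8080")):
        print(f"{name} = {value}")
```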
| Key(s) | Default | Notes |
|---|---|---|
| `SIP_DOMAIN`, `SIP_USER`, `SIP_PASS` | (required) | PBX account details; empty values fail validation. |
| `OPENAI_API_KEY`, `AGENT_ID` | (required) | API key and assistant ID used for all conversations. |
| `ENABLE_SIP` | `true` | Disable to run only the dashboard/monitor. |
| `ENABLE_AUDIO` | `true` | Disable to keep SIP signalling without bridging RTP. |
| `OPENAI_MODE` | `legacy` | Set `realtime` to enable ultra-low-latency streaming. |
| `OPENAI_MODEL` | `gpt-realtime` | Model advertised in the realtime session update. |
| `OPENAI_VOICE` | `alloy` | Voice name for realtime synthesis. |
| `OPENAI_TEMPERATURE` | `0.3` | Sampling temperature (0–2). |
| `SYSTEM_PROMPT` | `You are a helpful voice assistant.` | Sent in each realtime session update. |

Defaults originate from env.example and are enforced via the Settings dataclass.
| Key | Default | Description |
|---|---|---|
| `SIP_TRANSPORT_PORT` | `5060` | Local UDP port for SIP signalling. |
| `SIP_PREFERRED_CODECS` | `PCMU,PCMA,opus` | Priority-ordered codec list; friendly names (e.g. `PCMU`, `opus`) are normalised to the codec IDs reported by PJSIP and unavailable codecs are ignored. |
| `SIP_JB_MIN` / `SIP_JB_MAX` / `SIP_JB_MAX_PRE` | `0` | Jitter buffer bounds to trade latency for resilience. |

These values adjust PJSIP media configuration before registration.
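
The codec normalisation described for `SIP_PREFERRED_CODECS` could look something like the sketch below. The function name `normalise_codecs` is hypothetical, and the matching rule (case-insensitive comparison against the part of the codec ID before the first `/`, as in `PCMU/8000/1`) is an assumption about how friendly names map to the IDs PJSIP reports.

```python
def normalise_codecs(preferred: str, available: list) -> list:
    """Map friendly names to full codec IDs, keeping the preferred order.

    Names with no match in `available` are silently dropped.
    """
    ordered = []
    for name in (part.strip() for part in preferred.split(",")):
        if not name:
            continue
        for codec_id in available:
            base = codec_id.split("/", 1)[0]
            if base.lower() == name.lower() and codec_id not in ordered:
                ordered.append(codec_id)
    return ordered


if __name__ == "__main__":
    reported = ["PCMA/8000/1", "PCMU/8000/1", "G722/16000/1"]
    # opus is not in the reported list, so it is ignored.
    print(normalise_codecs("PCMU,PCMA,opus", reported))
```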
| Key | Default | Description |
|---|---|---|
| `SIP_ENABLE_ICE` | `false` | Enable ICE negotiation for symmetric NAT environments. |
| `SIP_ENABLE_TURN` | `false` | Relay RTP via TURN when direct paths fail. |
| `SIP_STUN_SERVER` | (blank) | Optional STUN URI (e.g., `stun:stun.l.google.com:19302`). |
| `SIP_TURN_SERVER`, `SIP_TURN_USER`, `SIP_TURN_PASS` | (blank) | TURN relay credentials. |
| `SIP_ENABLE_SRTP` | `false` | Negotiate SRTP; combine with `SIP_SRTP_OPTIONAL=false` to enforce encrypted media only. |

The agent applies these during account creation, mapping to PJSIP NAT and SRTP configuration structures.
| Key | Default | Description |
|---|---|---|
| `SIP_REG_RETRY_BASE` / `SIP_REG_RETRY_MAX` | `2.0` / `60.0` | Exponential backoff window for registration retries. |
| `SIP_INVITE_RETRY_BASE` / `SIP_INVITE_RETRY_MAX` | `1.0` / `30.0` | Retry cadence for outbound INVITEs on 4xx/5xx. |
| `SIP_INVITE_MAX_ATTEMPTS` | `5` | Maximum INVITE attempts before marking the call failed. |

Retry scheduling emits log events and increments metrics counters for visibility.
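
To get a feel for how the base and max values interact, the following sketch computes a capped exponential schedule. The doubling factor and the absence of jitter are assumptions; the agent's actual formula is not given here, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(base: float, maximum: float, attempts: int) -> list:
    """Delay before each retry: base * 2**n seconds, capped at `maximum`."""
    return [min(maximum, base * (2 ** n)) for n in range(attempts)]


if __name__ == "__main__":
    # Registration defaults: base 2.0 s, cap 60.0 s.
    print(backoff_delays(2.0, 60.0, 6))
    # INVITE defaults: base 1.0 s, cap 30.0 s, at most 5 attempts.
    print(backoff_delays(1.0, 30.0, 5))
```

Under these assumptions the registration delay reaches its 60-second cap on the sixth retry, while five INVITE attempts never reach the 30-second cap.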
Monitor defaults can be overridden via environment variables:
- `MONITOR_ADMIN_USERNAME` / `MONITOR_ADMIN_PASSWORD` – Admin credentials (default `admin` / `admin`).
- `MONITOR_SESSION_COOKIE` – Cookie name for sessions (default `monitor_session`).
- `MONITOR_SESSION_TTL` – Session lifetime in seconds (default 86400).
- `MONITOR_SESSION_SECRET` – Optional secret used to sign dashboard sessions (defaults to a value derived from the credentials).
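
For readers unfamiliar with signed sessions, the sketch below shows one common way a secret and a TTL can work together. It is illustrative only: the token layout, the HMAC-SHA256 choice, and the names `sign_session` and `verify_session` are assumptions, not the monitor's actual cookie format.

```python
import hashlib
import hmac
import time


def _signature(payload: str, secret: str) -> str:
    """Hex HMAC-SHA256 of the payload under the session secret."""
    return hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()


def sign_session(username: str, secret: str, ttl: int = 86400, now=None) -> str:
    """Return 'username|expiry|signature' valid for `ttl` seconds."""
    issued = now if now is not None else time.time()
    payload = f"{username}|{int(issued) + ttl}"
    return f"{payload}|{_signature(payload, secret)}"


def verify_session(token: str, secret: str, now=None):
    """Return the username if the token is authentic and unexpired, else None."""
    try:
        username, expires, sig = token.rsplit("|", 2)
        expiry = int(expires)
    except ValueError:
        return None  # malformed token
    expected = _signature(f"{username}|{expires}", secret)
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # wrong secret or tampered payload
    current = now if now is not None else time.time()
    return username if current < expiry else None


if __name__ == "__main__":
    token = sign_session("admin", "change-me")
    print(verify_session(token, "change-me"))
```

Whatever the real format, changing the secret invalidates existing sessions, which is one reason to set `MONITOR_SESSION_SECRET` explicitly rather than rely on a default.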
- Create a PJSIP extension matching `SIP_USER` / `SIP_PASS`.
- Open UDP 16000–16100 on your firewall for RTP media.
- Add a dialplan entry (e.g., extension `5000`) that dials `PJSIP/${SIP_USER}@${SIP_DOMAIN}` and reload the dialplan. Dial `5000` from any internal phone to reach the assistant.
- Create a campaign or in-group that dials an internal extension (e.g., `5000`).
- Add a matching dialplan stanza that calls `PJSIP/${SIP_USER}@${SIP_DOMAIN}`.
- Route transfers or manual dials to that extension; the agent behaves like any SIP endpoint registered to the PBX.
- SIP failure codes – Map common responses (401, 403, 404/484, 415/488, 480/503, 500/502) to corrective actions using the dashboard log stream and metrics counters such as `register_retries` and `invite_retries`.
- Firewall planning – Open UDP 5060 (or your configured port) plus 16000–16100 for RTP; disable SIP ALG where possible to avoid SDP rewriting.
- NAT & SRTP – Toggle ICE, TURN, STUN, and SRTP entirely through environment variables; no code changes are required.
- Metrics & logs – Use `/metrics` for machine-readable health snapshots and rely on correlation IDs to trace a call across SIP events, WebSocket activity, and audio bridge state changes.
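
The failure-code grouping above lends itself to a small lookup table for triage scripts. The sketch below is hypothetical: the groups come from the list above, but the suggested actions are general SIP guidance based on the standard meaning of each status code, not steps taken from this project's runbook.

```python
SIP_TRIAGE = {
    (401,): "Unauthorized: check SIP_USER / SIP_PASS against the PBX extension.",
    (403,): "Forbidden: check the registrar's ACLs or account permissions.",
    (404, 484): "Not found / address incomplete: check the dialled extension and dialplan.",
    (415, 488): "Media not acceptable: check codec and SRTP settings on both sides.",
    (480, 503): "Unavailable: check that the endpoint is registered and the PBX is healthy.",
    (500, 502): "Server error / bad gateway: inspect the PBX or upstream gateway logs.",
}


def triage(code: int) -> str:
    """Return a first-look suggestion for a SIP response code."""
    for codes, advice in SIP_TRIAGE.items():
        if code in codes:
            return advice
    return "No specific guidance: correlate the call ID in the dashboard log stream."


if __name__ == "__main__":
    for status in (401, 488, 603):
        print(status, "->", triage(status))
```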
- Run `make dev`, `make lint`, `make type`, `make test`, and `make format` to install toolchains and run Ruff, MyPy, Pytest, and formatting, respectively; CI mirrors these commands.
- The Python test suite covers realtime/legacy audio loops, monitor routes, configuration validation, and timer utilities; extend it when introducing new behaviours.
- Frontend development lives under `web/`: `npm install`, `npm run dev`, and `npm run build` leverage Vite, Tailwind, ESLint, Vitest, and Playwright tooling defined in `package.json`.
- Contribution guidelines, coding standards, and review expectations are documented in `CONTRIBUTING.md`.
- `python -m app.config validate --path .env [--include-os]` – Validate environment files against the schema with detailed field errors.
- `python -m app.config sample --write --path env.example` – Regenerate the canonical sample file (printable to stdout).
- `make env-validate` / `make env-sample` – Convenience wrappers around the same CLI.
- `python scripts/install_pjsua2.py [--prefix PATH] [--force]` – Download and build pjproject with shared libs and the `pjsua2` Python module (skips rebuilding if already installed unless `--force` is provided).
- `sip-agent` container: UDP 5060 for SIP, UDP 16000–16100 for RTP, TCP 8080 for the monitor API/WebSocket.
- `web` container: TCP 8080 (host) serving the compiled dashboard via Nginx, proxying control-plane traffic to `sip-agent`. Production compose files pull the published images by default but respect `SIP_AGENT_IMAGE` / `SIP_DASHBOARD_IMAGE` overrides.
- Nginx forwards `/api/`, `/ws/`, `/metrics`, `/healthz`, `/login`, `/logout`, and `/dashboard` routes to the monitor while serving static assets from `/usr/share/nginx/html`.
Released under the GNU General Public License v3.0.
_Source notes:_ Key behaviours—such as SIP registration, realtime bridging, environment validation, monitor routes, dashboard streaming, and deployment topology—are documented directly from `app/agent.py`, `app/config.py`, `app/monitor.py`, the React dashboard (`web/src`), the helper scripts, and deployment manifests cited above.