Complete guide to the WebSocket Gateway service - the entry point for all client connections in the speech transcription microservices architecture.
The Gateway service acts as the central hub for the speech transcription system. It handles WebSocket client connections, processes audio streams, manages session state, and orchestrates the flow of data between clients and worker services.
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Clients │────│ Gateway │────│ Redis │
│ (WebSocket) │ │ (WebSocket) │ │ (Streams + │
│ │ │ + Audio │ │ Pub/Sub) │
└─────────────┘ └─────────────┘ └─────────────┘
│
▼
┌─────────────────────┐
│ Worker Services │
│ (STT + Translation) │
└─────────────────────┘
- WebSocket Server: Handles real-time bidirectional communication
- Dual VAD System: WebRTC + Silero for robust speech detection
- Session Management: Redis-backed session persistence for scaling
- Audio Processing: Real-time audio chunking and enhancement
- Flow Control: Prevents job flooding with per-client rate limiting
- Language Support: Dynamic source/target language configuration
- Event-Driven: Publishes to Redis Streams, subscribes to results
- Health Monitoring: Comprehensive metrics and health endpoints
- Manages individual client connections
- Processes incoming audio and text messages
- Handles client lifecycle (connect/disconnect/cleanup)
- Main orchestration logic
- Audio buffer management and speech detection
- Redis integration for messaging and session storage
- Flow control and job-in-flight tracking
- Audio enhancement and resampling
- Quality improvements for better transcription
- Format conversion and normalization
- Client session state persistence
- Language settings and buffer management
- Speech session lifecycle handling
- Dual voice activity detection
- WebRTC for fast detection, Silero for accuracy
- Speech state transitions and buffering
sequenceDiagram
participant Client
participant Gateway
participant Redis
Client->>Gateway: WebSocket Connect
Gateway->>Gateway: Generate client_id
Gateway->>Redis: Subscribe to results channel
Gateway-->>Client: Initial status message
Gateway->>Redis: Create/load session
sequenceDiagram
participant Client
participant Gateway
participant STT Worker
Client->>Gateway: Audio Chunk (binary)
Gateway->>Gateway: Audio enhancement
Gateway->>Gateway: VAD analysis
Gateway->>Gateway: Buffer management
Gateway->>Redis: Publish job (if speech detected)
STT Worker->>Redis: Process job
STT Worker->>Gateway: Publish result
Gateway-->>Client: Real-time transcription
stateDiagram-v2
[*] --> INACTIVE: Client connects
INACTIVE --> ACTIVE: Speech detected
ACTIVE --> ACTIVE: Continue buffering
ACTIVE --> SILENCE: No speech for threshold
SILENCE --> INACTIVE: Silence timeout
ACTIVE --> MAX_BUFFER: Buffer overflow
MAX_BUFFER --> INACTIVE: Force flush
| Variable | Default | Description |
|---|---|---|
GATEWAY_PORT |
5026 |
WebSocket server port |
HEALTH_PORT |
8080 |
HTTP health/metrics port |
REDIS_URL |
redis://localhost:6379 |
Redis connection URL |
LOG_LEVEL |
INFO |
Logging verbosity |
| Variable | Default | Description |
|---|---|---|
SAMPLE_RATE |
16000 |
Target audio sample rate |
BUFFER_SIZE |
4096 |
Audio processing buffer size |
PRE_SPEECH_BUFFER_SECONDS |
1.0 |
Pre-speech audio to retain |
| Variable | Default | Description |
|---|---|---|
WEBRTC_SENSITIVITY |
3 |
WebRTC VAD sensitivity (0-3) |
SILERO_SENSITIVITY |
0.7 |
Silero VAD sensitivity (0.0-1.0) |
SILENCE_THRESHOLD_SECONDS |
2.0 |
Silence before ending speech |
| Variable | Default | Description |
|---|---|---|
MAX_QUEUE_DEPTH |
100 |
Maximum Redis queue depth |
MAX_AUDIO_BUFFER_SECONDS |
30.0 |
Maximum audio buffer duration |
The Gateway uses two VAD systems for robust speech detection:
- Purpose: Real-time speech detection
- Sensitivity: 0-3 (higher = more sensitive)
- Pros: Low latency, low CPU usage
- Cons: Less accurate for quiet speech
- Purpose: High-accuracy speech detection
- Sensitivity: 0.0-1.0 (higher = more sensitive)
- Pros: Better accuracy, handles noise better
- Cons: Higher latency, more CPU intensive
# WebRTC for fast detection
if webrtc_vad.is_speech(audio_chunk):
# Silero for confirmation
if silero_vad.is_speech(audio_chunk):
activate_speech_buffer()Maintains a rolling buffer of audio before speech detection to capture words that start before VAD triggers.
# Rolling buffer: last N seconds of audio
pre_speech_buffer = collections.deque(maxlen=pre_speech_seconds * sample_rate)
# When speech detected, prepend buffer
speech_audio = pre_speech_buffer + current_audio| State | Description | Actions |
|---|---|---|
INACTIVE |
No speech detected | Monitor for speech start |
ACTIVE |
Speech in progress | Buffer audio, send jobs |
SILENCE |
Speech ended, waiting | Continue monitoring |
MAX_BUFFER |
Buffer overflow | Force final job send |
Prevents job flooding by tracking jobs-in-flight per client:
# Per-client flow control
job_in_flight[client_id] = False
# Only send new job if previous completed
if not job_in_flight[client_id]:
send_job_to_workers(audio_chunk)
job_in_flight[client_id] = True# Start with hot reload
cd backend/infra
make up
# View gateway logs
make logs
# Test WebSocket connection
curl -I http://localhost:8080/health# Use terminal client
cd backend/terminal_run
python live_transcription.py
# Check Redis streams
docker-compose exec redis redis-cli xlen audio_jobs# Monitor active connections
docker-compose logs -f gateway | grep "Client.*connected"
# Check buffer sizes
docker-compose logs -f gateway | grep "Buffer's current size"
# Monitor job flow
docker-compose logs -f gateway | grep "Sent job\|Received from"{
"status": "healthy",
"instance_id": "gateway-001",
"clients_connected": 3,
"jobs_processed": 150,
"redis_connected": true
}{
"clients_connected": 3,
"total_clients_served": 150,
"jobs_sent": 500,
"results_received": 495,
"audio_bytes_processed": 5242880,
"avg_processing_latency_ms": 45.2,
"errors": 2
}- Connection Count: Active WebSocket connections
- Job Throughput: Jobs sent/received per second
- Latency: Time from audio receipt to result delivery
- Buffer Health: Average buffer sizes and overflow events
- Error Rates: Failed connections, processing errors
# Scale gateway instances
docker-compose up --scale gateway=3
# Load balancer distributes connections
# Redis handles session sharing- Buffer size limits prevent memory exhaustion
- Session cleanup on client disconnect
- Efficient audio processing with numpy arrays
- Async processing prevents blocking
- Thread pool for audio enhancement
- Flow control prevents job queue backup
- Binary WebSocket messages for audio
- Compressed JSON for metadata
- Connection pooling for Redis
Symptoms: Slow transcription response
Solutions:
- Check Redis performance: docker stats
- Monitor buffer sizes: logs show "Buffer's current size"
- Scale STT workers: docker-compose up --scale stt_worker=3
Symptoms: Clients disconnecting frequently
Solutions:
- Check WebSocket ping/pong
- Monitor server resources
- Review error logs for connection issues
Symptoms: "Exceeded MAX_AUDIO_BUFFER_SECONDS"
Solutions:
- Increase MAX_AUDIO_BUFFER_SECONDS
- Check VAD sensitivity settings
- Scale gateway instances
Symptoms: "Redis connection failed"
Solutions:
- Check Redis health: docker-compose ps redis
- Verify REDIS_URL configuration
- Check network connectivity
Enable detailed logging:
export LOG_LEVEL=DEBUG
make restartKey debug messages:
[GATEWAY INITIALIZATION]- Configuration dumpVoice detected, activating buffer- Speech startBuffer's current size: X.XX- Buffer monitoringSent job (X.XX/Y.YYs)- Job publishingReceived from STT_WORKER- Result processing
# In websocket_handler.py
elif msg_type == "new_feature":
await self._handle_new_feature(data, client_id, websocket)# In audio_processor.py
def custom_enhancement(self, audio_data, sample_rate):
# Apply custom audio processing
return enhanced_audio# In vad.py
class CustomVAD:
def is_speech(self, audio_chunk):
# Implement custom VAD logic
return speech_detected- QUICK_START.md - Get running quickly
- CONFIGURATION.md - All configuration options
- API_REFERENCE.md - WebSocket message formats
- DEVELOPMENT_SETUP.md - Development environment
- SHARED_MODULES.md - Common utilities