This repository contains WebSocket client samples for the Voxist ASR (Automatic Speech Recognition) service in both JavaScript and Python.
Contact Voxist if you are interested in using the service.
The scripts support both staging and production environments:
- Production: api-asr.voxist.com (default)
- Staging: asr-staging-dev.voxist.com (with --staging flag)
Important: Staging and production environments use different API keys. Make sure to use the correct API key for your target environment.
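There are two ways to connect to the WebSocket: pass the API key directly in the URL, or exchange it for a temporary token first.
Direct connection with an API key:
wss://api-asr.voxist.com/ws?api_key=YOUR_API_KEY&lang=fr-medical&sample_rate=16000
Connection with a temporary token:
- Request a temporary token: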
curl -X 'GET' \
  'https://api-asr.voxist.com/websocket?engine=voxist-rt-2' \
  -H 'accept: application/json' \
  -H 'X-LVL-KEY: YOUR_API_KEY'
- Response:
{
  "url": "wss://api-asr.voxist.com/ws?token=JWT_TOKEN"
}
- Add parameters to the URL:
wss://api-asr.voxist.com/ws?token=JWT_TOKEN&lang=fr-medical&sample_rate=16000
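The token flow above can be sketched in Python with only the standard library. This is a minimal sketch, not the repository's implementation: the function names `request_token_url` and `add_stream_params` are hypothetical, and it assumes the response body has exactly the `url` field shown above.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def request_token_url(api_key, host="api-asr.voxist.com", engine="voxist-rt-2"):
    """Ask the API for a temporary WebSocket URL (performs a network call)."""
    req = Request(
        f"https://{host}/websocket?engine={engine}",
        headers={"accept": "application/json", "X-LVL-KEY": api_key},
    )
    with urlopen(req, timeout=10) as resp:
        # Assumption: the JSON body contains a "url" field, as in the sample response.
        return json.load(resp)["url"]


def add_stream_params(ws_url, lang="fr-medical", sample_rate=16000):
    """Append lang and sample_rate to a WebSocket URL that already carries a token."""
    parts = urlsplit(ws_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("lang", lang), ("sample_rate", str(sample_rate))]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(add_stream_params("wss://api-asr.voxist.com/ws?token=JWT_TOKEN"))
```

Keeping the URL handling separate from the HTTP request makes it possible to check the parameter logic without contacting the service.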
Send raw audio data directly to the WebSocket:
- Format: Raw PCM audio bytes
- Encoding: Signed 16-bit little-endian
- Channels: Mono (1 channel)
- Sample Rate: 8000 Hz or 16000 Hz (specified in connection URL)
- Chunk Size: Recommended 100ms chunks (3200 bytes for 16kHz, 1600 bytes for 8kHz)
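The chunk sizes quoted above follow directly from the format: sample rate × 2 bytes per sample × 1 channel × chunk duration. A short Python sketch of that arithmetic, together with packing integer samples as signed 16-bit little-endian (both helper names are illustrative, not part of the samples):

```python
import struct


def chunk_size_bytes(sample_rate, chunk_ms=100, bytes_per_sample=2, channels=1):
    """Number of bytes in one chunk of raw PCM audio."""
    return sample_rate * bytes_per_sample * channels * chunk_ms // 1000


def to_pcm16le(samples):
    """Pack integers in the range [-32768, 32767] as signed 16-bit little-endian."""
    return struct.pack(f"<{len(samples)}h", *samples)


if __name__ == "__main__":
    print(chunk_size_bytes(16000))  # 3200
    print(chunk_size_bytes(8000))   # 1600
    print(to_pcm16le([0, 1, -1]))   # b'\x00\x00\x01\x00\xff\xff'
```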
For optimal real-time performance:
- Timing: Send approximately 1 second of audio per second
- Chunk Interval: 100ms chunks sent every 100ms
- Buffer Management: Avoid buffering large amounts of audio
- Network Latency: Account for network delays in your timing
Example timing for 16kHz audio:
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;
const CHUNK_DURATION_MS = 100;
const CHUNK_SIZE = SAMPLE_RATE * BYTES_PER_SAMPLE * (CHUNK_DURATION_MS / 1000); // 3200 bytes
// Send one chunk every 100ms
setInterval(() => {
  const audioChunk = getAudioChunk(CHUNK_SIZE);
  websocket.send(audioChunk);
}, CHUNK_DURATION_MS);
To signal the end of audio and complete the transcription:
{"eof": 1}
Send this JSON message when you finish sending audio data.
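The same pacing and end-of-audio signal can be sketched in Python. This sketch is transport-agnostic: `send` is a hypothetical async callable standing in for whatever WebSocket client is used, and the pacing tracks an absolute deadline so that time spent in `send` does not accumulate as drift.

```python
import asyncio
import json

BYTES_PER_SAMPLE = 2  # signed 16-bit PCM, mono


async def stream_pcm(send, pcm, sample_rate=16000, chunk_ms=100):
    """Send raw PCM in real-time-sized chunks, then the EOF message."""
    chunk_size = sample_rate * BYTES_PER_SAMPLE * chunk_ms // 1000
    loop = asyncio.get_running_loop()
    next_send = loop.time()
    for offset in range(0, len(pcm), chunk_size):
        await send(pcm[offset:offset + chunk_size])
        # Sleep until the next deadline rather than for a fixed interval.
        next_send += chunk_ms / 1000
        await asyncio.sleep(max(0.0, next_send - loop.time()))
    # Signal the end of audio, as described above.
    await send(json.dumps({"eof": 1}))


if __name__ == "__main__":
    async def print_send(data):
        # Stand-in for a real WebSocket send.
        print(type(data).__name__, len(data))

    asyncio.run(stream_pcm(print_send, bytes(7000)))
```

With a real client, `send` would be the connection's send method; binary chunks go out as bytes and the EOF message as text.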
The WebSocket returns JSON messages with transcription results. Partial and final results share the same format; only the type field differs. Example of a partial result:
{
  "text": " Ceci est un te",
  "type": "partial",
  "startedAt": 0,
  "segment": 0,
  "elements": {
    "segments": [
      {
        "text": " Ceci est un te",
        "type": "segment",
        "startedAt": 0,
        "segment": 0
      }
    ],
    "words": [
      {
        "text": "Ceci",
        "type": "word",
        "startedAt": 1.28,
        "segment": 0
      },
      {
        "text": "est",
        "type": "word",
        "startedAt": 1.8,
        "segment": 0
      },
      {
        "text": "un",
        "type": "word",
        "startedAt": 2.04,
        "segment": 0
      },
      {
        "text": "te",
        "type": "word",
        "startedAt": 2.32,
        "segment": 0
      }
    ]
  }
}
Example of a final result:
{
  "text": " Ceci est un test",
  "type": "final",
  "startedAt": 0,
  "segment": 0,
  "elements": {
    "segments": [
      {
        "text": " Ceci est un test",
        "type": "segment",
        "startedAt": 0,
        "segment": 0
      }
    ],
    "words": [
      {
        "text": "Ceci",
        "type": "word",
        "startedAt": 1.28,
        "segment": 0
      },
      {
        "text": "est",
        "type": "word",
        "startedAt": 1.8,
        "segment": 0
      },
      {
        "text": "un",
        "type": "word",
        "startedAt": 2.04,
        "segment": 0
      },
      {
        "text": "test",
        "type": "word",
        "startedAt": 2.32,
        "segment": 0
      }
    ]
  }
}
Response fields:
- text: The transcribed text
- type: "partial" for real-time updates, "final" for completed segments
- startedAt: Start time of the segment in seconds
- segment: Segment number (increments for each completed phrase/sentence)
- elements: Detailed breakdown with word-level timing
  - segments: Array of text segments with timing
  - words: Array of individual words with precise timestamps
Note: Partial and final results have the same structure and differ only in the type field. A partial result may end in an incomplete word (e.g., "te" instead of "test"), while a final result contains the complete, corrected transcription of the segment.
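A client usually wants to overwrite the displayed text on each partial result and commit it on a final one. The sketch below parses a message using only the field names from the examples above; `parse_result` is a hypothetical helper, and stripping the leading space from the text is a presentation choice rather than something the service requires.

```python
import json

SAMPLE = """{"text": " Ceci est un test", "type": "final", "startedAt": 0,
"segment": 0, "elements": {"segments": [], "words": [
{"text": "Ceci", "type": "word", "startedAt": 1.28, "segment": 0},
{"text": "test", "type": "word", "startedAt": 2.32, "segment": 0}]}}"""


def parse_result(raw):
    """Extract the commonly used fields from one transcription message."""
    msg = json.loads(raw)
    words = msg.get("elements", {}).get("words", [])
    return {
        "is_final": msg["type"] == "final",
        "segment": msg["segment"],
        "text": msg["text"].strip(),
        # (word, start time in seconds) pairs for word-level timing
        "words": [(w["text"], w["startedAt"]) for w in words],
    }


if __name__ == "__main__":
    result = parse_result(SAMPLE)
    label = "FINAL" if result["is_final"] else "LIVE"
    print(f"[{label}] {result['text']}")
```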
- Connect to WebSocket with API key or token
- Stream audio in real-time chunks (100ms recommended)
- Receive partial results for immediate feedback
- Receive final results for completed segments with detailed timing
- Send EOF when finished
- Close connection
- Authentication errors: Check API key validity and permissions
- Connection errors: Verify network connectivity and URL format
- Audio format errors: Ensure correct sample rate and audio format
- Token expiry: Temporary tokens expire after 1 hour
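For long-running clients, the one-hour token lifetime means a token URL obtained at startup may no longer be valid when reconnecting. One possible approach, sketched here under assumptions, is to cache the URL and request a new one shortly before the hour is up. `TokenUrlCache` and its `fetch` callback are hypothetical; `fetch` stands for whatever function requests a temporary token.

```python
import time

TOKEN_LIFETIME_S = 3600  # from the list above: temporary tokens expire after 1 hour


class TokenUrlCache:
    """Reuse a temporary token URL until it is close to expiry."""

    def __init__(self, fetch, margin_s=60, clock=time.monotonic):
        self._fetch = fetch          # callable returning a fresh token URL
        self._margin_s = margin_s    # refresh this many seconds before expiry
        self._clock = clock
        self._url = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = (
            self._fetched_at is None
            or now - self._fetched_at >= TOKEN_LIFETIME_S - self._margin_s
        )
        if expired:
            self._url = self._fetch()
            self._fetched_at = now
        return self._url
```

The injectable clock exists only to make the expiry logic easy to exercise without waiting an hour.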
The microphone script (asr-mic.js) requires SoX to be installed and available in your $PATH.
# Debian/Ubuntu
sudo apt-get install sox libsox-fmt-all
# macOS (Homebrew)
brew install sox
Note: SoX is only required for the microphone script (asr-mic.js). The file-based scripts (asr-file-ws.js and asr-file-ws.py) do not require SoX.
Install the Node.js dependencies:
npm install
Direct WebSocket connection using API key authentication with CLI parameters:
node asr-file-ws.js <API_KEY> <WAV_FILE> [LANG] [SAMPLE_RATE] [--staging]
Examples:
# Production environment (default)
node asr-file-ws.js your-prod-api-key audio.wav fr-medical 16000
# Staging environment
node asr-file-ws.js your-staging-api-key audio.wav fr-medical 16000 --staging
# Custom language and sample rate in production
node asr-file-ws.js your-prod-api-key audio.wav fr 8000
# English transcription in staging
node asr-file-ws.js your-staging-api-key audio.wav en 16000 --staging
Parameters:
- API_KEY: Your Voxist API key (different for staging and production)
- WAV_FILE: Path to the WAV audio file
- LANG: Language code (optional, default: "fr-medical")
- SAMPLE_RATE: Sample rate in Hz (optional, default: 16000)
- --staging: Use staging environment (optional)
Real-time microphone transcription using WebSocket with temporary token authentication:
node asr-mic.js <API_KEY> [LANG] [SAMPLE_RATE] [--staging]
Examples:
# Production environment (default)
node asr-mic.js your-prod-api-key fr-medical 16000
# Staging environment
node asr-mic.js your-staging-api-key fr-medical 16000 --staging
# Custom language and sample rate in production
node asr-mic.js your-prod-api-key fr 8000
# English transcription in staging
node asr-mic.js your-staging-api-key en 16000 --staging
Parameters:
- API_KEY: Your Voxist API key (different for staging and production)
- LANG: Language code (optional, default: "fr-medical")
- SAMPLE_RATE: Sample rate in Hz (optional, default: 16000)
- --staging: Use staging environment (optional)
Features:
- Real-time microphone recording and transcription
- Automatic audio configuration (mono 16-bit at specified sample rate)
- Temporary token authentication (more secure than direct API key in WebSocket)
- Live partial results with [LIVE] prefix
- Final results with [FINAL] prefix
- Graceful shutdown with Ctrl+C
- Proper microphone resource cleanup
Requirements:
- SoX must be installed and available in PATH
- Working microphone
- Microphone permissions granted to terminal/application
How it works:
- Requests a temporary WebSocket token from the API using your API key
- Adds language and sample rate parameters to the WebSocket URL
- Connects to the WebSocket using the temporary token
- Streams microphone audio in real-time
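To illustrate the microphone capture step, here is a hedged Python sketch that asks SoX for raw mono 16-bit little-endian audio on standard output and reads it in 100ms chunks. It is not a description of how asr-mic.js invokes SoX; the helper names are hypothetical, and only standard SoX options are used (`-d` for the default audio device, `-t raw`, `-r`, `-e`, `-b`, `-c`, `-L`).

```python
import subprocess


def sox_capture_command(sample_rate=16000):
    """Command line that records from the default device as raw PCM on stdout."""
    return [
        "sox", "-q", "-d",          # quiet, read from the default input device
        "-t", "raw",                # headerless output
        "-r", str(sample_rate),     # 8000 or 16000 Hz
        "-e", "signed-integer",
        "-b", "16",
        "-c", "1",                  # mono
        "-L",                       # little-endian
        "-",                        # write to stdout
    ]


def iter_mic_chunks(sample_rate=16000, chunk_ms=100):
    """Yield raw PCM chunks from the microphone until SoX stops."""
    chunk_size = sample_rate * 2 * chunk_ms // 1000
    proc = subprocess.Popen(sox_capture_command(sample_rate), stdout=subprocess.PIPE)
    try:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        # Release the microphone even if the consumer stops early.
        proc.terminate()
        proc.wait()
```

Because the microphone produces audio in real time, chunks read this way are already paced at roughly one second of audio per second.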
Run the setup script to create a virtual environment and install dependencies:
./setup-python.sh
Alternatively, set up the environment manually:
- Create a virtual environment:
python3 -m venv venv
- Activate the virtual environment:
source venv/bin/activate # On Linux/Mac
# or
venv\Scripts\activate # On Windows
- Install dependencies:
pip install -r requirements.txt
Usage:
python asr-file-ws.py <API_KEY> <WAV_FILE> [LANG] [SAMPLE_RATE] [--staging]
Examples:
# Production environment (default)
python asr-file-ws.py your-prod-api-key audio.wav fr-medical 16000
# Staging environment
python asr-file-ws.py your-staging-api-key audio.wav fr-medical 16000 --staging
# Custom language and sample rate in production
python asr-file-ws.py your-prod-api-key audio.wav fr 8000
# English transcription in staging
python asr-file-ws.py your-staging-api-key audio.wav en 16000 --staging
Parameters:
- API_KEY: Your Voxist API key (different for staging and production)
- WAV_FILE: Path to the WAV audio file
- LANG: Language code (optional, default: "fr-medical")
- SAMPLE_RATE: Sample rate in Hz (optional, default: 16000)
- --staging: Use staging environment (optional)
Supported Languages:
- fr: French
- fr-medical: French Medical
- en: English
- pt: Portuguese
- nl: Dutch
- it: Italian
- sv: Swedish
- es: Spanish
- de: German
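The command-line conventions described above (two required arguments, two optional ones with defaults, and a --staging flag that selects the host) can be sketched as follows. This is an illustration of the documented interface, not the actual parsing code in asr-file-ws.py; `parse_cli` and the returned dictionary are hypothetical.

```python
SUPPORTED_LANGS = {"fr", "fr-medical", "en", "pt", "nl", "it", "sv", "es", "de"}
HOSTS = {
    "production": "api-asr.voxist.com",
    "staging": "asr-staging-dev.voxist.com",
}


def parse_cli(argv):
    """Parse <API_KEY> <WAV_FILE> [LANG] [SAMPLE_RATE] [--staging]."""
    staging = "--staging" in argv
    positional = [arg for arg in argv if arg != "--staging"]
    if len(positional) < 2:
        raise SystemExit("usage: <API_KEY> <WAV_FILE> [LANG] [SAMPLE_RATE] [--staging]")
    lang = positional[2] if len(positional) > 2 else "fr-medical"
    sample_rate = int(positional[3]) if len(positional) > 3 else 16000
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {lang}")
    if sample_rate not in (8000, 16000):
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return {
        "api_key": positional[0],
        "wav_file": positional[1],
        "lang": lang,
        "sample_rate": sample_rate,
        "host": HOSTS["staging" if staging else "production"],
    }


if __name__ == "__main__":
    import sys
    print(parse_cli(sys.argv[1:]))
```

Validating the language and sample rate locally gives a clearer error than a rejected connection, though the service remains the authority on what it accepts.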
Audio requirements for file input:
- Format: WAV
- Sample Rate: 8000 Hz or 16000 Hz
- Channels: Mono (1 channel)
- Bit Depth: 16-bit
Microphone input:
- Automatically configured to mono 16-bit at the specified sample rate
- SoX handles audio format conversion
- Works with any microphone supported by the system
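For file input, the WAV requirements (mono, 16-bit, 8000 Hz or 16000 Hz) can be checked before streaming with Python's standard `wave` module. The helper name `wav_problems` is illustrative and is not part of the sample scripts.

```python
import wave


def wav_problems(path):
    """Return a list of reasons the file does not meet the requirements."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, found {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() not in (8000, 16000):
            problems.append(f"expected 8000 or 16000 Hz, found {wav.getframerate()} Hz")
    return problems


if __name__ == "__main__":
    import sys
    for problem in wav_problems(sys.argv[1]):
        print("Unsupported audio:", problem)
```

An empty list means the file matches the requirements. The detected frame rate is also the value to pass as the sample rate parameter.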
Important: You need different API keys for staging and production environments:
- Production API keys: used with api-asr.voxist.com (default behavior)
- Staging API keys: used with asr-staging-dev.voxist.com (with --staging flag)
Contact Voxist support to obtain API keys for both environments.