This repository contains a RunPod handler for speaker diarization using PyAnnote. It takes audio input and returns diarization results with speaker segments and speaker embeddings.
Prerequisites:
- A Hugging Face account with access to the PyAnnote model
- A RunPod account with API access
- Docker installed locally (for building the image)
Set the following environment variable when deploying:
- HF_TOKEN: Your Hugging Face API token with access to the PyAnnote model
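A missing token usually surfaces only when the model download fails, so it can help to check for it at startup. The following is a minimal sketch, not the handler's actual code; the helper name `require_hf_token` is hypothetical.

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token, or raise a clear error if it is unset.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it locally or add it to the "
            "endpoint's environment variables."
        )
    return token
```

Calling `require_hf_token()` once before loading the model turns a confusing download error into an explicit configuration message.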
Build and push the Docker image:
docker build -t your-username/diarization-handler:latest .
docker push your-username/diarization-handler:latest
Then create the endpoint on RunPod:
- Go to your RunPod Serverless dashboard
- Create a new endpoint using your Docker image
- Set the required environment variables
- Deploy your endpoint
Send a POST request to your RunPod endpoint with the following structure:
{
  "input": {
    "audio_data": "<base64_encoded_audio>",
    "file_type": "wav"
  }
}
- audio_data: Required. Base64-encoded audio file (wav, mp3, etc.)
- file_type: Optional. File format of the audio (default: "wav")
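To illustrate how these two fields might be consumed on the handler side, here is a hedged sketch that validates the input and writes the decoded audio to a temporary file. The function name `decode_audio_input` is hypothetical, and this is an assumption about one reasonable approach, not the repository's implementation.

```python
import base64
import binascii
import os
import tempfile


def decode_audio_input(job_input):
    """Decode job_input["audio_data"] into a temporary file and return its path.

    file_type falls back to "wav", matching the documented default.
    The caller is responsible for deleting the file afterwards.
    """
    audio_b64 = job_input.get("audio_data")
    if not audio_b64:
        raise ValueError("audio_data is required")
    file_type = job_input.get("file_type", "wav")
    try:
        # validate=True rejects characters outside the base64 alphabet
        audio_bytes = base64.b64decode(audio_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("audio_data is not valid base64") from exc
    with tempfile.NamedTemporaryFile(suffix=f".{file_type}", delete=False) as tmp:
        tmp.write(audio_bytes)
        return tmp.name
```

Using the file extension from `file_type` lets audio libraries that infer the format from the filename pick the right decoder.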
The handler returns output with the following structure:
{
  "diarization": [
    {
      "speaker": "SPEAKER_0",
      "start": 0.0,
      "end": 2.5
    },
    {
      "speaker": "SPEAKER_1",
      "start": 2.7,
      "end": 5.2
    }
  ],
  "embeddings_dict": {
    "SPEAKER_0": [0.1, 0.2, ...],
    "SPEAKER_1": [0.3, 0.4, ...]
  },
  "processing_time": 3.45
}
To test locally before deploying:
# Export your HF token
export HF_TOKEN="your_huggingface_token"
# Run the handler locally
python handler.py
You can then test it with:
python test.py

Example client code:
import requests
import base64
import os
# Read and encode audio file
with open("audio.wav", "rb") as audio_file:
    audio_base64 = base64.b64encode(audio_file.read()).decode('utf-8')

# Set up headers with RunPod API key (read from the RUNPOD_API_KEY environment variable)
headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {os.getenv("RUNPOD_API_KEY")}'
}

# Prepare the request payload
json_input = {
    "input": {
        "audio_data": audio_base64,
        "file_type": "wav"
    }
}

# Send the request to your RunPod endpoint
response = requests.post(
    'https://api.runpod.ai/v2/your-endpoint-id/runsync',
    headers=headers,
    json=json_input,
)

# Fail early on HTTP errors, then parse the response
response.raise_for_status()
result = response.json()
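
Once the response is parsed, the segments and embeddings can be summarized with a few lines of Python. The sketch below is illustrative only: the helper names are hypothetical, the sample embeddings are made-up short vectors, and treating cosine similarity as the right comparison for these embeddings is an assumption.

```python
import math
from collections import defaultdict


def speaking_time(segments):
    """Sum the duration (end - start) of all segments per speaker, in seconds."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical data shaped like the documented handler output
sample = {
    "diarization": [
        {"speaker": "SPEAKER_0", "start": 0.0, "end": 2.5},
        {"speaker": "SPEAKER_1", "start": 2.7, "end": 5.2},
    ],
    "embeddings_dict": {
        "SPEAKER_0": [0.1, 0.2, 0.3],  # made-up values for illustration
        "SPEAKER_1": [0.3, 0.4, 0.5],
    },
}

for speaker, seconds in speaking_time(sample["diarization"]).items():
    print(f"{speaker}: {seconds:.2f}s")

emb = sample["embeddings_dict"]
print(f"similarity: {cosine_similarity(emb['SPEAKER_0'], emb['SPEAKER_1']):.3f}")
```

With a live endpoint, the handler's return value may be nested under an `"output"` key in the `runsync` response; if so, something like `output = result.get("output", result)` would select it before calling these helpers.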