PyAnnote Speaker Diarization RunPod Handler

This repository contains a RunPod handler for speaker diarization using PyAnnote. It takes audio input and returns diarization results with speaker segments and speaker embeddings.

Setup

Prerequisites

A Hugging Face account with access to the PyAnnote model
A RunPod account with API access
Docker installed locally (for building the image)

Environment Variables

Set the following environment variable when deploying:

HF_TOKEN: Your Hugging Face API token with access to the PyAnnote model

Building the Docker Image

docker build -t your-username/diarization-handler:latest .
docker push your-username/diarization-handler:latest

Deploying on RunPod

Go to your RunPod Serverless dashboard
Create a new endpoint using your Docker image
Set the required environment variable (HF_TOKEN)
Deploy your endpoint

API Usage

Send a POST request to your RunPod endpoint with the following structure:

{
  "input": {
    "audio_data": "<base64_encoded_audio>",
    "file_type": "wav"
  }
}

Parameters

audio_data: Required. Base64-encoded audio file (wav, mp3, etc.)
file_type: Optional. File format of the audio (default: "wav")
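A request body with these two parameters could be assembled along the following lines. This is a minimal sketch: `build_payload` is a hypothetical helper, not part of this repository, and inferring `file_type` from the file extension is an assumption about which format names the handler accepts.

```python
import base64
from pathlib import Path


def build_payload(audio_path, file_type=None):
    """Build the request body for a local audio file (hypothetical helper)."""
    path = Path(audio_path)
    # Assumption: the file extension names the format the handler expects.
    if file_type is None:
        file_type = path.suffix.lstrip(".").lower() or "wav"
    # The handler expects the raw file bytes as a base64 string.
    audio_base64 = base64.b64encode(path.read_bytes()).decode("utf-8")
    return {"input": {"audio_data": audio_base64, "file_type": file_type}}
```

Passing `file_type` explicitly skips the extension guess, which may be safer when file names are unreliable.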

Response

{
  "diarization": [
    {
      "speaker": "SPEAKER_0",
      "start": 0.0,
      "end": 2.5
    },
    {
      "speaker": "SPEAKER_1",
      "start": 2.7,
      "end": 5.2
    }
  ],
  "embeddings_dict": {
    "SPEAKER_0": [0.1, 0.2, ...],
    "SPEAKER_1": [0.3, 0.4, ...]
  },
  "processing_time": 3.45
}
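One way to work with a response of this shape is to total the speaking time per speaker and compare speaker embeddings. The sketch below is illustrative only: `speaking_time` and `cosine_similarity` are hypothetical helpers, the sample values are made up, and it assumes `start` and `end` share one time unit (presumably seconds) and that the embeddings are plain lists of floats of equal length.

```python
import math
from collections import defaultdict


def speaking_time(diarization):
    """Sum segment durations per speaker label (same unit as start/end)."""
    totals = defaultdict(float)
    for segment in diarization:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Illustrative values only, shaped like the response above.
result = {
    "diarization": [
        {"speaker": "SPEAKER_0", "start": 0.0, "end": 2.5},
        {"speaker": "SPEAKER_1", "start": 2.7, "end": 5.2},
        {"speaker": "SPEAKER_0", "start": 5.5, "end": 6.0},
    ],
    "embeddings_dict": {"SPEAKER_0": [0.1, 0.2], "SPEAKER_1": [0.3, 0.4]},
}

totals = speaking_time(result["diarization"])
similarity = cosine_similarity(
    result["embeddings_dict"]["SPEAKER_0"],
    result["embeddings_dict"]["SPEAKER_1"],
)
print(totals, round(similarity, 3))
```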

Local Testing

To test locally before deploying:

# Export your HF token
export HF_TOKEN="your_huggingface_token"

# Run the handler locally
python handler.py

You can then test it with:

python test.py

Example Code

import requests
import base64
import os

# Read and encode audio file
with open("audio.wav", "rb") as audio_file:
    audio_base64 = base64.b64encode(audio_file.read()).decode('utf-8')

# Set up headers with RunPod API key
headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {os.getenv("RUNPOD_API_KEY")}'
}

# Prepare the request payload
json_input = {
    "input": {
        "audio_data": audio_base64,
        "file_type": "wav"
    }
}

# Send the request to your RunPod endpoint
response = requests.post(
    'https://api.runpod.ai/v2/your-endpoint-id/runsync',
    headers=headers,
    json=json_input,
)

# Process the response; raise on HTTP errors before parsing
response.raise_for_status()
result = response.json()
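RunPod's synchronous endpoints generally wrap the handler's return value in an envelope with `status` and `output` fields, so the diarization result shown under Response would likely sit under `output`. A minimal sketch of unwrapping it, assuming that envelope shape; `extract_output` is a hypothetical helper and the sample envelope is not a captured response:

```python
def extract_output(result):
    """Return the handler output from a runsync-style envelope.

    Assumption: the envelope carries "status" and, on success, "output".
    """
    status = result.get("status")
    if status != "COMPLETED":
        raise RuntimeError(f"Job did not complete (status: {status!r})")
    return result["output"]


# Illustrative envelope, not a captured response.
envelope = {
    "status": "COMPLETED",
    "output": {"diarization": [], "embeddings_dict": {}, "processing_time": 0.0},
}
output = extract_output(envelope)
```

Failing loudly on any other status avoids silently treating a queued or failed job as an empty result.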
