pyannote-audio-elan integrates the voice activity detection and overlap-aware speaker diarization services of pyannote.audio (Plaquet & Bredin 2023, Bredin 2023) into ELAN. Users can apply both out-of-the-box and fine-tuned segmentation/diarization models to multimedia sources linked to ELAN transcripts directly from ELAN's user interface.
In addition to performing voice activity detection and speaker diarization, pyannote-audio-elan can optionally apply speaker verification to the results of these segmentation processes, attempting to determine the most likely match between a known set of speakers' voices and the speaker(s) identified during automatic segmentation. When given a set of speaker identifiers (e.g., "CDC", "BRS", etc.) and short audio samples (e.g., 30 seconds of speech) from each corresponding individual, pyannote-audio-elan will return tiers from the segmentation process with names that contain the speaker identifier (e.g., returning a tier named "CDC", rather than "PyannoteAudio_Speaker_00", if the audio sample provided for the speaker "CDC" is the closest match to the speaker Speaker_00 identified in automatic segmentation).
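The matching step described above can be illustrated with a minimal sketch. It assumes each voice is already represented as a fixed-length embedding vector and that "closest match" means highest cosine similarity. The function names and toy vectors are hypothetical and are not taken from pyannote-audio-elan's own code, which may work differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rename_speakers(diarized, known):
    """Map each diarized label to the most similar known speaker identifier.

    Both arguments are dicts of label -> embedding. These are hypothetical
    inputs; a real system would compute them with a speaker embedding model.
    """
    names = {}
    for label, embedding in diarized.items():
        names[label] = max(
            known, key=lambda ident: cosine_similarity(embedding, known[ident])
        )
    return names


# Toy 3-dimensional embeddings, for illustration only.
known = {"CDC": [0.9, 0.1, 0.0], "BRS": [0.0, 0.2, 0.9]}
diarized = {"Speaker_00": [0.8, 0.2, 0.1], "Speaker_01": [0.1, 0.1, 0.7]}

print(rename_speakers(diarized, known))
```

Note that this simple version can assign the same identifier to two diarized speakers. A more careful approach would apply a similarity threshold or enforce a one-to-one assignment.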
pyannote-audio-elan makes use of several other open-source applications and utilities:
pyannote-audio-elan is written in Python 3, and also depends on a number of Python packages that can be installed using pip in a virtual environment. Under macOS 15, the following commands can be used to fetch and install the necessary Python packages:
git clone https://github.com/coxchristopher/pyannote-audio-elan
cd pyannote-audio-elan
python3.10 -m venv venv-pyannote-audio-elan
source venv-pyannote-audio-elan/bin/activate
pip install -r requirements.txt
chmod +x pyannote-audio-elan.sh
Once all of these tools and packages have been installed, pyannote-audio-elan can be made available to ELAN as follows:
- Edit the file pyannote-audio-elan.sh (macOS) or pyannote-audio-elan.bat (Windows) to specify a Unicode-friendly language and locale (if en_US.UTF-8 isn't available on your computer).
- To make pyannote-audio-elan available to ELAN, move your pyannote-audio-elan directory into ELAN's extensions directory. This directory is found in different places under different operating systems:
  - Under macOS, right-click on ELAN_7.0 in your /Applications folder and select "Show Package Contents", then copy your pyannote-audio-elan folder into ELAN_7.0.app/Contents/app/extensions.
  - Under Linux, copy your pyannote-audio-elan folder into ELAN_7-0/app/extensions.
  - Under Windows, copy your pyannote-audio-elan folder into C:\Users\AppData\Local\ELAN_7-0\app\extensions.
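Before editing the launcher script, it can help to check which locale names your system actually accepts. The following is a minimal sketch using only Python's standard library. The helper name `locale_available` and the candidate list are illustrative assumptions, and locale names differ between platforms (Windows in particular uses its own naming).

```python
import locale


def locale_available(name):
    """Return True if the system accepts `name` as a character-type locale."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore the original


# Report which of a few common candidates could be used in the script.
for candidate in ("en_US.UTF-8", "C.UTF-8", "C"):
    print(candidate, locale_available(candidate))
```

If en_US.UTF-8 is reported as unavailable, another UTF-8 locale that the check accepts is a reasonable value to try in the launcher script.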
Once ELAN is restarted, it will include two new options in the list of services found under the 'Recognizer' tab in Annotation Mode: 'pyannote.audio speaker diarization with speaker verification' (for overlap-aware speaker diarization with optional speaker verification) and 'pyannote.audio voice activity detection' (for basic voice activity detection, without any speaker diarization applied). The user interfaces for both recognizers allow users to enter the settings needed to apply these services to the media linked to the current ELAN transcript (e.g., optionally specifying the exact number of speakers present in the recording, if known, or a maximum number of speakers that may be present, either of which can improve the accuracy of speaker diarization).
Once these settings have been entered in pyannote-audio-elan, pressing the Start button will begin applying the selected segmentation service to the media. Once that process is complete, if no errors occurred, ELAN will allow the user to load the resulting tier(s) with the automatically recognized segments into the current transcript.
Importantly, pyannote-audio-elan currently requires access to pyannote.audio's segmentation and speaker-diarization pipelines on Hugging Face. Both of these pipelines require users to read and accept their conditions on Hugging Face before using them, and to provide a Hugging Face access token to download them the first time they are used. While we hope to be able to offer a version of pyannote-audio-elan in the future that removes this requirement and works entirely offline, for now, users of pyannote-audio-elan will need to:
- Accept the pyannote/segmentation-3.0 user conditions,
- Accept the pyannote/speaker-diarization-3.1 user conditions, then
- Create an access token at https://hf.co/settings/tokens that can be copied into the pyannote-audio-elan settings.
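For orientation, the sketch below shows roughly how an access token and a speaker-count setting reach pyannote.audio when the diarization pipeline is called directly from Python. It assumes pyannote.audio 3.x, where the token argument is named `use_auth_token`. The helper names `speaker_options` and `diarize` are hypothetical, and this is not necessarily how pyannote-audio-elan itself invokes the pipeline.

```python
def speaker_options(num_speakers=None, max_speakers=None):
    """Build keyword arguments constraining the number of speakers.

    An exact count takes precedence over an upper bound.
    """
    if num_speakers is not None:
        return {"num_speakers": num_speakers}
    if max_speakers is not None:
        return {"max_speakers": max_speakers}
    return {}


def diarize(audio_path, hf_token, **options):
    """Run speaker diarization and return (start, end, speaker) tuples."""
    # Imported lazily so speaker_options() works without pyannote.audio.
    from pyannote.audio import Pipeline

    # The first call downloads the pipeline, which requires the accepted
    # user conditions and a valid token.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=hf_token,  # argument name assumed for pyannote.audio 3.x
    )
    diarization = pipeline(audio_path, **options)
    return [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]
```

With a real token and audio file, a call such as `diarize("recording.wav", token, **speaker_options(max_speakers=3))` would return one tuple per speaker turn. The file name here is a placeholder.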
This is an alpha release of pyannote-audio-elan, and has only been tested under macOS (13-15) and Windows (11, Intel 64-bit) with Python 3.10. No support for Linux is included in this version.
As noted above, installing and using pyannote-audio-elan currently requires an internet connection (at least the first time that pyannote-audio-elan is used, so that the segmentation and diarization pipelines can be downloaded from Hugging Face) and some familiarity with command-line software development tools. We hope to reduce (and, ideally, eliminate) these requirements in the future, providing pre-packaged, offline-friendly versions of these recognizers that offer more user-friendly installation options (see the pyannote.audio tutorial on offline speaker diarization and Lorena Martín Rodríguez's SileroVAD-Elan project for examples of how this might be done).
Thanks are due to the developers of pyannote.audio for the pipelines that this recognizer relies upon and the accompanying documentation, particularly of the fine-tuning process. Thanks, as well, to Han Sloetjes for his help with issues related to ELAN's local recognizer specifications.
If referring to this code in a publication, please consider using the following citation:
Cox, Christopher. 2025. pyannote-audio-elan: An implementation of pyannote.audio speaker diarization and voice activity detection services as a recognizer for ELAN. Version 0.2.0.
@manual{cox25pyannoteaudioelan,
title = {pyannote-audio-elan: An implementation of pyannote.audio speaker diarization and voice activity detection services as a recognizer for {ELAN}.},
author = {Christopher Cox},
year = {2025},
note = {Version 0.2.0},
}