A Speaker Diarization pipeline

INSTRUCTIONS FOR INFERENCING

- pip install -r requirements.txt
- cd VBx
- python predict.py --in-wav-path "YOUR_AUDIO_WAV_FILE_PATH"

NOTE: This pipeline currently cannot process overlapped speech.

Methods

`pyannote_vad(token, path, wav_path)`

This function computes voice activity detection (VAD) on a .wav file.

Input Arguments:

- token: Hugging Face access token for the pyannote segmentation 3.0 repository. The VAD model is gated, so you must request access on its Hugging Face model page before the token will work.
- path: The path where the VAD output will be saved as a .lab file.
- wav_path: The path to the .wav file to be processed.
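
To make the .lab output concrete, here is a minimal sketch of writing and reading speech segments as a label file. It assumes a layout of one `start end sp` line per speech segment, with times in seconds; the actual file written by `pyannote_vad` may differ. `write_lab` and `read_lab` are hypothetical helper names, not functions from this repository.

```python
from pathlib import Path


def write_lab(segments, path):
    """Write (start, end) speech segments, in seconds, one per line.

    Assumed layout: "<start> <end> sp" (hypothetical, for illustration).
    """
    lines = [f"{start:.3f} {end:.3f} sp" for start, end in segments]
    Path(path).write_text("\n".join(lines) + "\n")


def read_lab(path):
    """Read a .lab file back into a list of (start, end) float tuples."""
    segments = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, end, _label = line.split()
        segments.append((float(start), float(end)))
    return segments
```

Keeping the reader and writer symmetric makes it easy to check that a VAD file produced by one stage can be consumed by the next.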

`predict(args, wav_path, vad_path, config)`

This function extracts x-vectors segment by segment by calling the get_embedding method, then stores them in a Kaldi-format .ark file.

Input Arguments:

- wav_path: The path to the .wav file to be processed.
- vad_path: The path to the .lab file generated by the VAD process.
- config: A dictionary containing custom-defined parameters.
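
Segment-wise extraction can be pictured as sliding a fixed-length window over each speech segment from the .lab file and computing one x-vector per window. The sketch below only generates the window boundaries. The window length of 144 frames and step of 24 frames are assumed values for illustration (the real ones would come from `config`), and `sliding_windows` is a hypothetical name, not a function from this repository.

```python
def sliding_windows(start, end, win=144, step=24):
    """Split the frame range [start, end) into overlapping windows.

    win and step are in frames; both defaults are assumptions.
    A segment shorter than win yields a single, shorter window.
    """
    if end <= start:
        return []
    windows = []
    pos = start
    # Emit full-length windows while one still fits before the end.
    while pos + win < end:
        windows.append((pos, pos + win))
        pos += step
    # The final window is clipped to the segment end.
    windows.append((pos, end))
    return windows


for window in sliding_windows(0, 200):
    print(window)
```

Each `(first_frame, last_frame)` pair would then be passed to the embedding extractor, so one speech segment produces a short sequence of x-vectors.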

`vbhmm_resegmentation(filename, config)`

This function performs clustering using Agglomerative Hierarchical Clustering (AHC), followed by resegmentation, and converts the labels to .rttm format.

Input Arguments:

- filename: The name of the .wav file.
- config: A dictionary containing custom-defined parameters.
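
The last step, converting per-segment speaker labels to .rttm, can be sketched as follows. This is an illustration under assumptions, not the repository's code: it merges consecutive segments that share a label and then prints standard RTTM `SPEAKER` lines with channel 1. `merge_labels` and `to_rttm` are hypothetical names, and the input values are made up.

```python
def merge_labels(segments):
    """Merge consecutive (start, end, label) segments that touch
    or overlap and carry the same speaker label."""
    merged = []
    for start, end, label in segments:
        if merged and merged[-1][2] == label and start <= merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged


def to_rttm(file_id, segments):
    """Format segments as RTTM SPEAKER lines (onset and duration in seconds)."""
    lines = []
    for start, end, label in merge_labels(segments):
        lines.append(
            f"SPEAKER {file_id} 1 {start:.3f} {end - start:.3f} "
            f"<NA> <NA> {label} <NA> <NA>"
        )
    return lines


example = [(0.0, 1.5, 1), (1.5, 3.0, 1), (3.0, 4.0, 2)]
print("\n".join(to_rttm("rec", example)))
```

Merging before writing keeps the .rttm file compact, since clustering assigns a label to every short window rather than to whole speaker turns.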

More details about the full recipe are given in:
F. Landini, J. Profant, M. Diez, L. Burget: Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks

If you are interested in the original version of VBx (prepared for the Second DIHARD Challenge), please refer to the corresponding branch.
If you are interested in the VBx recipe prepared for Track 4 of the VoxSRC-20 Challenge (on VoxConverse), please refer to the corresponding branch.

Citations

If you use this software, please cite:
F. Landini, J. Profant, M. Diez, L. Burget: Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks (arXiv version)

@article{landini2022bayesian,
  title={Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks},
  author={Landini, Federico and Profant, J{\'a}n and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
  journal={Computer Speech \& Language},
  volume={71},
  pages={101254},
  year={2022},
  publisher={Elsevier}
}

@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
