Welcome to the Voice Privacy Challenge! Your task is to develop a model that anonymizes audio while preserving intelligibility and naturalness. This repository provides the necessary setup, evaluation script, and rules for participation.
```
evaluation_data/                       # Directory containing enrollment and trial audio data
├── Enrollment/                        # Speaker audio files for enrollment
│   ├── speaker1/                      # Directory for Speaker 1
│   │   ├── 1272-128104-0000.wav       # Original enrollment utterance
│   │   ├── ...
│   │   └── anonymized/                # Anonymized versions of the above audio files (created automatically by the evaluation script using your anonymization algorithm)
│   │       ├── anon_1272-128104-0000.wav
│   │       └── ...
│   ├── speaker2/
│   ├── speaker3/
│   ├── speaker4/
│   └── ...
│
└── Trial/                             # Speaker audio files for testing (trial phase)
    ├── speaker1/
    │   ├── 1272-128104-0003.wav       # Trial utterances (different from enrollment)
    │   ├── ...
    │   └── anonymized/                # Anonymized versions of the above audio files (created automatically by the evaluation script using your anonymization algorithm)
    │       ├── anon_1272-128104-0003.wav
    │       └── ...
    ├── speaker2/
    ├── speaker3/
    ├── speaker4/
    └── ...

parameters/          # Directory to store model parameters (participants should add their own)
evaluation.py        # DO NOT MODIFY - Evaluates your model and generates results.csv
model.py             # MODIFY - Implement your anonymization model here
README.md            # This file - contains all competition instructions
requirements.txt     # MODIFY - List your dependencies here
run.sh               # DO NOT MODIFY - Runs the evaluation script
```
In this challenge, participants work with enrollment and trial utterances, which follow a structure similar to speaker verification tasks.
- Enrollment utterances (stored in `Enrollment/`):
  - These are speech recordings associated with a particular speaker.
  - Each speaker has multiple enrollment utterances, which serve as reference data.
  - The anonymization system must ensure that any transformed enrollment utterance still preserves the necessary speech characteristics, except for the speaker's identity.
- Trial utterances (stored in `Trial/`):
  - These are new speech recordings from the same speakers but contain different utterances.
  - These utterances are anonymized and later compared against the enrollment utterances.
  - The anonymization system must ensure that the same speaker's trial utterances still match their anonymized enrollment utterances while preventing identification of the original speaker.
- Each speaker in `Enrollment/` and `Trial/` is the same, meaning `speaker1` in `Enrollment/` is the same as `speaker1` in `Trial/`, but their audio files differ.
- The anonymized versions of a speaker's trial utterances must match the anonymized versions of their enrollment utterances, maintaining consistency in the "pseudo-speaker" identity.
- The anonymization system should not alter linguistic content but should make it impossible to link the anonymized voice back to the original speaker (see the layout sketch after this list).
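For orientation, here is a small sketch that walks the layout described above and reports, for each speaker, the original utterances and where their anonymized counterparts are expected to appear. The `evaluation_data` path and the `anon_` filename prefix follow the directory tree shown earlier; this is only a convenience check, not part of the required submission.

```python
from pathlib import Path

# Root of the evaluation data, as shown in the directory tree above.
EVAL_ROOT = Path("evaluation_data")


def list_utterances(subset: str) -> None:
    """Print each speaker's original utterances for the given subset
    ("Enrollment" or "Trial") and the expected anonymized output paths."""
    for speaker_dir in sorted((EVAL_ROOT / subset).iterdir()):
        if not speaker_dir.is_dir():
            continue
        originals = sorted(speaker_dir.glob("*.wav"))
        print(f"{subset}/{speaker_dir.name}: {len(originals)} utterance(s)")
        for wav in originals:
            # Anonymized files go to an "anonymized/" subfolder with an
            # "anon_" prefix, following the tree shown above.
            expected = speaker_dir / "anonymized" / f"anon_{wav.name}"
            status = "found" if expected.exists() else "missing"
            print(f"  {wav.name} -> {expected.relative_to(EVAL_ROOT)} [{status}]")


if __name__ == "__main__":
    list_utterances("Enrollment")
    list_utterances("Trial")
```

Running this before and after the evaluation is a quick way to confirm that the anonymized outputs were generated where the evaluation script expects them.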
Before cloning, you need to fork this repository to your own GitHub account. Follow these steps:
- Navigate to the repository on GitHub.
- In the top-right corner, click the Fork button.
- This creates a copy of the repository under your GitHub account.
Once you've forked the repository, clone it to your local machine:
```
# Replace <YOUR_GITHUB_USERNAME> with your actual GitHub username
git clone https://github.com/<YOUR_GITHUB_USERNAME>/VPC25.git
cd VPC25
```

This ensures you're working on your own version of the repository while still being able to pull updates from the original source.
This project requires Python 3.12. Ensure you have it installed before proceeding.
```
python3 --version
```

or on Windows (PowerShell):

```
python --version
```

If you don't have Python 3.12, download it from python.org.
To process audio files, FFmpeg must be installed. Follow these steps based on your system:
Linux (Debian/Ubuntu):

```
sudo apt update && sudo apt install ffmpeg
```

macOS (Homebrew):

```
brew install ffmpeg
```

Windows:

- Download FFmpeg from ffmpeg.org (recommended: the Windows build from gyan.dev).
- Extract it to a folder (e.g., `C:\ffmpeg`).
- Add `C:\ffmpeg\bin` to your system PATH to make FFmpeg accessible from the command line.
- Verify the installation by running:

```
ffmpeg -version
```
These instructions should be followed inside the VPC25/ folder exactly as written. Do not modify the command examples, including the virtual environment name.
On macOS/Linux:

```
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

On Windows (PowerShell):

```
python -m venv .venv
.venv\Scripts\Activate
pip install -r requirements.txt
```

This ensures all dependencies are installed inside an isolated environment.
Each time you start working on the project, you should activate the virtual environment:
On macOS/Linux:

```
source .venv/bin/activate
```

On Windows (PowerShell):

```
.venv\Scripts\Activate
```

For more details on virtual environments in Python, refer to the official Python `venv` documentation.
- Modify `model.py` to implement your anonymization approach (a minimal, illustrative sketch follows below).
- Store any necessary model parameters in the `parameters/` directory.
- Add any additional dependencies to `requirements.txt`.

Do not modify `evaluation.py` or `run.sh`.
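As a starting point, here is a minimal sketch of the kind of code that could live in `model.py`. The function name, signature, and the naive pitch-shift transformation are illustrative assumptions only, not the required interface: follow the template already provided in `model.py` for the exact entry point the evaluation script calls, and note that `librosa` and `soundfile` are assumed to be added to `requirements.txt`.

```python
# Illustrative sketch only -- not the required model.py interface.
# Assumes `librosa` and `soundfile` are listed in requirements.txt.
import librosa
import soundfile as sf


def anonymize(input_audio_path: str):
    """Hypothetical anonymization function: load an utterance, apply a
    placeholder transformation, and return the waveform plus sample rate."""
    # Load the original utterance (resampled to 16 kHz for consistency).
    audio, sr = librosa.load(input_audio_path, sr=16000)

    # Placeholder transformation: shift the pitch by a few semitones.
    # A real submission would replace this with a proper anonymization model
    # (e.g., voice conversion driven by a pseudo-speaker representation).
    anonymized_audio = librosa.effects.pitch_shift(audio, sr=sr, n_steps=4)

    return anonymized_audio, sr


if __name__ == "__main__":
    # Quick manual check on one enrollment file from the tree above;
    # during evaluation, run.sh applies the model to all files automatically.
    wav, sr = anonymize("evaluation_data/Enrollment/speaker1/1272-128104-0000.wav")
    sf.write("anon_example.wav", wav, sr)
```

Whatever approach you use, apply the same pseudo-speaker mapping to a speaker's enrollment and trial utterances so that their anonymized recordings remain mutually verifiable, as described above.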
- Place your test audio files inside `evaluation_data/`.
- The evaluation script will process these files automatically.
To test your model, execute:
```
bash run.sh
```

This will:
- Set up and activate the virtual environment (if not already done).
- Ensure dependencies are installed.
- Process the source audio.
- Generate anonymized audio files.
- Output evaluation results to `results.csv`.
Important:
- Windows users must use Git Bash to run this command, as PowerShell and Command Prompt do not support shell scripts properly.
- Windows and macOS users might need to run `run.sh` with administrator privileges to avoid permission issues with symbolic links.
The evaluation script will measure:
- Equal Error Rate (EER): This metric, derived from an Automatic Speaker Verification (ASV) system, measures the system's ability to differentiate between speech from the same speaker and from different speakers. A higher EER indicates better privacy protection, as it means the system is less likely to correctly identify the speaker (a worked sketch of the EER computation follows below).
- Word Error Rate (WER): This metric is calculated using an Automatic Speech Recognition (ASR) system and measures how well the anonymized speech preserves linguistic content. A lower WER indicates better utility, meaning the anonymized speech is still easily understood by the ASR system.
- Processing time: This measures the efficiency of the anonymization algorithm.

Results are stored in `results.csv`.
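To make the EER definition concrete, the sketch below computes it from hypothetical verification scores by sweeping a decision threshold until the false-acceptance rate on different-speaker trials meets the false-rejection rate on same-speaker trials. The scores here are synthetic; in the challenge, they are produced by the ASV system inside `evaluation.py`.

```python
import numpy as np


def compute_eer(target_scores: np.ndarray, nontarget_scores: np.ndarray) -> float:
    """Equal Error Rate: the operating point where the false-acceptance rate
    (different-speaker trials accepted) equals the false-rejection rate
    (same-speaker trials rejected)."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))  # threshold where FAR and FRR cross
    return float((far[idx] + frr[idx]) / 2)


# Synthetic similarity scores (higher = "more likely the same speaker").
rng = np.random.default_rng(0)
target = rng.normal(0.7, 0.1, 1000)      # same-speaker (target) trials
nontarget = rng.normal(0.5, 0.1, 1000)   # different-speaker (non-target) trials
print(f"EER: {compute_eer(target, nontarget):.3f}")  # roughly 0.16 for this overlap
```

An EER close to 50% means the ASV system's decisions are no better than chance at linking anonymized speech back to the original speaker, which corresponds to stronger privacy protection.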
✅ You MUST:

- Implement your model in `model.py`.
- List dependencies in `requirements.txt`.
- Store model parameters in `parameters/`.
- Run evaluation using `run.sh`.
❌ You MUST NOT:

- Delete or modify `evaluation.py` or `run.sh`.
- Remove or alter existing directories.
Good luck! 🎧