ReVoice-2025 — Speech Enhancement Hackathon (Baseline)

EN | RU

This repository contains a baseline solution for the ReVoice-2025 hackathon. It is based on the Miipher speech restoration model, adapted for the competition. We have tried to keep the code as clean, fast, and easy to use as possible.

🚀 Quick Start

1. Environment Setup

Python 3.10.11 is recommended.

python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt
pip install --no-dependencies git+https://github.com/Wataru-Nakata/ssl-vocoders.git

export PYTHONPATH=./src
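Before moving on, it can help to confirm that the interpreter version and PYTHONPATH match the setup above. The snippet below is a hypothetical helper (check_environment is not part of the repository). It only reports problems and does not change anything.

```python
import os
import sys
from pathlib import Path

# Hypothetical helper, not part of this repository.
RECOMMENDED = (3, 10)  # major/minor of the recommended Python 3.10.11


def check_environment(src_dir: str = "./src") -> list:
    """Return human-readable problems with the current environment (empty if none)."""
    problems = []
    if sys.version_info[:2] != RECOMMENDED:
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} detected; 3.10 is recommended"
        )
    # PYTHONPATH may contain several entries separated by os.pathsep (":" on Linux/macOS).
    entries = [e for e in os.environ.get("PYTHONPATH", "").split(os.pathsep) if e]
    src = Path(src_dir).resolve()
    if not any(Path(e).resolve() == src for e in entries):
        problems.append(f"PYTHONPATH does not include {src_dir}")
    return problems
```

Calling check_environment() from the repository root after the export above should return no PYTHONPATH warning.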

2. Downloading Pre-trained Weights

The script will automatically download Miipher and HiFiGAN weights to the ./models folder.

python3 scripts/download_weights.py
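To confirm the download finished, you can check that the checkpoints exist and are non-empty. This is a minimal sketch. The file names are taken from the inference command in step 7 (./models/miipher.ckpt and ./models/hifigan.ckpt); if the download script saves them under other names, adjust EXPECTED_CHECKPOINTS (a hypothetical constant).

```python
from pathlib import Path

# Assumed file names, copied from the inference example (--miipher_ckpt / --vocoder_ckpt).
EXPECTED_CHECKPOINTS = ("miipher.ckpt", "hifigan.ckpt")


def missing_checkpoints(models_dir: str = "./models") -> list:
    """Return the expected checkpoint files that are absent or empty in models_dir."""
    root = Path(models_dir)
    missing = []
    for name in EXPECTED_CHECKPOINTS:
        path = root / name
        # A zero-byte file usually means an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

If missing_checkpoints() returns a non-empty list, rerun scripts/download_weights.py.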

3. Dataset Preparation

Training requires a prepared dataset (clean audio, noisy audio, and phonemes). The script takes your folder of clean audio, adds noise according to the degrader config, and generates phonemes, using GigaAM to transcribe the audio when no text is available.

Important: before running the script, edit examples/configs/degrader_config.yaml and set the paths to your noise files (for example, the noise_dir parameter, if used).

python3 scripts/prepare_dataset.py \
  --input_dir /path/to/clean_audio \
  --output_dir /path/to/processed_dataset \
  --degrader_config examples/configs/degrader_config.yaml
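The exact degradations applied by scripts/prepare_dataset.py are defined in the degrader config and are not reproduced here. As an illustration of the most common one, additive noise, the sketch below mixes a noise clip into clean speech at a target signal-to-noise ratio. The function mix_at_snr is hypothetical and is not the repository's implementation.

```python
from typing import Optional

import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float,
               rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Add `noise` to `clean` so that the clean-to-noise power ratio equals snr_db (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    # Loop the noise if it is shorter than the clean signal, then take a random crop.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    start = rng.integers(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    # Choose scale so that 10*log10(clean_power / (scale**2 * noise_power)) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Lower snr_db values produce harder training examples. In practice, a degrader typically samples the SNR from a range for each utterance rather than using a single fixed value.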

4. Training Configuration

All training settings are located in examples/configs/config.yaml. Main parameters to check:

data.train_dataset_path: Path to the folder you created in step 3.
data.val_dataset_path: Path to the validation set.
train.trainer.devices: Number or IDs of the GPUs to use (default: 1).
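The dotted names above refer to nested keys in the YAML file. For example, data.train_dataset_path is the train_dataset_path key inside the data section. The sketch below resolves such names in an already-parsed config. The helper get_dotted and the placeholder values in example_cfg are hypothetical; only the key layout follows this README.

```python
from typing import Any, Mapping


def get_dotted(cfg: Mapping[str, Any], key: str) -> Any:
    """Resolve a dotted key such as 'data.train_dataset_path' in a nested mapping."""
    node: Any = cfg
    for part in key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            raise KeyError(f"{key!r}: missing component {part!r}")
        node = node[part]
    return node


# Hypothetical parsed config.yaml; values are placeholders.
example_cfg = {
    "data": {
        "train_dataset_path": "/path/to/processed_dataset",
        "val_dataset_path": "/path/to/val_dataset",
    },
    "train": {"trainer": {"devices": 1}},
}

for key in ("data.train_dataset_path", "data.val_dataset_path", "train.trainer.devices"):
    print(key, "=", get_dotted(example_cfg, key))
```

In practice the dictionary would come from parsing the file, for example with PyYAML's yaml.safe_load. This README does not say whether the training script reads the YAML directly or through another config library.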

5. Starting Training

python3 examples/train.py

6. Monitoring (TensorBoard)

Monitor training progress and metrics:

tensorboard --logdir logs/

7. Inference (Speech Restoration)

To restore speech from noisy files, use the scripts/run_miipher.py script. It takes a folder of input files and a folder in which to save the results.

python3 scripts/run_miipher.py \
  --input_dir /path/to/noisy_audio \
  --output_dir /path/to/restored_audio \
  --lang_code rus \
  --miipher_ckpt ./models/miipher.ckpt \
  --vocoder_ckpt ./models/hifigan.ckpt

Arguments:

--input_dir: Folder with noisy files (.wav, .mp3, .flac).
--output_dir: Folder where the restored files will be saved.
--lang_code: Language code used for phonemization (default: rus). The script first looks for matching text transcripts (.txt); if none are found, it transcribes the audio with ASR (GigaAM).
--miipher_ckpt, --vocoder_ckpt: Paths to the Miipher and HiFiGAN vocoder checkpoints downloaded in step 2.
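This README does not specify how run_miipher.py matches transcripts to audio files. A common convention, assumed here, is a .txt file with the same name stem next to each audio file. The sketch below shows which files would use an existing transcript and which would fall back to ASR under that assumption. The function pair_transcripts is hypothetical.

```python
from pathlib import Path
from typing import List, Optional, Tuple

AUDIO_EXTS = {".wav", ".mp3", ".flac"}  # formats listed for --input_dir


def pair_transcripts(input_dir: str) -> List[Tuple[Path, Optional[Path]]]:
    """Pair each audio file with a same-stem .txt transcript, or None if ASR is needed."""
    pairs = []
    for audio in sorted(Path(input_dir).iterdir()):
        if audio.suffix.lower() not in AUDIO_EXTS:
            continue  # skip transcripts and unrelated files
        txt = audio.with_suffix(".txt")  # assumed naming convention: sample.wav -> sample.txt
        pairs.append((audio, txt if txt.is_file() else None))
    return pairs
```

Files paired with None would be the ones sent to GigaAM.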

8. Quality Evaluation (Metrics)

To calculate metrics (SI-SNR, STOI, MelLoss), use eval.py. The script compares the restored files (hypotheses) in one folder against the clean reference files (references) in another.

python3 eval.py \
  --hyp_dir /path/to/restored_audio \
  --ref_dir /path/to/clean_reference_audio \
  --output_csv metrics_results.csv

Arguments:

--hyp_dir: Folder with your restored files.
--ref_dir: Folder with clean original files (files must have matching names).
--output_csv: Path to save the results table (default metrics_results.csv).
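The repository's metric code lives in src/miipher/metrics/eval_metrics.py and may differ in details such as mean removal or the epsilon used. For intuition, here is the textbook SI-SNR definition. The estimate is projected onto the reference, and the metric is the ratio, in dB, between the power of that projection and the power of the remaining error. The function si_snr is a sketch, not the repository's implementation.

```python
import numpy as np


def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio in dB between two 1-D signals."""
    if estimate.shape != reference.shape:
        raise ValueError("estimate and reference must have the same shape")
    # Remove DC offsets so constant shifts do not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Projection of the estimate onto the reference (the "target" component).
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) /
                               (np.dot(residual, residual) + eps)))
```

Because of the projection, rescaling the estimate (for example, a louder or quieter restored file) does not change the score. This is why SI-SNR requires hypothesis and reference files to be time-aligned and matched by name, but not loudness-matched.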

📂 Project Structure

examples/train.py — Main script for starting training.
examples/configs/config.yaml — Configuration for hyperparameters, paths, and the model.
scripts/run_miipher.py — Script for running inference on a folder.
eval.py — Script for calculating metrics on a folder.
scripts/prepare_dataset.py — Script for dataset generation (augmentation + phonemization).
scripts/download_weights.py — Weight downloader.
src/miipher/lightning_module.py — Training logic (PyTorch Lightning): training step, validation, metrics.
src/miipher/dataset — Data loading logic (Dataset, DataModule).
src/miipher/metrics/eval_metrics.py — Implementation of SI-SNR, STOI, MelLoss metrics.
