This repository provides a baseline solution for the ReVoice-2025 hackathon. The project is based on the Miipher model and adapted for the competition. We have tried to keep the code clean, fast, and convenient to use.
Python 3.10.11 is recommended.
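If you want a quick sanity check before installing the requirements, the small sketch below compares the running interpreter with the recommended version. Treating any 3.10.x patch release as compatible is an assumption, not something the repository states, and the helper name `matches_recommended` is hypothetical.

```python
import sys

# Version recommended by this README.
RECOMMENDED = (3, 10, 11)


def matches_recommended(version_info=None, recommended=RECOMMENDED) -> bool:
    """Return True if the interpreter's major.minor matches the recommended version.

    Assumption: patch-level differences (e.g. 3.10.4 vs 3.10.11) are ignored.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(recommended[:2])


if not matches_recommended():
    print(f"Note: running Python {sys.version.split()[0]}; "
          f"{'.'.join(map(str, RECOMMENDED))} is recommended.")
```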
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install --no-dependencies git+https://github.com/Wataru-Nakata/ssl-vocoders.git
export PYTHONPATH=./src
The script will automatically download Miipher and HiFiGAN weights to the ./models folder.
python3 scripts/download_weights.py
Training the model requires a prepared dataset (clean audio, noisy audio, and phonemes). The preparation script takes your folder of clean audio, adds noise according to the degrader config, and generates phonemes (using GigaAM to transcribe the audio when no text is present).
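To illustrate what "adding noise" means in the simplest case, here is a minimal sketch that mixes a noise clip into clean speech at a chosen signal-to-noise ratio. It assumes plain additive noise only; the actual degradations are defined by degrader_config.yaml and may include other effects. The function name `add_noise_at_snr` is hypothetical and not part of this repository.

```python
import numpy as np


def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` so that the added noise sits at `snr_db` dB below the speech."""
    # Tile the noise if it is shorter than the speech, then crop to the same length.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    # Choose a gain so that 10 * log10(clean_power / (gain^2 * noise_power)) == snr_db.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise
```

Mixing at an explicit SNR makes the difficulty of each training pair controllable; sampling `snr_db` from a range per file is a common way to cover both mild and severe noise.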
Important: Before running, edit examples/configs/degrader_config.yaml and specify the path to your noise files (the noise_dir parameter and related settings, if used).
python3 scripts/prepare_dataset.py \
--input_dir /path/to/clean_audio \
--output_dir /path/to/processed_dataset \
--degrader_config examples/configs/degrader_config.yaml
All training settings are located in examples/configs/config.yaml.
Main parameters to check:
data.train_dataset_path: Path to the folder you created in step 3 (dataset preparation).
data.val_dataset_path: Path to the validation set.
train.trainer.devices: Number and IDs of GPUs (default: 1).
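The dotted parameter names above presumably address nested sections of the YAML file, so data.train_dataset_path would be the train_dataset_path key inside a data: section. The sketch below shows that mapping on a plain dictionary; the helpers `set_dotted` and `get_dotted` are hypothetical, and the paths are placeholders.

```python
from typing import Any


def set_dotted(config: dict, dotted_key: str, value: Any) -> None:
    """Set a nested value addressed by a dotted path such as 'data.train_dataset_path'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        # Create intermediate sections if they do not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = value


def get_dotted(config: dict, dotted_key: str) -> Any:
    """Read a nested value addressed by a dotted path."""
    node = config
    for key in dotted_key.split("."):
        node = node[key]
    return node


config: dict = {}
set_dotted(config, "data.train_dataset_path", "/path/to/processed_dataset")
set_dotted(config, "data.val_dataset_path", "/path/to/val_dataset")
set_dotted(config, "train.trainer.devices", 1)
```

In YAML terms this corresponds to a data: section holding train_dataset_path and val_dataset_path, and a train: section with a nested trainer: block holding devices.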
python3 examples/train.py
Monitor training progress and metrics:
tensorboard --logdir logs/
To restore speech from noisy files, use the scripts/run_miipher.py script. It takes an input folder of audio files and an output folder for the restored results.
python3 scripts/run_miipher.py \
--input_dir /path/to/noisy_audio \
--output_dir /path/to/restored_audio \
--lang_code rus \
--miipher_ckpt ./models/miipher.ckpt \
--vocoder_ckpt ./models/hifigan.ckpt
Arguments:
--input_dir: Folder with noisy files (.wav, .mp3, .flac).
--output_dir: Folder where restored files will be saved.
--lang_code: Language code for phonemization (default: rus). If text transcripts (.txt) are available, the script tries to find and use them; otherwise, ASR (GigaAM) transcribes the audio.
--miipher_ckpt, --vocoder_ckpt: Paths to the Miipher and HiFiGAN checkpoints downloaded to ./models.
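The exact rule run_miipher.py uses to locate transcripts is not spelled out here. The sketch below assumes a transcript sits next to its audio file with the same stem and a .txt extension (sample_01.wav and sample_01.txt); files without one would fall back to ASR. The function name `find_inputs` is hypothetical.

```python
from pathlib import Path
from typing import Optional

AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def find_inputs(input_dir: str) -> list[tuple[Path, Optional[str]]]:
    """Return (audio_path, transcript_or_None) pairs sorted by file name.

    Assumption: a transcript is a UTF-8 .txt file sharing the audio file's stem.
    """
    pairs = []
    for path in sorted(Path(input_dir).iterdir()):
        # Skip transcripts and any other non-audio files.
        if path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        txt = path.with_suffix(".txt")
        text = txt.read_text(encoding="utf-8").strip() if txt.exists() else None
        pairs.append((path, text))
    return pairs
```

A caller would then phonemize the transcript when one is returned and run ASR only for entries whose text is None, which avoids transcription cost for files that already have text.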
To calculate metrics (SI-SNR, STOI, MelLoss), use eval.py. The script compares the folder of restored files (hypotheses) with the folder of clean reference files (references).
python3 eval.py \
--hyp_dir /path/to/restored_audio \
--ref_dir /path/to/clean_reference_audio \
--output_csv metrics_results.csv
Arguments:
--hyp_dir: Folder with your restored files.
--ref_dir: Folder with clean original files (files must have matching names).
--output_csv: Path to save the results table (default: metrics_results.csv).
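Because hypotheses and references are matched by file name, a folder-level evaluation boils down to a name lookup plus one CSV row per matched pair. The sketch below illustrates that pattern; it is not eval.py itself. The function `evaluate_folder`, the choice to silently skip unmatched files, and the CSV column layout are all assumptions.

```python
import csv
from pathlib import Path
from typing import Callable


def evaluate_folder(hyp_dir: str, ref_dir: str, output_csv: str,
                    metric_fns: dict[str, Callable[[Path, Path], float]]) -> int:
    """Score each hypothesis that has a same-named reference; return the number of rows."""
    ref_by_name = {p.name: p for p in Path(ref_dir).iterdir() if p.is_file()}
    rows = []
    for hyp in sorted(p for p in Path(hyp_dir).iterdir() if p.is_file()):
        ref = ref_by_name.get(hyp.name)
        if ref is None:
            continue  # assumption: unmatched files are skipped (eval.py may warn or fail)
        row = {"file": hyp.name}
        for name, fn in metric_fns.items():
            row[name] = fn(hyp, ref)
        rows.append(row)

    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", *metric_fns])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Each metric function receives the two file paths and is responsible for loading the audio itself (for example with soundfile.read), which keeps the pairing logic independent of any particular metric.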
examples/train.py: Main script for starting training.
examples/configs/config.yaml: Configuration for hyperparameters, paths, and the model.
scripts/run_miipher.py: Script for running inference on a folder.
eval.py: Script for calculating metrics on a folder.
scripts/prepare_dataset.py: Script for dataset generation (augmentation + phonemization).
scripts/download_weights.py: Weight downloader.
src/miipher/lightning_module.py: Training logic in PyTorch Lightning (training step, validation, metrics).
src/miipher/dataset: Data loading logic (Dataset, DataModule).
src/miipher/metrics/eval_metrics.py: Implementation of the SI-SNR, STOI, and MelLoss metrics.
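For reference, SI-SNR is commonly computed by removing the mean from both signals, projecting the estimate onto the reference to get a target component, and comparing the target's energy with the residual's. The sketch below follows that common formulation; it is not necessarily identical to the code in eval_metrics.py (for example in its epsilon handling), and the function name `si_snr` here is only illustrative.

```python
import numpy as np


def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio in dB."""
    # Zero-mean both signals so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference: this is the "target" part.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything not explained by the scaled reference counts as error.
    error = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(error, error) + eps)))
```

The projection step is what makes the metric scale-invariant: multiplying the estimate by any nonzero constant scales both the target and the error equally, so their ratio is unchanged.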