[EMNLP 2025 Findings] O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion

huutuongtu/OOVC

arXiv Demo

🛠 Setup & Dependencies

Clone the repository

git clone https://github.com/huutuongtu/OOVC
cd OOVC

Install Python dependencies

conda create -n oovc python==3.10.12
conda activate oovc
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
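
After installation, it can help to confirm that the pinned versions actually landed in the active environment. The following is a minimal sketch using only the standard library. The helper name `check_pins` is hypothetical and not part of this repository; the pins are copied from the install command above.

```python
from importlib import metadata

# Versions pinned in the install command above.
PINS = {"torch": "2.4.1", "torchvision": "0.19.1", "torchaudio": "2.4.1"}


def check_pins(pins):
    """Return {package: problem} for every missing or mismatched pin."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Wheels from the CUDA index may carry a local suffix such as "+cu121",
        # so compare only the part before the "+".
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems
```

Calling `check_pins(PINS)` inside the `oovc` environment should return an empty dict when all three packages match.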

🎧 Model Downloads

1️⃣ Download WavLM Model

Download WavLM-Large and place it under the wavlm/ directory.

2️⃣ Download Pretrained Generator Checkpoint

Download the pretrained checkpoint and place it under your logs folder (e.g., logs/oovc_w_f0/).
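
Before running inference, a quick check that both downloads are where the scripts expect them can save a confusing stack trace. This is a sketch only: `missing_assets` is a hypothetical helper, and the `*.pt` pattern for the WavLM file is an assumption, since this README does not state the file name.

```python
from pathlib import Path


def missing_assets(wavlm_dir, checkpoint, wavlm_pattern="*.pt"):
    """Return a list of problems; an empty list means both assets look present.

    wavlm_pattern is an assumed glob for the WavLM weights; adjust it to the
    actual file name you downloaded.
    """
    problems = []
    wavlm_dir, checkpoint = Path(wavlm_dir), Path(checkpoint)
    # The directory may already hold source files, so look for weights only.
    if not wavlm_dir.is_dir() or not any(wavlm_dir.glob(wavlm_pattern)):
        problems.append(f"no file matching {wavlm_pattern} under {wavlm_dir}")
    if not checkpoint.is_file():
        problems.append(f"checkpoint not found: {checkpoint}")
    return problems
```

For example, `missing_assets("wavlm", "logs/oovc_w_f0/G_1470000.pth")` run from the repository root lists whatever is still missing.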

🚀 Inference

python convert.py \
  --source sample/8230-279154-0028.flac \
  --target sample/4970-29095-0008.flac \
  --checkpoint logs/oovc_w_f0/G_1470000.pth \
  --output sample/test_converted.wav
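
To convert several source utterances to the same target voice, the command above can be wrapped in a small loop. The sketch below uses only the four flags shown above; `build_convert_cmd`, `convert_many`, and the `_converted.wav` naming are hypothetical and not part of this repository. It assumes it is run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_convert_cmd(source, target, checkpoint, output):
    """Build the convert.py command line from the four documented flags."""
    return [
        sys.executable, "convert.py",
        "--source", str(source),
        "--target", str(target),
        "--checkpoint", str(checkpoint),
        "--output", str(output),
    ]


def convert_many(sources, target, checkpoint, out_dir):
    """Convert each source file to the target voice, one convert.py call per file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for source in sources:
        output = out_dir / (Path(source).stem + "_converted.wav")
        # check=True raises CalledProcessError if convert.py exits non-zero.
        subprocess.run(
            build_convert_cmd(source, target, checkpoint, output), check=True
        )
        outputs.append(output)
    return outputs
```

Each call starts a fresh Python process, so the model is reloaded for every file. That keeps the sketch independent of the internals of `convert.py`, at the cost of speed on large batches.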

📂 Repository Structure

OOVC/
├── ...               
├── convert.py               # Main inference script
├── models_f0.py             # Generator model definition
├── mel_processing.py        # Mel spectrogram utilities
├── utils.py                 # Helper functions
├── wavlm/                   # WavLM model files
├── speaker_encoder/         # Speaker encoder files
├── configs/
│   └── freevc_f0.json       # Configuration file
├── sample/
│   ├── source_audio.flac
│   ├── target_audio.flac
│   └── test_converted.wav
└── logs/
    └── oovc_w_f0/
        └── G_1470000.pth


📘 Citation

If you use this code, please cite our paper:

@inproceedings{tu-2025_oovc,
  author    = {Huu Tuong Tu and Huan Vu and Cuong Tien Nguyen and Dien Hy Ngo and Nguyen Thi Thu Trang},
  title     = {O\_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion},
  booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)},
  year      = {2025},
}

🧠 Acknowledgements

This implementation builds upon FreeVC and VITS.
