VITS-based Mongolian (Khalkha) text-to-speech (TTS) model

btseee/oron-tts

OronTTS

A multi-speaker text-to-speech system for Mongolian Cyrillic (Khalkha), built on the VITS architecture.

Features

  • VITS Architecture: End-to-end TTS with variational inference and adversarial training
  • Multi-speaker: Support for distinct male and female voices
  • Mongolian Text Processing: Custom rule-based phonemizer for Cyrillic script
  • Number Normalization: Comprehensive conversion of numerals to Mongolian words
  • Audio Denoising: DeepFilterNet integration for preprocessing non-professional recordings
  • Hugging Face Integration: Dataset and model hub support

Installation

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e ".[dev]"

Project Structure

oron-tts/
├── src/
│   ├── data/           # Dataset wrappers, denoising, preprocessing
│   ├── models/         # VITS architecture components
│   ├── training/       # Training loop, losses, checkpointing
│   └── utils/          # Audio processing, text normalization
├── scripts/
│   ├── prepare.py      # Dataset preparation
│   ├── train.py        # Model training
│   └── infer.py        # Inference/synthesis
└── configs/            # YAML configuration files

Usage

1. Dataset Preparation (Local)

Clean and denoise audio from Common Voice and MBSpeech datasets:

python scripts/prepare.py \
    --output-dir data/processed \
    --dataset all \
    --upload \
    --hf-repo btsee/oron-tts-dataset

2. Training (RunPod/Cloud)

For detailed RunPod setup instructions, see RUNPOD.md.

Quick start on RunPod:

# Run setup script
wget https://raw.githubusercontent.com/btseee/oron-tts/main/runpod_setup.sh
chmod +x runpod_setup.sh
./runpod_setup.sh

# Start training
python scripts/train.py \
    --config configs/vits_runpod.yaml \
    --from-hf \
    --dataset btsee/mbspeech_mn \
    --push-to-hub \
    --hf-repo btsee/orontts

Local/Custom training:

# Single GPU
python scripts/train.py \
    --config configs/vits_runpod.yaml \
    --from-hf \
    --dataset btsee/oron-tts-dataset \
    --push-to-hub \
    --hf-repo btsee/oron-tts-model

# Multi-GPU
python scripts/train.py \
    --config configs/vits_runpod.yaml \
    --from-hf \
    --dataset btsee/oron-tts-dataset \
    --num-gpus 4

3. Inference

Generate speech from text:

python scripts/infer.py \
    --checkpoint checkpoints/vits_best.pt \
    --text "Сайн байна уу" \
    --speaker 0 \
    --output output.wav

Speaker IDs:

  • 0: Female voice
  • 1: Male voice
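To render the same sentence with both voices, the command above can be wrapped in a small helper. The following is a minimal sketch, not part of the repository: the function names and the output file naming are hypothetical, and only the four flags shown above (--checkpoint, --text, --speaker, --output) are assumed to exist.

```python
import subprocess
import sys
from pathlib import Path

# Speaker IDs as documented in this README.
SPEAKERS = {0: "female", 1: "male"}


def build_infer_command(text, speaker, output, checkpoint="checkpoints/vits_best.pt"):
    """Build the argument list for scripts/infer.py (hypothetical helper)."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker id: {speaker}")
    return [
        sys.executable, "scripts/infer.py",
        "--checkpoint", str(checkpoint),
        "--text", text,
        "--speaker", str(speaker),
        "--output", str(output),
    ]


def synthesize_all_speakers(text, out_dir="."):
    """Run inference once per documented speaker; returns the output paths."""
    outputs = []
    for speaker, label in SPEAKERS.items():
        out_path = Path(out_dir) / f"output_{label}.wav"  # naming is an assumption
        subprocess.run(build_infer_command(text, speaker, out_path), check=True)
        outputs.append(out_path)
    return outputs
```

Calling synthesize_all_speakers("Сайн байна уу") from the repository root would, under these assumptions, write output_female.wav and output_male.wav.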

Mongolian Number Examples

Input    Output
10       арван
25       хорин тав
100      зуун
1-р      нэгдүгээр
2024     хоёр мянга хорин дөрөв
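The cardinal conversions in the table can be illustrated with a short rule-based routine. This is a simplified sketch, not the project's normalizer: the function name and word tables are assumptions, it covers only 1 to 9999, it uses the same word forms as the table (for example арван for 10 and зуун for 100), and it does not handle ordinals such as 1-р.

```python
# Standalone unit words (used in final position).
UNITS = ["", "нэг", "хоёр", "гурав", "дөрөв", "тав", "зургаа", "долоо", "найм", "ес"]
# Attributive unit words (used before "зуун" or "мянга").
UNITS_ATTR = ["", "нэг", "хоёр", "гурван", "дөрвөн", "таван",
              "зургаан", "долоон", "найман", "есөн"]
# Tens in the attributive form that appears in the table above.
TENS_ATTR = ["", "арван", "хорин", "гучин", "дөчин", "тавин",
             "жаран", "далан", "наян", "ерэн"]


def number_to_words(n: int) -> str:
    """Spell out a cardinal number from 1 to 9999 (illustrative only)."""
    if not 0 < n < 10000:
        raise ValueError("this sketch only supports 1..9999")
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, units = divmod(rest, 10)

    words = []
    if thousands:
        if thousands > 1:
            words.append(UNITS_ATTR[thousands])
        words.append("мянга")
    if hundreds:
        if hundreds > 1:
            words.append(UNITS_ATTR[hundreds])
        words.append("зуун")
    if tens:
        words.append(TENS_ATTR[tens])
    if units:
        words.append(UNITS[units])
    return " ".join(words)


for value in (10, 25, 100, 2024):
    print(value, number_to_words(value))
```

A production normalizer would also need to choose between standalone and attributive forms by context, and to handle ordinals, decimals, and larger magnitudes.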

Configuration

Key hyperparameters in configs/vits_base.yaml:

sample_rate: 22050
batch_size: 16
learning_rate: 0.0002
model:
  hidden_channels: 192
  n_layers: 6
  n_heads: 2
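When experimenting with these values, it can help to validate them before a long training run. The sketch below mirrors the keys shown above in plain dataclasses; the class names, the from_dict helper, and the divisibility check are assumptions for illustration and are not taken from the repository. The check reflects the common multi-head attention constraint that the channel count splits evenly across heads.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelConfig:
    hidden_channels: int = 192
    n_layers: int = 6
    n_heads: int = 2

    def __post_init__(self):
        # Multi-head attention typically splits channels evenly across heads.
        if self.hidden_channels % self.n_heads != 0:
            raise ValueError("hidden_channels must be divisible by n_heads")

    @property
    def head_dim(self) -> int:
        return self.hidden_channels // self.n_heads


@dataclass(frozen=True)
class TrainConfig:
    sample_rate: int = 22050
    batch_size: int = 16
    learning_rate: float = 0.0002
    model: ModelConfig = field(default_factory=ModelConfig)


def from_dict(raw: dict) -> TrainConfig:
    """Build a TrainConfig from a nested dict, e.g. one parsed from YAML."""
    raw = dict(raw)  # shallow copy so the caller's dict is not modified
    model = ModelConfig(**raw.pop("model", {}))
    return TrainConfig(model=model, **raw)


cfg = from_dict({
    "sample_rate": 22050,
    "batch_size": 16,
    "learning_rate": 0.0002,
    "model": {"hidden_channels": 192, "n_layers": 6, "n_heads": 2},
})
print(cfg.model.head_dim)  # 192 // 2 = 96
```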

Logging

OronTTS supports two logging modes:

Local Training (with tqdm):

use_tqdm: true  # Progress bars for interactive training
log_interval: 100  # Log every 100 steps

RunPod/Cloud Training (structured logs):

use_tqdm: false  # Disable tqdm for container logs
log_interval: 50  # More frequent logging

Container logs will show:

[2026-01-28 14:37:22] [INFO] Starting Epoch 1
[2026-01-28 14:37:24] [INFO] Step 0 | Batch 1/320 | Loss: 281.89 | Mel: 100.38 | KL: 175.99 | Dur: 0.01 | LR: 0.000200
[2026-01-28 14:37:29] [INFO] Step 10 | Batch 11/320 | Loss: 139.46 | Mel: 72.75 | KL: 63.69 | Dur: 0.01 | LR: 0.000200
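Because each step line follows a fixed layout, the metrics can be pulled out of container logs for plotting or monitoring. The parser below is a sketch that assumes the exact format of the sample lines above; the function name and the returned dictionary layout are hypothetical.

```python
import re

# Matches lines such as:
# [2026-01-28 14:37:24] [INFO] Step 0 | Batch 1/320 | Loss: 281.89 | ...
STEP_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[(?P<level>[A-Z]+)\] "
    r"Step (?P<step>\d+) \| Batch (?P<batch>\d+)/(?P<num_batches>\d+) \| "
    r"(?P<metrics>.+)$"
)


def parse_step_line(line: str):
    """Return step info and metrics for a step line, or None for other lines."""
    match = STEP_LINE.match(line.strip())
    if match is None:
        return None
    metrics = {}
    for part in match["metrics"].split(" | "):
        name, _, value = part.partition(": ")
        metrics[name] = float(value)
    return {
        "timestamp": match["timestamp"],
        "step": int(match["step"]),
        "batch": int(match["batch"]),
        "num_batches": int(match["num_batches"]),
        "metrics": metrics,
    }


sample = (
    "[2026-01-28 14:37:24] [INFO] Step 0 | Batch 1/320 | Loss: 281.89 | "
    "Mel: 100.38 | KL: 175.99 | Dur: 0.01 | LR: 0.000200"
)
record = parse_step_line(sample)
print(record["step"], record["metrics"]["Loss"])
```

Lines that do not describe a training step, such as "Starting Epoch 1", return None and can be skipped.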

Development

# Install dev dependencies
pip install -e ".[dev]"

# Lint
ruff check src/ scripts/

# Format
ruff format src/ scripts/
isort src/ scripts/

License

MIT

Citation

If you use OronTTS in your research, please cite:

@software{orontts2024,
  title = {OronTTS: Mongolian Text-to-Speech},
  year = {2024},
  url = {https://github.com/btseee/oron-tts}
}
