OronTTS is a multi-speaker Text-to-Speech system for Mongolian Cyrillic (Khalkha), built on the VITS architecture.
- VITS Architecture: End-to-end TTS with variational inference and adversarial training
- Multi-speaker: Support for distinct male and female voices
- Mongolian Text Processing: Custom rule-based phonemizer for Cyrillic script
- Number Normalization: Mongolian number-to-word expansion for cardinals and ordinals
- Audio Denoising: DeepFilterNet integration for preprocessing non-professional recordings
- Hugging Face Integration: Dataset and model hub support
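The rule-based phonemizer can be pictured as a longest-match mapping from Cyrillic graphemes to phoneme symbols. The sketch below is illustrative only: the grapheme table and `phonemize` function are hypothetical stand-ins, not the project's actual Khalkha rules in `src/utils/`.

```python
# Minimal sketch of a rule-based Cyrillic-to-phoneme mapper.
# The mapping table is illustrative; the real phonemizer implements
# the project's full Khalkha rule set.
G2P = {
    "сайн": "sain",   # hypothetical whole-word exception, matched first
    "с": "s", "а": "a", "й": "j", "н": "n",
    "б": "p", "у": "u", "ы": "iː",
}

def phonemize(text: str) -> list[str]:
    """Greedy longest-match conversion of Cyrillic text to phoneme symbols."""
    phones, i = [], 0
    text = text.lower()
    while i < len(text):
        for length in range(min(8, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in G2P:
                phones.append(G2P[chunk])
                i += length
                break
        else:
            i += 1  # skip characters with no rule (spaces, punctuation)
    return phones
```

Longest-match ordering lets multi-character exceptions take priority over single-letter rules without a separate exception pass.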
```bash
# Create a virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e ".[dev]"
```

Project structure:

```text
oron-tts/
├── src/
│   ├── data/        # Dataset wrappers, denoising, preprocessing
│   ├── models/      # VITS architecture components
│   ├── training/    # Training loop, losses, checkpointing
│   └── utils/       # Audio processing, text normalization
├── scripts/
│   ├── prepare.py   # Dataset preparation
│   ├── train.py     # Model training
│   └── infer.py     # Inference/synthesis
└── configs/         # YAML configuration files
```
Clean and denoise audio from the Common Voice and MBSpeech datasets:

```bash
python scripts/prepare.py \
    --output-dir data/processed \
    --dataset all \
    --upload \
    --hf-repo btsee/oron-tts-dataset
```

For detailed RunPod setup instructions, see RUNPOD.md.
Quick start on RunPod:

```bash
# Run the setup script
wget https://raw.githubusercontent.com/btseee/oron-tts/main/runpod_setup.sh
chmod +x runpod_setup.sh
./runpod_setup.sh

# Start training
python scripts/train.py \
    --config configs/vits_runpod.yaml \
    --from-hf \
    --dataset btsee/mbspeech_mn \
    --push-to-hub \
    --hf-repo btsee/orontts
```

Local/custom training:
```bash
# Single GPU
python scripts/train.py \
    --config configs/vits_runpod.yaml \
    --from-hf \
    --dataset btsee/oron-tts-dataset \
    --push-to-hub \
    --hf-repo btsee/oron-tts-model

# Multi-GPU
python scripts/train.py \
    --config configs/vits_runpod.yaml \
    --from-hf \
    --dataset btsee/oron-tts-dataset \
    --num-gpus 4
```

Generate speech from text:
```bash
python scripts/infer.py \
    --checkpoint checkpoints/vits_best.pt \
    --text "Сайн байна уу" \
    --speaker 0 \
    --output output.wav
```

Speaker IDs:

- 0: Female voice
- 1: Male voice
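For batch synthesis, the `infer.py` CLI can be driven from Python. This sketch only assembles the command shown above for each speaker; nothing beyond those flags is assumed about the script's interface (the actual run is commented out):

```python
import subprocess

def synthesize(text: str, speaker: int, out_path: str,
               checkpoint: str = "checkpoints/vits_best.pt") -> list[str]:
    """Build (and optionally run) one scripts/infer.py invocation."""
    cmd = [
        "python", "scripts/infer.py",
        "--checkpoint", checkpoint,
        "--text", text,
        "--speaker", str(speaker),
        "--output", out_path,
    ]
    # subprocess.run(cmd, check=True)  # uncomment to actually synthesize
    return cmd

# Render the same sentence with both speakers.
for spk in (0, 1):
    synthesize("Сайн байна уу", spk, f"output_spk{spk}.wav")
```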
| Input | Output |
|---|---|
| 10 | арван |
| 25 | хорин тав |
| 100 | зуун |
| 1-р | нэгдүгээр |
| 2024 | хоёр мянга хорин дөрөв |
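The cardinal rows in the table can be reproduced with a small rule-based converter. This is an illustrative sketch covering 0–9999 with attributive tens forms, as the table uses; the project's normalizer in `src/utils/` also handles ordinals such as `1-р` and larger magnitudes:

```python
# Illustrative Mongolian cardinal speller (sketch, 0-9999 only).
UNITS = {0: "", 1: "нэг", 2: "хоёр", 3: "гурав", 4: "дөрөв", 5: "тав",
         6: "зургаа", 7: "долоо", 8: "найм", 9: "ес"}
TENS = {1: "арван", 2: "хорин", 3: "гучин", 4: "дөчин", 5: "тавин",
        6: "жаран", 7: "далан", 8: "наян", 9: "ерэн"}

def number_to_mn(n: int) -> str:
    """Spell a cardinal 0-9999 in Khalkha Mongolian words."""
    if n == 0:
        return "тэг"
    parts = []
    if n >= 1000:
        parts.append(UNITS[n // 1000] + " мянга")
        n %= 1000
    if n >= 100:
        h = n // 100
        parts.append("зуун" if h == 1 else UNITS[h] + " зуун")
        n %= 100
    if n >= 10:
        parts.append(TENS[n // 10])
        n %= 10
    if n:
        parts.append(UNITS[n])
    return " ".join(parts)
```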
Key hyperparameters in `configs/vits_base.yaml`:

```yaml
sample_rate: 22050
batch_size: 16
learning_rate: 0.0002
model:
  hidden_channels: 192
  n_layers: 6
  n_heads: 2
```

OronTTS supports two logging modes:
Local training (with tqdm):

```yaml
use_tqdm: true     # Progress bars for interactive training
log_interval: 100  # Log every 100 steps
```

RunPod/cloud training (structured logs):

```yaml
use_tqdm: false   # Disable tqdm for container logs
log_interval: 50  # More frequent logging
```

Container logs will look like:

```text
[2026-01-28 14:37:22] [INFO] Starting Epoch 1
[2026-01-28 14:37:24] [INFO] Step 0 | Batch 1/320 | Loss: 281.89 | Mel: 100.38 | KL: 175.99 | Dur: 0.01 | LR: 0.000200
[2026-01-28 14:37:29] [INFO] Step 10 | Batch 11/320 | Loss: 139.46 | Mel: 72.75 | KL: 63.69 | Dur: 0.01 | LR: 0.000200
```
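A formatter producing structured lines in that shape is straightforward with the standard `logging` module. This is a sketch of the pattern, not the project's actual logging setup:

```python
import logging

def make_logger(name: str = "orontts") -> logging.Logger:
    """Console logger emitting the [timestamp] [LEVEL] message format."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "[%(asctime)s] [%(levelname)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    ))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger

log = make_logger()
log.info("Step %d | Batch %d/%d | Loss: %.2f | LR: %.6f",
         0, 1, 320, 281.89, 2e-4)
```

Plain timestamped lines like these survive container log collectors intact, which is why tqdm is disabled for cloud runs.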
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Lint
ruff check src/ scripts/

# Format
ruff format src/ scripts/
isort src/ scripts/
```

License: MIT
If you use OronTTS in your research, please cite:
```bibtex
@software{orontts2024,
  title = {OronTTS: Mongolian Text-to-Speech},
  year = {2024},
  url = {https://github.com/btsee/oron-tts}
}
```