vbuylova/HiFi-GAN

Text-to-Speech (TTS) with PyTorch

About

This repository contains an implementation of HiFi-GAN in PyTorch.

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate a new environment using conda or venv (+pyenv).

    a. conda version:

    # create env
    conda create -n project_env python=PYTHON_VERSION
    
    # activate env
    conda activate project_env

    b. venv (+pyenv) version:

    # create env
    ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env
    
    # alternatively, using default python version
    python3 -m venv project_env
    
    # activate env
    source project_env/bin/activate
  2. Install all required packages:

    pip install -r requirements.txt
  3. Install pre-commit:

    pre-commit install

How To Use

To train a model, run the following command:

python3 train.py -cn=hifigan

How To Download

To download the pretrained model, use:

gdown https://drive.google.com/uc?id=1n9DVZznWy49nKiSljAbqdQvNiZ_VcPOa
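The same download can also be started from Python through the `gdown` package. This is a minimal sketch, assuming `gdown` is installed in the active environment; the helper names `drive_url` and `download_checkpoint` are hypothetical and not part of this repository. The file ID is the one from the command above.

```python
# File ID taken from the gdown command above.
FILE_ID = "1n9DVZznWy49nKiSljAbqdQvNiZ_VcPOa"


def drive_url(file_id):
    """Build the Google Drive download URL for a file ID."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_checkpoint(output=None):
    """Download the pretrained model and return the local file path.

    With output=None, gdown keeps the file name reported by Google Drive.
    """
    import gdown  # imported lazily so the URL helper works without gdown

    return gdown.download(drive_url(FILE_ID), output, quiet=False)
```

Calling `download_checkpoint()` saves the file into the current working directory; the returned path can then be passed as `inferencer.from_pretrained`.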

How To Evaluate

To synthesize audio from existing audio files (resynthesis), your dataset should follow this structure:

NameOfTheDirectoryWithUtterances
└── transcriptions
    ├── UtteranceID1.wav
    ├── UtteranceID2.wav
    .
    .
    .
    └── UtteranceIDn.wav

To get predictions, run:

python3 synthesize.py -cn=from_audio '+datasets.test.audio_dir=<PATH-TO-DIR>' 'inferencer.from_pretrained=<PATH-TO-PRETRAINED-MODEL>'
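Before running synthesis, it can help to confirm that the directory matches the layout shown above. The following is a minimal sketch of such a check; the function `find_utterances` is a hypothetical helper, not part of this repository, and it assumes the files sit directly inside the `transcriptions` subdirectory.

```python
from pathlib import Path


def find_utterances(root, extension=".wav", subdir="transcriptions"):
    """Return the sorted utterance files found in root/subdir.

    Raises if the subdirectory is missing or holds no matching files.
    """
    data_dir = Path(root) / subdir
    if not data_dir.is_dir():
        raise FileNotFoundError(f"expected a directory at {data_dir}")
    # Keep only regular files with the requested extension (case-insensitive).
    files = sorted(
        p for p in data_dir.iterdir()
        if p.is_file() and p.suffix.lower() == extension
    )
    if not files:
        raise ValueError(f"no {extension} files found in {data_dir}")
    return files
```

The same helper can check a text dataset by passing `extension=".txt"`.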

To synthesize audio from text, your dataset should follow this structure:

NameOfTheDirectoryWithUtterances
└── transcriptions
    ├── UtteranceID1.txt
    ├── UtteranceID2.txt
    .
    .
    .
    └── UtteranceIDn.txt

To get predictions, run:

python3 synthesize.py -cn=from_text '+datasets.test.data_dir=<PATH-TO-DIR>' 'inferencer.from_pretrained=<PATH-TO-PRETRAINED-MODEL>'
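A text dataset in the layout above can be generated from a mapping of utterance IDs to sentences. This is a minimal sketch under stated assumptions: `write_text_dataset` is a hypothetical helper, not part of this repository, and it assumes one utterance per file, stored as UTF-8.

```python
from pathlib import Path


def write_text_dataset(root, texts, subdir="transcriptions"):
    """Write each entry of `texts` (utterance ID -> text) to root/subdir/<ID>.txt.

    Returns the list of written paths, in the order of `texts`.
    """
    data_dir = Path(root) / subdir
    data_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for utterance_id, text in texts.items():
        path = data_dir / f"{utterance_id}.txt"
        # Assumption: surrounding whitespace is not meaningful to the model.
        path.write_text(text.strip(), encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_text_dataset("my_texts", {"UtteranceID1": "Hello world."})` creates `my_texts/transcriptions/UtteranceID1.txt`, and `my_texts` can then be passed as `datasets.test.data_dir`.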

If you want to pass text from the command line, run:

python3 synthesize.py -cn=from_cli '+datasets.test.index=[{text: "<YOUR-TEXT>", path: "text.txt", audio_len: 0}]' 'inferencer.from_pretrained=<PATH-TO-PRETRAINED-MODEL>'
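Because the index override mixes brackets, braces, and double quotes, shell quoting is easy to get wrong. One option is to build the argument list in Python and hand it to `subprocess.run`, which bypasses the shell entirely. The sketch below only assembles the arguments shown in the command above; `build_cli_command` is a hypothetical helper, `model.pth` is a placeholder path, and rejecting texts that contain double quotes or backslashes is a conservative assumption rather than a documented limitation.

```python
import shlex


def build_cli_command(text, model_path, out_name="text.txt"):
    """Return the argv list for synthesizing `text` with the from_cli config."""
    # Conservative assumption: refuse characters that would need escaping
    # inside the double-quoted override value.
    if '"' in text or "\\" in text:
        raise ValueError("text must not contain double quotes or backslashes")
    index = (
        f'+datasets.test.index=[{{text: "{text}", '
        f'path: "{out_name}", audio_len: 0}}]'
    )
    return [
        "python3",
        "synthesize.py",
        "-cn=from_cli",
        index,
        f"inferencer.from_pretrained={model_path}",
    ]


cmd = build_cli_command("Hello world", "model.pth")  # placeholder model path
print(shlex.join(cmd))  # shell-quoted form, for copying into a terminal
```

To run the command directly, pass the list to `subprocess.run(cmd, check=True)` from the repository root.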

Report

The Wandb report is available here.

Credits

This repository is based on a PyTorch Project Template.

License

License
