Zonos Basic Setup with Multiple Sentence Inference

This repository provides a basic setup for using Zonos, a text-to-speech model designed for generating audio in multiple languages. This small framework allows text-to-speech conversion beyond the default 30-second limit by processing the input sentence by sentence.

Features

Multi-sentence processing: Converts multiple sentences of input text into audio.
Flexible input options: Accepts both text files and direct input strings.
Audio normalization: Automatically normalizes the generated audio for better sound quality.
Output formats: Supports generating audio in MP3 format.
Gradio interface: Inference can be run either from the command line (main.py) or through the Gradio interface (recommended).
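
The README does not say how the audio normalization listed above is implemented. As an illustration only, here is a minimal sketch of peak normalization, one common approach. The function name `peak_normalize` and the `target_peak` value are assumptions and are not taken from this repository.

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the loudest one has magnitude target_peak.

    Hypothetical helper: the repository may normalize differently.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or empty input): there is nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


quiet = [0.1, -0.2, 0.05]
normalized = peak_normalize(quiet)
# The loudest sample is now (approximately) at the target peak.
print(max(abs(s) for s in normalized))
```

Scaling by a single gain keeps the relative loudness of the samples unchanged, which is why peak normalization is a frequent choice when several generated segments are joined into one file.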

Requirements

Docker
An NVIDIA GPU with at least 6 GB of VRAM

Installation

Clone the repository:

git clone https://github.com/lwdovico/zonos
cd zonos

Then build and launch the Gradio interface (recommended):
```
docker compose up
```
Open the Gradio UI at: http://localhost:7861/

NB: The container exposes port 7861. If you need to change it, be sure to update it in gradio_main.py as well.

WARNING

The network_mode: "host" configuration in docker-compose.yml works as expected on Linux systems but not on Windows. If you're running this Docker container on a Windows host, you should replace:

network_mode: "host"

with:

ports:
  - "7861:7861"

This will expose the Gradio app on port 7861 of your host machine, allowing you to access it via http://localhost:7861.
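
If the page does not load after changing the network settings, it can help to check whether anything is listening on the port at all. The following is a small sketch using only the Python standard library. The helper name `port_is_open` is hypothetical and is not part of this repository.

```python
import socket


def port_is_open(host="localhost", port=7861, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution errors.
        return False


if __name__ == "__main__":
    status = "reachable" if port_is_open() else "not reachable"
    print(f"Gradio UI on port 7861 is {status}")
```

A successful connection only shows that some process accepts connections on the port. It does not confirm that the process is the Gradio app.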

Optional: Build manually

You can skip this step if you only want to use the Gradio WebUI.

Build the container:

   docker build -t zonos .

Optionally, add your own assets to the repository folder before building the image so that they are available inside the container.

Run and attach to it:

   docker run -it --gpus=all --net=host -v /path/to/zonos_outputs:/app/outputs zonos

Running the Inference Manually

To generate audio from text, run the following script:

python main.py --input-text "Your text here."

You can also provide a path if the text is too long for the command line:

python main.py --input-text /path/to/text.txt

This will process the text and output a single audio file.
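
The README does not show how main.py tells a literal string from a file path, or how it breaks the text into sentences. As a rough sketch of one way this could work, the snippet below treats the argument as a file if it names an existing file and then splits the text on sentence-ending punctuation. The names `resolve_input_text` and `split_sentences` and the splitting rule are assumptions, not the repository's code.

```python
import os
import re


def resolve_input_text(value):
    """Return the file contents if value is an existing file path, else value itself."""
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as handle:
            return handle.read()
    return value


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace; drop empty pieces."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [piece for piece in pieces if piece]


text = resolve_input_text("First sentence. Second one! And a third?")
for sentence in split_sentences(text):
    # Each sentence could be synthesized separately and the audio joined afterwards.
    print(sentence)
```

A punctuation-based split like this is deliberately simple. It will also break on abbreviations such as "e.g. this", so a real pipeline may need a more careful tokenizer.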

Customizing the Setup

NB: The script does not expose all the customizations available in the Gradio WebUI.

You can customize the behavior of the script by adding the following command-line arguments:

--input-text: Specifies the text you want to convert into speech. Provide the text directly as a string or a path.
--speaker: Defines the speaker to use for the audio generation. By default, it uses an example audio file (assets/exampleaudio.mp3), but you can specify your own file or speaker model.
--language: Sets the language for the audio. The default is English (en-us), but you can change it to another supported language code.
--output-path: Determines where the generated audio will be saved. The default output file is output.mp3, but you can specify a custom path or filename (absolute paths only). To find the file in /path/to/zonos_outputs on the host, write to the mounted directory, for example /app/outputs/output.mp3.
--seed: Sets a random seed for reproducibility of the results. The default is 42, but you can modify it if needed.
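
To make the flags and defaults above easier to see at a glance, here is a sketch of how such a command line could be declared with `argparse`. It mirrors only the options and defaults documented in this README. Whether main.py uses `argparse`, and whether `--input-text` is mandatory, are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI flags")
    # Marking --input-text as required is an assumption.
    parser.add_argument("--input-text", required=True,
                        help="Text to convert, or a path to a text file")
    parser.add_argument("--speaker", default="assets/exampleaudio.mp3",
                        help="Reference speaker audio file")
    parser.add_argument("--language", default="en-us", help="Language code")
    parser.add_argument("--output-path", default="output.mp3",
                        help="Where to save the generated audio")
    parser.add_argument("--seed", type=int, default=42,
                        help="Random seed for reproducibility")
    return parser


args = build_parser().parse_args(["--input-text", "Hello world!", "--language", "fr-fr"])
print(args.input_text, args.language, args.seed)  # prints: Hello world! fr-fr 42
```

Flags that are not passed keep the documented defaults, so only the values you want to change need to appear on the command line.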

Example usage:

python main.py --input-text "Hello world!" --speaker "another_speaker.mp3" --language "fr-fr" --output-path "french_output.mp3" --seed 1234

Acknowledgements

Special thanks to the Zonos developers for creating this text-to-speech model.
