Zonos Basic Setup with Multiple Sentence Inference

This repository provides a basic setup for using Zonos, a text-to-speech model for generating audio in multiple languages. This small framework allows for seamless text-to-speech conversion beyond the default 30-second generation limit.
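
The multi-sentence handling presumably amounts to splitting the input into sentences, generating each one separately, and concatenating the resulting waveforms into a single file. Below is a minimal sketch of that idea; the function names and sample rate are placeholders, not the actual API used by main.py:

  import re
  import numpy as np

  SAMPLE_RATE = 44100  # assumed output rate; the real scripts may use a different one

  def split_sentences(text: str) -> list[str]:
      # Naive splitter: break on ., ! or ? followed by whitespace.
      return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

  def synthesize_sentence(sentence: str) -> np.ndarray:
      # Stand-in for the actual Zonos generation call; here it just returns
      # silence whose length is proportional to the sentence length.
      return np.zeros(int(SAMPLE_RATE * 0.05 * len(sentence)), dtype=np.float32)

  def synthesize_long_text(text: str) -> np.ndarray:
      # Each sentence is generated on its own, so no single call hits the
      # model's default length limit, then the chunks are stitched together.
      chunks = [synthesize_sentence(s) for s in split_sentences(text)]
      return np.concatenate(chunks) if chunks else np.zeros(0, dtype=np.float32)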

Features

  • Multi-sentence processing: Converts input text made up of multiple sentences into a single audio file.
  • Flexible input options: Accepts both text files and direct input strings.
  • Audio normalization: Automatically normalizes the generated audio for better sound quality (see the sketch after this list).
  • Output format: Generates audio in MP3 format.
  • Gradio interface: Inference can be run either via the Python scripts or via the Gradio interface (recommended).
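
A common way to implement the normalization mentioned above is simple peak scaling applied before the file is written; this is a sketch of that approach, not necessarily what main.py does, and it assumes the waveform is held as a floating-point NumPy array:

  import numpy as np

  def peak_normalize(audio: np.ndarray, headroom: float = 0.95) -> np.ndarray:
      # Scale the waveform so its loudest sample sits just below full scale,
      # avoiding clipping while keeping output levels consistent.
      peak = float(np.max(np.abs(audio))) if audio.size else 0.0
      return audio if peak == 0.0 else audio * (headroom / peak)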

Requirements

  • Docker
  • An NVIDIA GPU with at least 6 GB of VRAM

Installation

  1. Clone the repository:
    git clone https://github.com/lwdovico/zonos
    cd zonos
  2. Then build and launch the Gradio interface (recommended):
    docker compose up
  3. Open the Gradio UI at: http://localhost:7861/

NB: The Docker setup exposes port 7861; if you need to change it, be sure to also update it in gradio_main.py.
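
On the Gradio side, the port is usually the server_port argument passed to launch(). The variable name demo below is an assumption about gradio_main.py; the point is just to keep this value in sync with the port exposed by Docker:

  # In gradio_main.py: keep this port in sync with the one exposed by Docker.
  demo.launch(server_name="0.0.0.0", server_port=7861)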

WARNING

The network_mode: "host" configuration in docker-compose.yml works as expected on Linux systems but not on Windows. If you're running this Docker container on a Windows host, you should replace:

network_mode: "host"

with:

ports:
  - "7861:7861"

This will expose the Gradio app on port 7861 of your host machine, allowing you to access it via http://localhost:7861.
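
Put together, the relevant part of docker-compose.yml on a Windows host would look roughly like this (the service name and build context are illustrative; keep whatever the repository's compose file already defines):

  services:
    zonos:
      build: .
      # network_mode: "host"   # removed on Windows
      ports:
        - "7861:7861"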

Optional: Build manually

You don't need this step if you only want to use the Gradio WebUI.

  1. Build the container:
   docker build -t zonos .

Optionally, add your assets to the path before building the Docker image so that they are mounted.

  2. Run and attach to it:
   docker run -it --gpus=all --net=host -v /path/to/zonos_outputs:/app/outputs -t zonos

Running the Inference Manually

To generate audio from text, run the following script:

python main.py --input-text "Your text here."

You can also provide a path if the text is too long for the command line:

python main.py --input-text /path/to/text.txt

This will process the text and output a single audio file.

Customizing the Setup

NB: The command-line script doesn't allow for all the customizations available in the Gradio WebUI

You can customize the behavior of the script by adding the following command-line arguments:

  • --input-text: Specifies the text you want to convert into speech. Provide the text directly as a string or a path.
  • --speaker: Defines the speaker to use for the audio generation. By default, it uses an example audio file (assets/exampleaudio.mp3), but you can specify your own file or speaker model.
  • --language: Sets the language for the audio. The default is English (en-us), but you can change it to another supported language code.
  • --output-path: Determines where the generated audio will be saved. The default output file is output.mp3, but you can specify a custom path or filename (absolute paths only). Set it to the mounted path /app/outputs/output.mp3 to have the file appear in /path/to/zonos_outputs on the host.
  • --seed: Sets a random seed for reproducibility of the results. The default is 42, but you can modify it if needed.

Example usage:

python main.py --input-text "Hello world!" --speaker "another_speaker.mp3" --language "fr-fr" --output-path "french_output.mp3" --seed 1234
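
For longer texts, you can combine the file input with the mounted output directory so that the result ends up in /path/to/zonos_outputs on the host:

python main.py --input-text /path/to/text.txt --output-path /app/outputs/output.mp3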

Acknowledgements

Special thanks to the Zonos developers for creating this text-to-speech model.
