Questions About VITS Code Modifications and Model Performance #3

@TinaChen95

Hi,
Thanks for your great work!
I'm curious to understand your thought process as a learner. May I ask why you decided to make modifications to the original VITS code?

  1. You mentioned 'robust,' but I'm not quite clear on its exact meaning here. Does it refer to measurable aspects of the model's performance, such as WER (Word Error Rate) or speaking rate?

  2. When you talk about 'speech quality,' are you referring to the sound quality of the generated speech? Is it measured with an audio quality metric like PESQ?

  3. Regarding the 'expanding the receptive field of the Wavenet Flow module' modification, how did you analyze the need for this change, and in what ways does it enhance the quality of synthesized speech?

  4. I noticed that the original VITS was implemented in PyTorch, but you chose to rewrite some code in TensorFlow. What motivated this decision? Are there specific advantages or requirements that led to this change in the tech stack?
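
For context on question 1: WER is usually computed as the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words. Below is a minimal sketch of that calculation. The function name `word_error_rate` is hypothetical and not taken from this repository, and a real evaluation would also need an ASR system to transcribe the synthesized audio, which is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Hypothetical helper for illustration only; it is not part of VITS
    or of this repository.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, this value can exceed 1.0 when the hypothesis is much longer than the reference.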
