Which model is the scoring model used to construct the large-margin data? Are there plans to open-source it? #56

@eyree

Thank you very much for your team's work! In Section 3.1.2, Emotion and Speaking Style Editing, which model is the scoring model used to construct the large-margin data? Are there plans to open-source it?

Original text, 3.1.2 Emotion and Speaking Style Editing:
Zero-shot Cloning. A triplet ⟨text_prompt, audio_neutral, audio_{emotion,style}⟩ is constructed for each
emotion and speaking style by selecting corresponding emotional and neutral audio clips from the same
speaker as the prompt audio and processing them with the StepTTS voice cloning interface, using a
text instruction that describes the target attribute.
Margin Scoring. To evaluate the triplet generated, we developed a scoring model using a small,
human-annotated dataset. The model evaluates audio pairs on a 1-10 scale, with higher margin
scores corresponding to more desirable outcomes.
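
To make the question concrete, here is a minimal sketch of how I read the "large margin" selection step. It is only my interpretation of the passage: the `Triplet` fields, the `score_pair` callable (a stand-in for the unreleased scoring model), and the threshold `min_margin=6.0` are all hypothetical and are not taken from the paper or the repository.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Triplet:
    """One zero-shot cloning sample (field names are assumptions)."""
    text_prompt: str
    audio_neutral: str  # path to the neutral clip
    audio_styled: str   # path to the emotion/style clip


def select_large_margin(
    triplets: List[Triplet],
    score_pair: Callable[[str, str], float],
    min_margin: float = 6.0,  # hypothetical cutoff, not from the paper
) -> List[Triplet]:
    """Keep triplets whose margin score meets the cutoff.

    `score_pair(neutral, styled)` stands in for the scoring model and is
    expected to return a value on the 1-10 scale described in the excerpt.
    """
    kept = []
    for t in triplets:
        margin = score_pair(t.audio_neutral, t.audio_styled)
        if not 1.0 <= margin <= 10.0:
            raise ValueError(f"margin score out of range: {margin}")
        if margin >= min_margin:
            kept.append(t)
    return kept
```

If this reading is right, the open question is what actually sits behind `score_pair`: which model it is, and whether it will be released.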
