@Yeongtae Yeongtae commented Nov 18, 2019

Summary
This PR adds monitoring metrics for the Seq2Seq TTS model, which can be viewed in TensorBoard.

Monitoring metrics

  • Attention Alignment Diagonality (AAD):
    AAD is the length of the attention alignment path divided by the length of the diagonal path.
    The attention alignment path is the line connecting the maximum attention weight at each time-step of the attention weight matrix.
    It indicates how well the model has learned the relationship between the encoder and the decoder.

  • Average max attention weight:
    The maximum attention weight at each time-step, averaged over all time-steps.
    Like AAD, it indicates how well the model has learned the relationship between the encoder and the decoder.

  • log Mel Cepstral Distortion (log_MCD):
    Measures the acoustic similarity between the synthesized audio and the target audio.

  • f0 RMSE:
    The root-mean-square error of the fundamental frequency (f0) between the synthesized audio and the target audio.
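
The two attention metrics above can be sketched with NumPy as follows. This is a minimal illustration, not necessarily the code in this PR: the function names are hypothetical, the alignment is assumed to have shape (decoder steps, encoder steps), and both path lengths are assumed to be Euclidean lengths measured in matrix cells.

```python
import numpy as np


def attention_alignment_diagonality(alignment):
    """Length of the attention alignment path divided by the diagonal length.

    Assumption: alignment has shape (decoder_steps, encoder_steps).
    """
    alignment = np.asarray(alignment, dtype=np.float64)
    dec_len, enc_len = alignment.shape
    # Assumption: the diagonal runs from cell (0, 0) to the opposite corner.
    diagonal = np.hypot(dec_len - 1, enc_len - 1)
    if diagonal == 0:
        raise ValueError("alignment must span more than one cell")
    # Encoder index holding the maximum weight at each decoder time-step.
    peaks = np.argmax(alignment, axis=1)
    # Each decoder step moves one cell down and `jump` cells sideways.
    jumps = np.diff(peaks).astype(np.float64)
    path = np.sum(np.sqrt(1.0 + jumps ** 2))
    return float(path / diagonal)


def average_max_attention_weight(alignment):
    """Maximum attention weight per decoder time-step, averaged over steps."""
    alignment = np.asarray(alignment, dtype=np.float64)
    return float(np.mean(np.max(alignment, axis=1)))
```

Under these assumptions, a perfectly diagonal square alignment gives 1.0 for both values, while a path that jumps back and forth between encoder positions gives an AAD above 1.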

Future work

  • The calculation method for f0 will be changed.
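
Because the f0 calculation is expected to change, one option is to keep the error computation separate from feature extraction. The sketch below takes already-extracted, time-aligned features and is only an illustration under stated assumptions: the function names are hypothetical, unvoiced frames are assumed to be marked with 0, and the MCD function returns the standard distortion in dB without the 0th coefficient. How the log variant is derived from it is not spelled out here, so that step is left to the caller.

```python
import numpy as np


def f0_rmse(f0_synth, f0_target):
    """RMSE between two f0 contours of equal length (same units as the input).

    Assumption: a value of 0 marks an unvoiced frame; only frames voiced in
    both contours are compared.
    """
    f0_synth = np.asarray(f0_synth, dtype=np.float64)
    f0_target = np.asarray(f0_target, dtype=np.float64)
    voiced = (f0_synth > 0) & (f0_target > 0)
    if not np.any(voiced):
        return float("nan")
    err = f0_synth[voiced] - f0_target[voiced]
    return float(np.sqrt(np.mean(err ** 2)))


def mel_cepstral_distortion(mcep_synth, mcep_target):
    """Mean mel cepstral distortion in dB over time-aligned frames.

    Assumption: inputs have shape (frames, coefficients) and column 0 is the
    energy term, which is skipped by common convention.
    """
    mcep_synth = np.asarray(mcep_synth, dtype=np.float64)
    mcep_target = np.asarray(mcep_target, dtype=np.float64)
    diff = mcep_synth[:, 1:] - mcep_target[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))
```

With this split, swapping the f0 extractor only changes how the input contours are produced, not how the error is computed.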

Example
[Images: Train_metrics2, Train_metrics1, Validation_metrics1, Validation_metrics2]

Yeongtae and others added 8 commits August 18, 2019 23:42
ㄴ attention alignment diagonality
ㄴ average max attention weight
ㄴ f0 RMSE
ㄴ MCD
ㄴ re-implementing and solving errors.
ㄴ Audio processing: trimming silence (if it is < 23 dB), preemphasis, amplitude normalization
ㄴ Remove short clips (if < 14847 samples; this may be the 0.10 percentile of my own dataset)
ㄴ Rename the metric from MCD to log_MCD
@Yeongtae Yeongtae mentioned this pull request Dec 17, 2019
@Yeongtae Yeongtae changed the title Monitoring metric Monitering metric Dec 23, 2019
@Yeongtae Yeongtae changed the title Monitering metric Monitoring metric Dec 23, 2019
ㄴ debugging