Finetuning Speech Encoders further

Hi,

I tried finetuning the Swahili speech encoder further, but performance only increased to 9.6 BLEU from a baseline of 7.5 BLEU with your already finetuned encoder. I finetuned the speech encoder for 5 epochs with augmented data, using about 30 hours of data. The MSE loss in the last epoch was 1.5 × 10^-6. I am reluctant to try more epochs because the improvement is smaller than I had expected. Is there a different approach that might help achieve a better BLEU score?

Also, which finetuned decoder checkpoint is the one the paper reports as doing well for Swahili? When I try to use it, I get the following error, which I do not get with the normal decoder: "ValueError: The input sequence length must be less than or equal to the maximum sequence length (512), but is 513 instead". All my audio clips are 30 seconds or shorter.
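
One way to narrow the length error down might be to check which inputs exceed the limit before they reach the decoder. The following is only a minimal sketch: it assumes the inputs are available as plain Python sequences (token IDs or encoder frames) and that 512 is the limit, as reported in the error message. The names `MAX_LEN`, `find_overlong`, and `truncate` are hypothetical and not part of this repository. A length of exactly 513 might point to a single extra position, such as an added special token, but that is a guess.

```python
from typing import List, Sequence, Tuple

# Limit taken from the error message; assumed to apply to the decoder input.
MAX_LEN = 512


def find_overlong(
    sequences: Sequence[Sequence[int]], max_len: int = MAX_LEN
) -> List[Tuple[int, int]]:
    """Return (index, length) for every sequence longer than max_len."""
    return [(i, len(s)) for i, s in enumerate(sequences) if len(s) > max_len]


def truncate(sequence: Sequence[int], max_len: int = MAX_LEN) -> List[int]:
    """Hypothetical workaround: keep only the first max_len positions."""
    return list(sequence[:max_len])


if __name__ == "__main__":
    # Dummy inputs: one sequence at the limit, one a single position over it.
    batch = [[0] * 512, [0] * 513]
    for index, length in find_overlong(batch):
        print(f"input {index} has length {length}, limit is {MAX_LEN}")
```

Truncating discards the final position, so it is a diagnostic workaround rather than a fix; whether it is acceptable depends on what that position holds.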

Thank you for your time!

Finetuning Speech Encoders further #28
