

@Jeronymous
Member

This adds a TTS (text-to-speech) feature: a function that turns text into an audio waveform.


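```python
# Tokenize the text to pronounce
text_tokens = tokenizer(text, return_tensors="pt").input_ids.to(device)
# Tokenize the prompt describing the type of speaker/speech
speaker_type_prompt_tokens = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
# The speaker-type prompt goes in `input_ids`, the text to pronounce in `prompt_input_ids`
audio_tensor = model.generate(input_ids=speaker_type_prompt_tokens, prompt_input_ids=text_tokens)
```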
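A minimal sketch of a helper that keeps the two inputs apart. It assumes Hugging Face-style tokenizers (callables returning an object with `.input_ids`) and a model whose `generate` accepts `input_ids` and `prompt_input_ids` as in the snippet above. The function name `synthesize` and the separate `description_tokenizer` argument are hypothetical. The latter reflects the point made later in this thread that the speaker description and the text to pronounce need different tokenizers.

```python
from typing import Any, Callable


def synthesize(
    text: str,
    description: str,
    text_tokenizer: Callable[..., Any],
    description_tokenizer: Callable[..., Any],
    model: Any,
    device: str = "cpu",
) -> Any:
    """Hypothetical helper: return whatever model.generate produces.

    Assumes each tokenizer is a Hugging Face-style callable returning an object
    with an `.input_ids` tensor, and that `model.generate` takes the speaker
    description as `input_ids` and the text to pronounce as `prompt_input_ids`.
    """
    # The speaker/speech-type description goes through its own tokenizer.
    description_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
    # The text to pronounce goes through the text tokenizer.
    text_ids = text_tokenizer(text, return_tensors="pt").input_ids.to(device)
    return model.generate(input_ids=description_ids, prompt_input_ids=text_ids)
```

Taking both tokenizers as explicit arguments makes it hard to accidentally pass the description through the wrong one.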
Member Author


@hedhoud Here, `speaker_type_prompt_tokens` represents the type of speaker/speech (e.g. "A female speaker delivers an expressive and animated speech with a very high-pitch voice [...]"; see above).
But it turns out this does not work: the requested gender is not respected. Do you see what could be wrong?

Member Author


This was a problem with the "description tokenizer", which is in fact different from the other tokenizer (the one that tokenizes the text to pronounce).
Fixed by 060a85d
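Why this mix-up silently breaks the conditioning can be shown with a toy example. It uses two made-up word-level vocabularies rather than the real tokenizers, and all names and vocabularies here are hypothetical. Token ids only mean something relative to the vocabulary that produced them, so a description encoded with the text tokenizer is read as different words by the description encoder.

```python
# Two hypothetical word-level vocabularies (word -> id). Real tokenizers use
# subword vocabularies, but the failure mode is the same.
TEXT_VOCAB = {"<unk>": 0, "a": 1, "speaker": 2, "female": 3, "male": 4}
DESCRIPTION_VOCAB = {"<unk>": 0, "a": 1, "speaker": 2, "male": 3, "female": 4}


def encode(sentence: str, vocab: dict[str, int]) -> list[int]:
    """Map each lower-cased word to its id, falling back to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.lower().split()]


def decode(ids: list[int], vocab: dict[str, int]) -> list[str]:
    """Map ids back to words using the given vocabulary."""
    inverse = {i: w for w, i in vocab.items()}
    return [inverse.get(i, "<unk>") for i in ids]


description = "A female speaker"

# Bug: the description is encoded with the text tokenizer's vocabulary...
wrong_ids = encode(description, TEXT_VOCAB)
# ...but interpreted with the description vocabulary.
print(decode(wrong_ids, DESCRIPTION_VOCAB))  # ['a', 'male', 'speaker']

# Fix: encode with the vocabulary the description side expects.
right_ids = encode(description, DESCRIPTION_VOCAB)
print(decode(right_ids, DESCRIPTION_VOCAB))  # ['a', 'female', 'speaker']
```

No error is raised in the buggy case: the ids are valid, they just point to the wrong words, which matches the symptom of the gender not being respected.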

linagora-labs deleted a comment from hedhoud on Apr 1, 2025
Jeronymous merged commit df18fae into main on Apr 3, 2025
1 check passed
Jeronymous deleted the feature/tts branch on April 3, 2025 at 13:15
