Problem Description
I'm experiencing an issue when trying to make the avatar correctly pronounce Vietnamese text using the speak() method. Specifically, I'm integrating the StreamingAvatar SDK with OpenAI Assistant to respond to users in Vietnamese (following the guide at https://docs.heygen.com/docs/integrate-with-opeanai-assistant), but I'm facing a dilemma between two incompatible modes:
Current Behavior
- When initializing the avatar with `language: "vi"`:
  - Initial greeting: The avatar speaks in Vietnamese with proper intonation (works well)
- When sending responses from OpenAI Assistant with `speak()`:
  - Using `TaskType.TALK`: The avatar understands it's Vietnamese, but it CREATES NEW CONTENT instead of reading exactly what I provided
  - Using `TaskType.REPEAT`: The avatar reads the EXACT content, but pronounces it as if an English speaker is reading Vietnamese text (incorrect intonation and pronunciation)
Root Cause
After analyzing the SDK source code, I noticed:
- The `language` parameter is only set in `createStartAvatar()` and sent to the `/v1/streaming.new` endpoint
- When calling the `speak()` method, the `SpeakRequest` interface doesn't have a `language` parameter
- The request to `/v1/streaming.task` doesn't pass language information when using `TaskType.REPEAT`
Steps to Reproduce
Reproduction steps:
- Set up OpenAI Assistant and HeyGen integration according to the guide at https://docs.heygen.com/docs/integrate-with-opeanai-assistant
- Initialize the avatar with Vietnamese language:

      await avatar.createStartAvatar({
        avatarName: "your-avatar",
        language: "vi",
        // (Other parameters)
      });

- Get a response from OpenAI in Vietnamese and send it to the avatar:

      const openAIResponse = await getOpenAIResponse(userMessage); // Vietnamese response from OpenAI
      await avatar.speak({
        text: openAIResponse,
        taskType: TaskType.REPEAT
      });

- Listen to the avatar's speech: the text is read exactly as provided, but with English intonation and pronunciation
Expected Behavior
The avatar should read the Vietnamese text accurately with natural Vietnamese intonation, especially when the language has been set to "vi" during initialization.
Environment
- SDK Version: 2.0.14
- Browser: Chrome 124.0.6367.60
- Tested with the eleven_multilingual_v2 model
Proposed Solutions
I suggest one of the following changes:
- Add a `language` parameter to the `SpeakRequest` interface and pass it to the `/v1/streaming.task` endpoint:

      export interface SpeakRequest {
        text: string;
        taskType?: TaskType;
        taskMode?: TaskMode;
        language?: string; // Add this parameter
      }

- Or store and use the `language` value from `createStartAvatar()` for all subsequent `speak()` requests
- Or provide a new method that combines both features: reading exact content AND correct language pronunciation
This would be especially important when integrating with APIs like OpenAI Assistant, where we need the avatar to read AI responses accurately with the proper intonation of that language.
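To make the second option more concrete, here is a minimal sketch of how the session language could be remembered and merged into each task request. Everything in it is an assumption for illustration: `LanguageAwareSession` and `buildTaskBody` are hypothetical names, the `TaskType` string values are stand-ins, and the `language` field on the request is the proposed addition, not something SDK 2.0.14 exposes. I also don't know whether `/v1/streaming.task` would honor such a field, so this only shows the client-side half of the idea.

```typescript
// Stand-in types for illustration only; these are not the SDK's real definitions.
type TaskType = "talk" | "repeat";

interface SpeakRequest {
  text: string;
  taskType?: TaskType;
  language?: string; // proposed field, not part of SpeakRequest in SDK 2.0.14
}

// Hypothetical helper: remembers the language chosen at session start
// (the value passed to createStartAvatar()) and reuses it for later requests.
class LanguageAwareSession {
  private readonly sessionLanguage?: string;

  constructor(sessionLanguage?: string) {
    this.sessionLanguage = sessionLanguage;
  }

  // Builds the body that would be sent with a speak() call.
  // A per-call language wins; otherwise the session language is used.
  buildTaskBody(request: SpeakRequest): SpeakRequest {
    const language = request.language ?? this.sessionLanguage;
    return language === undefined ? { ...request } : { ...request, language };
  }
}

const session = new LanguageAwareSession("vi");
const body = session.buildTaskBody({ text: "Xin chào", taskType: "repeat" });
console.log(JSON.stringify(body));
```

With this shape, existing callers would not need to change anything: a `speak()` call without `language` would inherit `"vi"` from the session, while a caller that needs a different language for one utterance could still pass it explicitly.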
Thank you for considering this issue. I'm available to provide more information or assist in testing any solutions.