Open
Labels: enhancement (New feature or request)
Description
Hey guys, I tested the /stream endpoint with websockets and got noticeably worse results than with regular transcription. My guess is that the endpoint splits the audio at arbitrary points, which breaks the model's context for the word or phrase at the boundary.
It seems to me that one way to make this work could be to detect silence, either on the client or the server, and slice the audio there, so each piece of audio is bounded by the user's pauses.
Just wanted to know what you think. Have a good day :)
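To illustrate the suggestion, here is a minimal sketch of silence-based segmentation: split a stream of PCM samples wherever the signal stays quiet for long enough, so each chunk sent for transcription is a complete utterance. The function name and the threshold values are illustrative assumptions, not anything from this project.

```python
def split_on_silence(samples, threshold=500, min_silence=1600):
    """Split a sequence of int16 PCM samples into voiced chunks.

    A cut is made wherever at least `min_silence` consecutive samples
    stay below `threshold` in absolute value (i.e. a pause). Both
    parameters are hypothetical defaults for illustration; real values
    would depend on the sample rate and recording conditions.
    """
    chunks, current, silent_run = [], [], 0
    for s in samples:
        current.append(s)
        if abs(s) < threshold:
            silent_run += 1
            if silent_run >= min_silence:
                # Pause detected: emit the voiced part, drop the silence.
                voiced = current[:-silent_run]
                if voiced:
                    chunks.append(voiced)
                current, silent_run = [], 0
        else:
            silent_run = 0
    # Keep any trailing audio that still contains voiced samples.
    if any(abs(s) >= threshold for s in current):
        chunks.append(current)
    return chunks


# Example: two bursts of speech separated by a long pause
# come out as two separate chunks.
chunks = split_on_silence([1000] * 100 + [0] * 2000 + [1200] * 100)
```

On the client, each returned chunk could be sent as one websocket message instead of fixed-size frames; a server-side variant would buffer incoming frames and apply the same cut before running the model.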