Improved Speech2IPA Streaming #3

@Reuben1987AI

Description

Hey guys, I tested the /stream endpoint over websockets and got noticeably worse results than regular transcription. My guess is that the endpoint splits the audio at arbitrary points, which breaks the model's context for whatever word or phrase straddles the boundary.

It seems to me that one way to make this work would be to detect silence, either on the client or the server, and slice the audio there, so each chunk is bounded by the user's pauses. I've put a rough sketch of the client-side version below.
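Something like this, as a minimal energy-based sketch (the endpoint URL, sample rate, and thresholds are placeholders, not this project's actual values, and a real VAD would do better than a plain RMS gate): buffer the mic audio and only send a chunk once a run of silent frames is seen, so each chunk lines up with a natural pause.

```python
# Client-side silence-based chunking sketch. Assumes 16 kHz mono int16 PCM;
# adjust to whatever the server expects.
import asyncio

import numpy as np
import websockets

SAMPLE_RATE = 16_000          # assumed; match the server's expected format
FRAME_MS = 30                 # analysis window size
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000
SILENCE_RMS = 500             # tune per mic; int16 RMS below this = silence
SILENCE_FRAMES_TO_FLUSH = 10  # ~300 ms of silence ends a segment


def is_silent(frame: np.ndarray) -> bool:
    """Treat a frame of int16 PCM as silent if its RMS energy is low."""
    return np.sqrt(np.mean(frame.astype(np.float64) ** 2)) < SILENCE_RMS


async def stream_on_silence(frames, url="ws://localhost:8000/stream"):
    """Send buffered audio only at silence boundaries.

    `frames` is any iterable of int16 numpy arrays of FRAME_SAMPLES samples
    (e.g. from a sounddevice callback); `url` is a placeholder.
    """
    async with websockets.connect(url) as ws:
        buffer: list[np.ndarray] = []
        silent_run = 0
        for frame in frames:
            buffer.append(frame)
            silent_run = silent_run + 1 if is_silent(frame) else 0
            # Flush only if the buffer holds speech followed by a long
            # enough pause, so each message is a coherent utterance.
            if silent_run >= SILENCE_FRAMES_TO_FLUSH and len(buffer) > silent_run:
                await ws.send(np.concatenate(buffer).tobytes())
                buffer, silent_run = [], 0
        if buffer:  # flush whatever is left at end of stream
            await ws.send(np.concatenate(buffer).tobytes())
```

The same idea could live server-side instead: accumulate incoming frames and only run the model when a pause is detected, which would keep existing clients unchanged.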

Just wanted to know what you think, have a good day :)
