Open
Labels: enhancement, help wanted
Description
As pointed out in #11, transcription is very sensitive to background noise. We have run some preliminary experiments:
- Using webrtcvad to filter out non-speech segments: does not help when noise overlaps with speech
- Training models on various kinds of augmented noisy speech: does not generalize well to unseen types of noise
We need a low-latency noise removal approach that generalizes well across different types of noise. Some ideas that separate PRs can explore:
- Enable noise suppression via Web API on supported devices/browsers
- Evaluate open-source noise suppression models to determine which can run on-device in the browser and which would need to be hosted on a server with a GPU
- Look into more advanced noise suppression based on binaural audio and/or speaker-specific embeddings
- Look into more noise-robust Speech2IPA architectures, training objectives/regularization, and data augmentation approaches
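
As a starting point for the first idea, here is a minimal sketch of requesting browser-side noise suppression through the Media Capture and Streams API (`getUserMedia` with the `noiseSuppression` constraint). The helper names `buildAudioConstraints` and `openMicrophone` and the `*Like` interfaces are hypothetical, introduced only so the sketch type-checks and can be exercised outside a browser; they are not part of this project. A bare `true` constraint is treated as a preference, so the sketch reads the track settings back to see whether suppression was actually applied.

```typescript
// Minimal structural types (assumptions for this sketch) so it compiles
// without DOM typings. In a browser, `navigator.mediaDevices` fits MediaDevicesLike.
interface SupportedFlags {
  noiseSuppression?: boolean;
}

interface AudioConstraints {
  noiseSuppression?: boolean;
}

interface TrackLike {
  getSettings(): { noiseSuppression?: boolean };
}

interface StreamLike {
  getAudioTracks(): TrackLike[];
}

interface MediaDevicesLike {
  getSupportedConstraints(): SupportedFlags;
  getUserMedia(constraints: { audio: AudioConstraints }): Promise<StreamLike>;
}

// Ask for noise suppression only if the browser reports that it
// recognizes the constraint; otherwise request plain audio.
function buildAudioConstraints(supported: SupportedFlags): AudioConstraints {
  const audio: AudioConstraints = {};
  if (supported.noiseSuppression === true) {
    audio.noiseSuppression = true;
  }
  return audio;
}

// Open the microphone and report whether suppression is really active
// on the first audio track, since the browser may ignore the preference.
async function openMicrophone(
  devices: MediaDevicesLike
): Promise<{ stream: StreamLike; noiseSuppressionActive: boolean }> {
  const audio = buildAudioConstraints(devices.getSupportedConstraints());
  const stream = await devices.getUserMedia({ audio });
  const track = stream.getAudioTracks()[0];
  const noiseSuppressionActive =
    track !== undefined && track.getSettings().noiseSuppression === true;
  return { stream, noiseSuppressionActive };
}
```

In a browser this would be called as `openMicrophone(navigator.mediaDevices)`. When `noiseSuppressionActive` comes back `false`, the app could fall back to one of the model-based options listed above.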