Speech2IPA Noise Robustness #12

@SanderGi

As pointed out in #11, transcription is very sensitive to background noise. We have run some preliminary experiments:

  1. Using webrtcvad to filter out non-speech segments: does not handle overlapping speech/sounds
  2. Training models on various kinds of augmented noisy speech: does not generalize well to unseen types of noise
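
For readers unfamiliar with item 2: noisy-speech augmentation is commonly done by mixing a clean utterance with a noise clip at a chosen signal-to-noise ratio (SNR). The sketch below illustrates that general technique only; it is an assumption, not the code used in the experiments above, and `meanPower` and `mixAtSnr` are hypothetical names.

```typescript
/** Mean power over `length` samples, looping the signal if it is shorter. */
function meanPower(signal: Float32Array, length: number): number {
  if (signal.length === 0 || length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < length; i++) {
    const x = signal[i % signal.length];
    sum += x * x;
  }
  return sum / length;
}

/**
 * Mix `noise` into `speech` so the result has the requested SNR in dB.
 * The noise clip is looped to cover the full length of the speech.
 */
function mixAtSnr(speech: Float32Array, noise: Float32Array, snrDb: number): Float32Array {
  const n = speech.length;
  const speechPower = meanPower(speech, n);
  const noisePower = meanPower(noise, n);
  const mixed = new Float32Array(speech); // copy, leave the input untouched

  // Nothing to scale against if either signal is silent or empty.
  if (speechPower === 0 || noisePower === 0) return mixed;

  // SNR_dB = 10 * log10(speechPower / (gain^2 * noisePower)), solved for gain.
  const gain = Math.sqrt(speechPower / (noisePower * Math.pow(10, snrDb / 10)));
  for (let i = 0; i < n; i++) {
    mixed[i] += gain * noise[i % noise.length];
  }
  return mixed;
}
```

Sweeping `snrDb` and swapping in noise clips that were held out of training is one way to measure the generalization gap described in item 2.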

We need a low-latency noise removal approach that generalizes well to different types of noise. Some ideas that different PRs can explore:

  1. Enable noise suppression via Web API on supported devices/browsers
  2. Evaluate various open-source noise suppression models to see which can run on-device in the browser and which would need to be hosted on a server with a GPU
  3. Look into more advanced noise suppression based on binaural audio and/or speaker-specific embeddings
  4. Look into more noise-robust Speech2IPA architectures, training objectives/regularization, and data augmentation approaches
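
As a starting point for idea 1, here is a minimal sketch of requesting the browser's built-in noise suppression through the Media Capture and Streams API (`getSupportedConstraints`, `getUserMedia`, `getSettings`). The helper names `audioConstraints` and `openMicWithNoiseSuppression` and the `*Like` interfaces are hypothetical; the interfaces exist only so the sketch does not depend on DOM typings and can be exercised with a fake. How this would hook into the app's recording code is not covered here.

```typescript
// Minimal structural types matching the parts of the browser API used below.
interface TrackLike {
  getSettings(): { noiseSuppression?: boolean };
}
interface StreamLike {
  getAudioTracks(): TrackLike[];
}
type AudioConstraint = boolean | { noiseSuppression?: boolean };
interface MediaDevicesLike<S extends StreamLike> {
  getSupportedConstraints(): { noiseSuppression?: boolean };
  getUserMedia(constraints: { audio: AudioConstraint }): Promise<S>;
}

/** Ask for noise suppression only when the browser recognizes the constraint. */
function audioConstraints(supportsNoiseSuppression: boolean): AudioConstraint {
  return supportsNoiseSuppression ? { noiseSuppression: true } : true;
}

/**
 * Open the microphone and report whether noise suppression is actually active.
 * The constraint is a request, so the applied settings are checked afterwards.
 */
async function openMicWithNoiseSuppression<S extends StreamLike>(
  devices: MediaDevicesLike<S>,
): Promise<{ stream: S; suppressionActive: boolean }> {
  const supported = devices.getSupportedConstraints().noiseSuppression === true;
  const stream = await devices.getUserMedia({ audio: audioConstraints(supported) });
  const [track] = stream.getAudioTracks();
  const suppressionActive = track?.getSettings().noiseSuppression === true;
  return { stream, suppressionActive };
}
```

In a browser this would be called as `openMicWithNoiseSuppression(navigator.mediaDevices)`. When `suppressionActive` comes back `false`, the app could fall back to one of the other ideas in this list.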

Labels: enhancement, help wanted
