Description
While the user speaks a dialogue phrase, we highlight the word they are currently on, both to make the UI feel responsive and to detect when they have finished speaking. This has to be very low-latency and light on computational resources. Currently we use the Web Speech API, but it is only supported in Chrome and Safari.
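For reference, the word-tracking step can be sketched independently of whichever recognizer produces the text. This is a minimal sketch, not the project's actual code: the function name `countSpokenWords` is hypothetical, it assumes the recognizer delivers a running (interim) transcript as a string, and it assumes English text, so normalization only keeps ASCII letters, digits and apostrophes.

```typescript
// Hypothetical helper: lowercase a word and strip punctuation so that
// "Hello," in the script matches "hello" in the transcript.
// Assumption: English-only text (ASCII letters, digits, apostrophes).
function normalize(word: string): string {
  return word.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// Split text on whitespace and drop tokens that are empty after normalization.
function tokenize(text: string): string[] {
  return text
    .split(/\s+/)
    .map(normalize)
    .filter((w) => w.length > 0);
}

// Returns how many words of the expected phrase have been spoken so far.
// Heard words are matched greedily and in order; words that do not match
// the next expected word (fillers, misrecognitions) are skipped.
function countSpokenWords(expectedPhrase: string, transcript: string): number {
  const expected = tokenize(expectedPhrase);
  const heard = tokenize(transcript);
  let next = 0; // index of the next expected word to highlight
  for (const word of heard) {
    if (next < expected.length && word === expected[next]) {
      next += 1;
    }
  }
  return next;
}
```

The returned count is the index of the word to highlight next; when it equals the number of words in the phrase, the user can be treated as done speaking. Greedy in-order matching is just one simple choice here, and a real implementation would probably need some tolerance for misrecognized words.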
We need better browser support through a fallback, either to a small local model or to one running remotely (on our server or on a cloud service such as Azure or GCP). A small local model might be ideal for low latency, especially since we already know which words the user is trying to say: we don't need a full transcription model, just one that can detect when a given word has been said.
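One possible shape for the fallback selection is sketched below. The `Backend` names, the `EnvLike` type and the preference order (Web Speech API first, then a local model if WebAssembly is available, then a remote service) are all assumptions made for illustration; the actual order should come out of the evaluation. The environment is passed in as a parameter so the logic can be tested outside a browser.

```typescript
// Hypothetical labels for the three options discussed above.
type Backend = "web-speech" | "local-model" | "remote";

// The subset of browser globals this sketch inspects.
// Chrome and Safari expose the recognizer under the prefixed name
// `webkitSpeechRecognition`; `SpeechRecognition` is the unprefixed name.
interface EnvLike {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
  WebAssembly?: unknown;
}

// Pick a backend from what the environment exposes.
// Assumption: a local model would be shipped as WebAssembly, so we only
// choose it when the WebAssembly global exists.
function chooseBackend(env: EnvLike): Backend {
  const hasWebSpeech =
    env.SpeechRecognition !== undefined ||
    env.webkitSpeechRecognition !== undefined;
  if (hasWebSpeech) {
    return "web-speech";
  }
  if (env.WebAssembly !== undefined) {
    return "local-model";
  }
  return "remote";
}
```

In a browser this would be called with the global object (for example `window`). Whichever backend is selected only needs to emit a running transcript or per-word events, so the highlighting logic can stay the same across backends.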
This task will involve some research and evaluation of the best approach.