Description
In the current version of whisper service, running the whisper model on an audio buffer blocks the main thread, even when the computation is offloaded to a GPU or other accelerator. When inference takes too long, the web server fails to respond to websocket keepalive pings and the connection closes. The problem will get worse as whisper service handles more concurrent audio streams and spends more of its time on compute.
Therefore, whisper service should move to an asynchronous compute approach, with computation occurring on a separate thread. Audio buffers that are ready for inference should be put into a queue, and a set of worker processes should pick them up and process them as capacity allows. This way the main thread can stay event-driven, free of long-running blocking compute tasks.
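A minimal sketch of the proposed queue-and-workers pattern, assuming the service is written in Python on top of `asyncio`. Everything named here is hypothetical: `run_model` stands in for the blocking whisper inference call, and `heartbeat` stands in for websocket keepalive handling. It is meant only to illustrate the shape of the change, not the actual implementation.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def run_model(audio: bytes) -> str:
    """Hypothetical stand-in for the blocking whisper inference call."""
    time.sleep(0.2)  # simulate long-running compute
    return f"transcript of {len(audio)} bytes"


async def worker(queue: asyncio.Queue, executor: ThreadPoolExecutor, results: list) -> None:
    """Pull audio buffers off the queue and run inference off the event loop."""
    loop = asyncio.get_running_loop()
    while True:
        audio = await queue.get()
        try:
            # The blocking call runs in the pool, so the event loop stays free.
            text = await loop.run_in_executor(executor, run_model, audio)
            results.append(text)
        finally:
            queue.task_done()


async def heartbeat(ticks: list, interval: float = 0.05) -> None:
    """Hypothetical stand-in for answering websocket keepalive pings."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def main(num_buffers: int = 4, num_workers: int = 2):
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    ticks: list = []
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        tasks = [
            asyncio.create_task(worker(queue, executor, results))
            for _ in range(num_workers)
        ]
        tasks.append(asyncio.create_task(heartbeat(ticks)))
        # Enqueue buffers that are ready for inference.
        for i in range(num_buffers):
            queue.put_nowait(bytes(100 * (i + 1)))
        await queue.join()  # wait until every buffer has been processed
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return results, ticks
```

Calling `asyncio.run(main())` processes all queued buffers while the heartbeat keeps ticking, which is the behavior the keepalive handling needs. A thread pool only helps if the real inference call releases the GIL while it runs; if it does not, swapping in `ProcessPoolExecutor` is one option, at the cost of serializing audio buffers between processes.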