Whisper Service Async Compute

In the current version of whisper service, running the whisper model on an audio buffer blocks the main thread, even when the computation is offloaded to a GPU or other accelerator. When inference takes too long, the web server fails to respond to websocket keepalive pings and the connection closes. The problem will get worse as whisper service handles more concurrent audio streams and spends more of its time on compute.

Therefore, whisper service should move to an asynchronous compute approach, with computation occurring off the main thread. Audio buffers that are ready for inference should be put into a queue, and a set of worker processes should pick them up and process them as capacity allows. This keeps the main thread event-driven and free of long-running blocking compute tasks.
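
The queue-and-workers idea above could be sketched roughly as follows. This is a minimal illustration, not the service's actual code: it assumes the server runs on an `asyncio` event loop, and `transcribe`, `worker`, `submit`, and `demo` are hypothetical names. A `time.sleep` call stands in for the blocking whisper inference, and a thread pool is used for simplicity.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for the blocking whisper inference call."""
    time.sleep(0.05)  # simulates long-running compute
    return f"transcribed {len(audio)} bytes"


async def worker(queue: asyncio.Queue, pool: ThreadPoolExecutor) -> None:
    """Pull audio buffers off the queue and run inference off the event loop."""
    loop = asyncio.get_running_loop()
    while True:
        audio, result = await queue.get()
        try:
            # The blocking call runs in the pool; the event loop stays free.
            text = await loop.run_in_executor(pool, transcribe, audio)
            result.set_result(text)
        except Exception as exc:
            result.set_exception(exc)
        finally:
            queue.task_done()


async def submit(queue: asyncio.Queue, audio: bytes) -> str:
    """Enqueue a buffer and wait (without blocking the loop) for its result."""
    result = asyncio.get_running_loop().create_future()
    await queue.put((audio, result))
    return await result


async def demo(num_workers: int = 2):
    queue: asyncio.Queue = asyncio.Queue()
    pool = ThreadPoolExecutor(max_workers=num_workers)
    workers = [asyncio.create_task(worker(queue, pool)) for _ in range(num_workers)]

    ticks = 0

    async def keepalive() -> None:
        # Stands in for answering websocket pings while inference is running.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    ka = asyncio.create_task(keepalive())
    buffers = [b"\x00" * n for n in (100, 200, 300)]
    texts = await asyncio.gather(*(submit(queue, buf) for buf in buffers))

    # Shut everything down once the queued work is finished.
    ka.cancel()
    for w in workers:
        w.cancel()
    await asyncio.gather(ka, *workers, return_exceptions=True)
    pool.shutdown()
    return texts, ticks


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Threads only help here if the inference call releases the interpreter lock while it computes, which is typical for native or GPU-backed code but is an assumption in this sketch. If it does not, swapping `ThreadPoolExecutor` for `concurrent.futures.ProcessPoolExecutor` keeps the same structure while moving the work into separate processes.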

Whisper Service Async Compute #31