
Whisper Service Async Compute #31

@bennettrwu

Description


In the current version of whisper service, running the whisper model on an audio buffer blocks the main thread, even when the computation is offloaded to a GPU or other accelerator. When inference takes too long, the web server fails to respond to websocket keepalive pings and the connection closes. This will get worse as whisper service begins to handle more concurrent audio streams and spends more of its time running compute.

Therefore, whisper service should move to an asynchronous compute approach, with computation occurring on a separate thread. Audio buffers that are ready for inference should be put into a queue, and a set of worker processes should pick them up and process them as they become free. This way the main thread can stay event driven, without long-running blocking compute tasks.
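
A minimal sketch of the queue-and-workers idea, assuming the service is written in Python on top of `asyncio` (the issue does not say which framework is used). `run_inference`, `inference_worker`, and `keepalive` are hypothetical names made up for illustration: `run_inference` stands in for the blocking whisper call with a `time.sleep`, and `keepalive` stands in for the websocket ping handling that must keep running while inference is in flight.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def run_inference(audio: bytes) -> str:
    """Hypothetical stand-in for the blocking whisper model call."""
    time.sleep(0.05)  # simulate blocking compute
    return f"transcript:{len(audio)}"


async def inference_worker(queue, executor, results):
    """Pull audio buffers off the queue and run them off the event loop."""
    loop = asyncio.get_running_loop()
    while True:
        stream_id, audio = await queue.get()
        try:
            # The blocking call runs in the executor; the event loop stays free.
            results[stream_id] = await loop.run_in_executor(
                executor, run_inference, audio
            )
        finally:
            queue.task_done()


async def keepalive(ticks, interval=0.01):
    """Hypothetical stand-in for answering websocket keepalive pings."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def main(num_streams=4, num_workers=2):
    queue = asyncio.Queue()
    results = {}
    ticks = []
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        workers = [
            asyncio.create_task(inference_worker(queue, executor, results))
            for _ in range(num_workers)
        ]
        pinger = asyncio.create_task(keepalive(ticks))

        # Audio buffers that are ready for inference go onto the queue.
        for stream_id in range(num_streams):
            queue.put_nowait((stream_id, b"\x00" * (stream_id + 1)))

        await queue.join()  # wait until every queued buffer is processed

        # Shut down the idle workers and the keepalive task.
        for task in (*workers, pinger):
            task.cancel()
        await asyncio.gather(*workers, pinger, return_exceptions=True)
    return results, ticks


if __name__ == "__main__":
    results, ticks = asyncio.run(main())
    print(results, len(ticks))
```

Worker threads only help if the real model call releases the GIL while it computes. If it does not, passing a `ProcessPoolExecutor` to `run_in_executor` instead would match the "worker processes" wording above, at the cost of pickling each audio buffer across the process boundary.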
