When multiple GPUs are attached to a host, our algorithm selects the set of devices on which to run a network that requires more than one GPU; the selected GPUs are then used for inference.
The problem arises when two processes run simultaneously (either directly on the host or inside separate containers) with visibility of all GPUs. Currently, there is no mechanism for one process to learn that certain GPUs have already been selected and reserved by another process.
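For concreteness, the selection logic is essentially a scan of per-device state followed by a pick. The sketch below is illustrative rather than our actual implementation: it assumes NVML via the `pynvml` bindings, and `select_gpus` and the free-memory threshold are made-up names and values. The point is that the check and the pick are not atomic across processes.

```python
# Illustrative sketch of a naive "scan and pick" GPU selector using NVML
# (pynvml). Nothing here prevents two processes from scanning at the same
# time and picking the same devices.
import pynvml

def select_gpus(count, min_free_bytes=8 * 1024**3):
    """Pick `count` GPUs that currently look free.

    This check-then-use pattern is not atomic: another process can run the
    same scan and choose the same devices before either starts inference.
    """
    pynvml.nvmlInit()
    try:
        candidates = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            if mem.free >= min_free_bytes:
                candidates.append(i)
        if len(candidates) < count:
            raise RuntimeError("not enough free GPUs")
        return candidates[:count]
    finally:
        pynvml.nvmlShutdown()
```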
This leads to a race condition:
Process 1 selects a set of GPUs and begins inference.
Process 2, unaware of Process 1’s allocation, may also select overlapping GPUs.
Both processes attempt to use the same devices, causing conflicts, degraded performance, or failures.
Question: If all GPUs are visible to more than one container, what mechanisms exist to prevent race conditions in GPU allocation? A sketch of the kind of mechanism we have in mind follows.
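To make the question concrete, one candidate mechanism would be per-GPU advisory file locks in a directory that every container can see (e.g. a shared bind mount on the same filesystem), so that checking and reserving a GPU become a single atomic step. This is only a sketch under those assumptions; `LOCK_DIR`, `try_reserve`, and `reserve_gpus` are hypothetical names, not an existing API:

```python
# Sketch of per-GPU advisory file locks shared across containers.
# Assumes LOCK_DIR is the same filesystem in every container (bind mount).
import fcntl
import os

LOCK_DIR = "/var/run/gpu-locks"  # hypothetical shared lock directory

def try_reserve(gpu_index):
    """Atomically reserve one GPU, or return None if it is already taken.

    The kernel releases the flock automatically when the process exits,
    so a crashed holder does not leave a stale reservation behind.
    """
    os.makedirs(LOCK_DIR, exist_ok=True)
    fd = os.open(os.path.join(LOCK_DIR, f"gpu{gpu_index}.lock"),
                 os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking
        return fd  # keep this fd open for the lifetime of the inference job
    except BlockingIOError:
        os.close(fd)
        return None

def reserve_gpus(count, total):
    """Check-and-reserve in one step: only GPUs we could lock are used."""
    held = {}
    for i in range(total):
        fd = try_reserve(i)
        if fd is not None:
            held[i] = fd
            if len(held) == count:
                return held
    # Roll back partial reservations if not enough GPUs could be locked.
    for fd in held.values():
        os.close(fd)
    raise RuntimeError("not enough unreserved GPUs")
```

The appeal of `flock` here is that the reservation disappears automatically if the holder exits or crashes; the obvious limitation is that it is purely cooperative, i.e. it only helps if every process that can see the GPUs goes through the same reservation path.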