What happened?
When submitting a non-preemptible workload that requests multiple GPU devices using GPU memory (e.g., 2 devices × 6 GiB each), the scheduler allows the job to run even when doing so exceeds the queue's deservedGPUs non-preemptible quota.
Steps to reproduce:
- Create a queue with deservedGPUs: 1 (no hard limit)
- Submit a non-preemptible workload requesting 2 GPU devices with 60% of the node GPU memory each (total = 1.2 GPU-fraction units)
- Observe the workload is scheduled, consuming 1.2 GPU-fraction units of non-preemptible quota against a queue deserving only 1.0
What did you expect to happen?
A non-preemptible workload whose total GPU consumption across all requested devices exceeds the queue's deservedGPUs quota should remain Pending, consistent with the behavior of other non-preemptible workloads that exceed quota.
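The expected check can be illustrated with a minimal sketch. This is not KAI Scheduler code: `GpuRequest`, `fits_non_preemptible_quota`, and the `already_allocated` parameter are hypothetical names used only to show the arithmetic, assuming each device's memory request is expressed as a fraction of one GPU and that the fractions are summed across all requested devices.

```python
from dataclasses import dataclass


@dataclass
class GpuRequest:
    """Hypothetical model of a multi-device fractional GPU request."""
    devices: int                  # number of GPU devices requested
    fraction_per_device: float    # share of one GPU's memory per device (0.6 = 60%)

    def total_fraction(self) -> float:
        # Total consumption is summed across all requested devices.
        return self.devices * self.fraction_per_device


def fits_non_preemptible_quota(request: GpuRequest,
                               deserved_gpus: float,
                               already_allocated: float = 0.0) -> bool:
    """Return True if the request stays within the queue's deserved GPUs.

    A small tolerance guards against floating-point rounding.
    """
    return already_allocated + request.total_fraction() <= deserved_gpus + 1e-9


# The scenario from this report: 2 devices x 60% against deservedGPUs: 1.
request = GpuRequest(devices=2, fraction_per_device=0.6)
should_schedule = fits_non_preemptible_quota(request, deserved_gpus=1.0)
```

Under these assumptions `should_schedule` is `False`, so the workload would be expected to stay Pending. A check that compared only a single device's fraction (0.6) against the quota would let the workload through, which would match the observed behavior; whether that is the actual cause has not been confirmed here.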
Environment
- Kubernetes version: v1.34
- KAI Scheduler version: v0.14.0
- Tools: GPU sharing / fractional GPU feature must be enabled