diff --git a/content/en/docs/plugins/capacity.md b/content/en/docs/plugins/capacity.md new file mode 100644 index 00000000..ba801617 --- /dev/null +++ b/content/en/docs/plugins/capacity.md @@ -0,0 +1,180 @@ ++++ +title = "Capacity Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Capacity" +[menu.plugins] + weight = 2 ++++ + +### Capacity + +#### Overview + +The Capacity plugin manages queue resource allocation using a capacity-based model. It enforces queue capacity limits, guarantees minimum resource allocations, and supports hierarchical queue structures. The plugin calculates each queue's deserved resources based on its capacity, guarantee, and the cluster's total available resources. + +#### Features + +- **Queue Capacity Management**: Enforces queue capacity limits based on configured capability +- **Resource Guarantees**: Supports minimum resource guarantees for queues +- **Hierarchical Queues**: Supports hierarchical queue structures with parent-child relationships +- **Dynamic Resource Allocation**: Calculates deserved resources dynamically based on queue configuration +- **Resource Reclamation**: Supports resource reclamation from queues exceeding their capacity +- **Job Enqueue Control**: Validates resource availability before allowing jobs to be enqueued + +#### Configuration + +The Capacity plugin is configured through Queue resources. Here's an example: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: queue-capacity-example +spec: + weight: 1 + capability: + cpu: "100" + memory: "100Gi" + guarantee: + resource: + cpu: "20" + memory: "20Gi" + deserved: + cpu: "50" + memory: "50Gi" +``` + +##### Queue Configuration Fields + +- **capability**: Maximum resources the queue can consume +- **guarantee**: Minimum resources guaranteed to the queue +- **deserved**: Desired resource allocation for the queue (calculated automatically if not specified) +- **parent**: Parent queue name for hierarchical queue structures + +##### Hierarchical Queue Configuration + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: root-queue +spec: + weight: 1 + capability: + cpu: "1000" + memory: "1000Gi" +--- +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: child-queue +spec: + parent: root-queue + weight: 1 + capability: + cpu: "500" + memory: "500Gi" + guarantee: + resource: + cpu: "100" + memory: "100Gi" +``` + +#### How It Works + +1. **Capacity Calculation**: The plugin calculates each queue's real capacity by considering the total cluster resources, total guarantees, and the queue's own guarantee and capability. +2. **Deserved Resources**: Deserved resources are calculated based on the queue's real capacity and configured deserved values. +3. **Job Enqueue**: Before a job is enqueued, the plugin validates that the queue has sufficient capacity to accommodate the job's minimum resource requirements. +4. **Resource Allocation**: During scheduling, the plugin ensures that queues don't exceed their allocated capacity. +5. **Reclamation**: Queues that exceed their deserved resources can have tasks reclaimed to make room for other queues. 
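+
+To make steps 1 and 2 concrete with a rough, illustrative calculation: in a cluster with 1000 CPU in total, suppose `queue-a` guarantees 200 CPU, `queue-b` guarantees 300 CPU, and `queue-a` has a capability of 800 CPU. `queue-a`'s real capacity is then limited both by its own capability and by the resources left after the other queue's guarantee is set aside, i.e. approximately min(800, 1000 - 300) = 700 CPU, and its deserved resources are calculated within that bound.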
+ +#### Scenario + +The Capacity plugin is suitable for: + +- **Resource Quota Management**: Enforcing resource limits per queue or department +- **Multi-tenant Clusters**: Isolating resources between different tenants or teams +- **Resource Reservations**: Guaranteeing minimum resources for critical workloads +- **Hierarchical Organizations**: Organizations with nested resource allocation structures + +#### Examples + +##### Example 1: Basic Capacity Management + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: team-a +spec: + weight: 1 + capability: + cpu: "200" + memory: "200Gi" + nvidia.com/gpu: "8" + guarantee: + resource: + cpu: "50" + memory: "50Gi" + nvidia.com/gpu: "2" +``` + +##### Example 2: Hierarchical Capacity + +```yaml +# Root queue +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: root +spec: + weight: 1 + capability: + cpu: "1000" + memory: "1000Gi" + +--- +# Development queue +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: dev +spec: + parent: root + weight: 1 + capability: + cpu: "300" + memory: "300Gi" + +--- +# Production queue +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: prod +spec: + parent: root + weight: 1 + capability: + cpu: "500" + memory: "500Gi" + guarantee: + resource: + cpu: "200" + memory: "200Gi" +``` + +#### Notes + +- When hierarchical queues are enabled, only leaf queues can allocate tasks +- Queues without a capacity configuration are treated as best-effort queues +- The plugin automatically calculates real capacity considering parent queue constraints +- Resource guarantees cannot exceed queue capabilities diff --git a/content/en/docs/plugins/deviceshare.md b/content/en/docs/plugins/deviceshare.md new file mode 100644 index 00000000..b216fd94 --- /dev/null +++ b/content/en/docs/plugins/deviceshare.md @@ -0,0 +1,193 @@ ++++ +title = "Device Share Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Device Share" +[menu.plugins] + weight = 3 ++++ + +### Device Share + +#### Overview + +The Device Share plugin manages the sharing and allocation of device resources such as GPUs, NPUs, and other accelerators. It supports multiple device types including NVIDIA GPUs (both GPU sharing and vGPU), Ascend NPUs, and provides flexible scheduling policies for device allocation. The plugin enables efficient utilization of expensive accelerator resources through sharing capabilities. 
+ +#### Features + +- **GPU Sharing**: Enable sharing of GPU resources among multiple pods +- **GPU Number**: Schedule based on the number of GPUs requested +- **vGPU Support**: Support for virtual GPU (vGPU) allocation +- **Ascend NPU Support**: Support for Ascend NPU devices including MindCluster VNPU and HAMi VNPU +- **Node Locking**: Optional node-level locking to prevent concurrent device allocations +- **Flexible Scheduling Policies**: Configurable scoring policies for device allocation +- **Batch Node Scoring**: Support for batch scoring of nodes for NPU devices + +#### Configuration + +The Device Share plugin can be configured with the following arguments: + +```yaml +actions: "allocate, backfill" +tiers: +- plugins: + - name: deviceshare + arguments: + deviceshare.GPUSharingEnable: true + deviceshare.GPUNumberEnable: false + deviceshare.VGPUEnable: false + deviceshare.NodeLockEnable: false + deviceshare.SchedulePolicy: "binpack" + deviceshare.ScheduleWeight: 10 + deviceshare.AscendMindClusterVNPUEnable: false + deviceshare.AscendHAMiVNPUEnable: false + deviceshare.KnownGeometriesCMName: "volcano-vgpu-device-config" + deviceshare.KnownGeometriesCMNamespace: "kube-system" +``` + +##### Configuration Parameters + +- **deviceshare.GPUSharingEnable** (bool): Enable GPU sharing mode +- **deviceshare.GPUNumberEnable** (bool): Enable GPU number-based scheduling (mutually exclusive with GPUSharingEnable) +- **deviceshare.VGPUEnable** (bool): Enable vGPU support (mutually exclusive with GPU sharing) +- **deviceshare.NodeLockEnable** (bool): Enable node-level locking for device allocation +- **deviceshare.SchedulePolicy** (string): Scheduling policy for device scoring (e.g., "binpack", "spread") +- **deviceshare.ScheduleWeight** (int): Weight for device scoring in node ordering +- **deviceshare.AscendMindClusterVNPUEnable** (bool): Enable Ascend MindCluster VNPU support +- **deviceshare.AscendHAMiVNPUEnable** (bool): Enable Ascend HAMi VNPU support +- **deviceshare.KnownGeometriesCMName** (string): ConfigMap name for vGPU geometries +- **deviceshare.KnownGeometriesCMNamespace** (string): Namespace for vGPU geometries ConfigMap + +#### Device Types + +##### NVIDIA GPU Sharing + +Enable GPU sharing to allow multiple pods to share a single GPU: + +```yaml +- name: deviceshare + arguments: + deviceshare.GPUSharingEnable: true + deviceshare.ScheduleWeight: 10 +``` + +Pods request GPU resources using: + +```yaml +resources: + requests: + nvidia.com/gpu: 2 # Request 2 GPU units (out of 100 per GPU) + limits: + nvidia.com/gpu: 2 +``` + +##### NVIDIA GPU Number + +Schedule based on the number of physical GPUs: + +```yaml +- name: deviceshare + arguments: + deviceshare.GPUNumberEnable: true + deviceshare.ScheduleWeight: 10 +``` + +Pods request whole GPUs: + +```yaml +resources: + requests: + nvidia.com/gpu: 1 # Request 1 whole GPU + limits: + nvidia.com/gpu: 1 +``` + +##### vGPU + +Enable virtual GPU support: + +```yaml +- name: deviceshare + arguments: + deviceshare.VGPUEnable: true + deviceshare.ScheduleWeight: 10 + deviceshare.KnownGeometriesCMName: "volcano-vgpu-device-config" + deviceshare.KnownGeometriesCMNamespace: "kube-system" +``` + +##### Ascend NPU + +Enable Ascend NPU support: + +```yaml +- name: deviceshare + arguments: + deviceshare.AscendMindClusterVNPUEnable: true + # or + deviceshare.AscendHAMiVNPUEnable: true + deviceshare.ScheduleWeight: 10 +``` + +#### Scenario + +The Device Share plugin is suitable for: + +- **GPU Clusters**: Clusters with NVIDIA GPU resources requiring efficient 
sharing +- **AI Training**: Machine learning training workloads requiring GPU acceleration +- **Multi-tenant GPU Sharing**: Environments where multiple users need access to GPU resources +- **NPU Workloads**: Workloads running on Ascend NPU devices +- **Cost Optimization**: Maximizing utilization of expensive accelerator hardware + +#### Examples + +##### Example 1: GPU Sharing for Small Workloads + +Configure GPU sharing for workloads that don't require full GPU resources: + +```yaml +- name: deviceshare + arguments: + deviceshare.GPUSharingEnable: true + deviceshare.SchedulePolicy: "binpack" + deviceshare.ScheduleWeight: 10 +``` + +##### Example 2: Whole GPU Allocation + +Configure for workloads requiring full GPU resources: + +```yaml +- name: deviceshare + arguments: + deviceshare.GPUNumberEnable: true + deviceshare.SchedulePolicy: "spread" + deviceshare.ScheduleWeight: 10 +``` + +##### Example 3: vGPU with Custom ConfigMap + +Configure vGPU with custom geometry configuration: + +```yaml +- name: deviceshare + arguments: + deviceshare.VGPUEnable: true + deviceshare.ScheduleWeight: 10 + deviceshare.KnownGeometriesCMName: "custom-vgpu-config" + deviceshare.KnownGeometriesCMNamespace: "gpu-system" +``` + +#### Notes + +- GPU sharing and GPU number modes are mutually exclusive +- GPU sharing and vGPU cannot be enabled simultaneously +- Node locking prevents race conditions in device allocation +- The plugin automatically registers supported devices based on configuration +- Batch scoring is used for NPU devices to optimize allocation decisions diff --git a/content/en/docs/plugins/extender.md b/content/en/docs/plugins/extender.md new file mode 100644 index 00000000..382f5637 --- /dev/null +++ b/content/en/docs/plugins/extender.md @@ -0,0 +1,256 @@ ++++ +title = "Extender Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Extender" +[menu.plugins] + weight = 4 ++++ + +### Extender + +#### Overview + +The Extender plugin enables Volcano scheduler to delegate scheduling decisions to external HTTP services. It allows users to extend Volcano's scheduling capabilities by implementing custom logic in external services. The plugin supports various scheduling hooks including predicate, prioritize, preemptable, reclaimable, and event handlers. 
+ +#### Features + +- **External Service Integration**: Delegate scheduling decisions to external HTTP services +- **Multiple Scheduling Hooks**: Support for predicate, prioritize, preemptable, reclaimable, and other scheduling functions +- **Managed Resources**: Optionally filter tasks based on managed resources +- **Error Handling**: Configurable error handling with ignorable option +- **Event Handlers**: Support for allocate and deallocate event handlers +- **HTTP Timeout Configuration**: Configurable HTTP request timeout + +#### Configuration + +The Extender plugin can be configured with the following arguments: + +```yaml +actions: "reclaim, allocate, backfill, preempt" +tiers: +- plugins: + - name: extender + arguments: + extender.urlPrefix: http://127.0.0.1:8080 + extender.httpTimeout: 100ms + extender.onSessionOpenVerb: onSessionOpen + extender.onSessionCloseVerb: onSessionClose + extender.predicateVerb: predicate + extender.prioritizeVerb: prioritize + extender.preemptableVerb: preemptable + extender.reclaimableVerb: reclaimable + extender.queueOverusedVerb: queueOverused + extender.jobEnqueueableVerb: jobEnqueueable + extender.jobReadyVerb: jobReady + extender.allocateFuncVerb: allocateFunc + extender.deallocateFuncVerb: deallocateFunc + extender.ignorable: true + extender.managedResources: + - nvidia.com/gpu + - nvidia.com/gpumem +``` + +##### Configuration Parameters + +- **extender.urlPrefix** (string): Base URL prefix for the extender service +- **extender.httpTimeout** (string): HTTP request timeout (e.g., "100ms", "1s", "1m") +- **extender.onSessionOpenVerb** (string): Verb for OnSessionOpen method +- **extender.onSessionCloseVerb** (string): Verb for OnSessionClose method +- **extender.predicateVerb** (string): Verb for Predicate method +- **extender.prioritizeVerb** (string): Verb for Prioritize method +- **extender.preemptableVerb** (string): Verb for Preemptable method +- **extender.reclaimableVerb** (string): Verb for Reclaimable method +- **extender.queueOverusedVerb** (string): Verb for QueueOverused method +- **extender.jobEnqueueableVerb** (string): Verb for JobEnqueueable method +- **extender.jobReadyVerb** (string): Verb for JobReady method +- **extender.allocateFuncVerb** (string): Verb for AllocateFunc event handler +- **extender.deallocateFuncVerb** (string): Verb for DeallocateFunc event handler +- **extender.ignorable** (bool): Whether the extender can ignore unexpected errors +- **extender.managedResources** (list): List of resources managed by the extender (comma-separated or list format) + +#### How It Works + +1. **Session Lifecycle**: The extender can hook into session open and close events to initialize and cleanup resources. +2. **Predicate**: The extender can filter nodes based on custom criteria during the predicate phase. +3. **Prioritize**: The extender can score nodes based on custom logic during the prioritize phase. +4. **Preemptable/Reclaimable**: The extender can determine which tasks can be preempted or reclaimed. +5. **Queue Management**: The extender can participate in queue overused and job enqueueable decisions. +6. **Event Handlers**: The extender can receive notifications when tasks are allocated or deallocated. + +#### Managed Resources + +The extender can optionally manage specific resources. 
When managed resources are configured, the extender is only invoked for tasks that request at least one of the managed resources: + +```yaml +extender.managedResources: +- nvidia.com/gpu +- nvidia.com/gpumem +``` + +If no managed resources are specified, the extender is invoked for all tasks. + +#### Error Handling + +The extender can be configured to ignore errors: + +```yaml +extender.ignorable: true +``` + +When ignorable is set to true, errors from the extender are logged but don't prevent scheduling from continuing. When set to false, errors cause scheduling decisions to fail. + +#### API Contract + +The extender service must implement HTTP POST endpoints for each configured verb. The request body contains JSON-encoded scheduling information, and the response should contain the appropriate scheduling decision. + +##### Example Predicate Request/Response + +**Request:** +```json +{ + "task": { + "namespace": "default", + "name": "task-1", + "resources": { + "cpu": 2, + "memory": 4096 + } + }, + "node": { + "name": "node-1", + "allocatable": { + "cpu": 8, + "memory": 16384 + } + } +} +``` + +**Response:** +```json +{ + "code": 0, + "errorMessage": "" +} +``` + +##### Example Prioritize Request/Response + +**Request:** +```json +{ + "task": { + "namespace": "default", + "name": "task-1", + "resources": { + "cpu": 2, + "memory": 4096 + } + }, + "nodes": [ + { + "name": "node-1", + "allocatable": { + "cpu": 8, + "memory": 16384 + } + }, + { + "name": "node-2", + "allocatable": { + "cpu": 8, + "memory": 16384 + } + } + ] +} +``` + +**Response:** +```json +{ + "nodeScore": { + "node-1": 80.5, + "node-2": 75.2 + }, + "errorMessage": "" +} +``` + +#### Scenario + +The Extender plugin is suitable for: + +- **Custom Scheduling Logic**: Implementing domain-specific scheduling requirements +- **Third-party Integration**: Integrating with external resource management systems +- **Advanced Filtering**: Complex node filtering based on external data sources +- **Custom Scoring**: Custom node scoring algorithms not available in standard plugins +- **Resource-specific Logic**: Handling special resources with custom allocation logic + +#### Examples + +##### Example 1: GPU Extender + +Configure an extender for GPU-specific scheduling: + +```yaml +- name: extender + arguments: + extender.urlPrefix: http://gpu-scheduler:8080 + extender.httpTimeout: 1s + extender.predicateVerb: predicate + extender.prioritizeVerb: prioritize + extender.managedResources: + - nvidia.com/gpu + - nvidia.com/gpumem + extender.ignorable: false +``` + +##### Example 2: Custom Node Filtering + +Configure an extender for custom node filtering: + +```yaml +- name: extender + arguments: + extender.urlPrefix: http://custom-filter:8080 + extender.httpTimeout: 500ms + extender.predicateVerb: customFilter + extender.ignorable: true +``` + +##### Example 3: Full Lifecycle Hooks + +Configure an extender with all lifecycle hooks: + +```yaml +- name: extender + arguments: + extender.urlPrefix: http://full-extender:8080 + extender.httpTimeout: 2s + extender.onSessionOpenVerb: onSessionOpen + extender.onSessionCloseVerb: onSessionClose + extender.predicateVerb: predicate + extender.prioritizeVerb: prioritize + extender.preemptableVerb: preemptable + extender.reclaimableVerb: reclaimable + extender.allocateFuncVerb: allocateFunc + extender.deallocateFuncVerb: deallocateFunc + extender.ignorable: true +``` + +#### Notes + +- The extender service must be accessible from the Volcano scheduler +- HTTP requests use POST method with JSON content type 
+- Maximum response body size is 10MB +- The extender should return HTTP 200 status code for successful operations +- Error responses should include appropriate error messages in the response body diff --git a/content/en/docs/plugins/nodegroup.md b/content/en/docs/plugins/nodegroup.md new file mode 100644 index 00000000..b3eba1d4 --- /dev/null +++ b/content/en/docs/plugins/nodegroup.md @@ -0,0 +1,240 @@ ++++ +title = "Node Group Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Node Group" +[menu.plugins] + weight = 5 ++++ + +### Node Group + +#### Overview + +The Node Group plugin provides queue-level node group affinity and anti-affinity capabilities. It allows queues to specify which node groups their jobs should run on, enabling better resource isolation and workload distribution. The plugin supports both required and preferred node group affinity/anti-affinity rules, and can inherit affinity rules from parent queues in hierarchical queue structures. + +#### Features + +- **Queue-level Affinity**: Define node group affinity rules at the queue level +- **Required and Preferred Rules**: Support for both required (hard) and preferred (soft) affinity constraints +- **Anti-affinity Support**: Support for both affinity and anti-affinity rules +- **Hierarchical Inheritance**: Inherit affinity rules from parent queues when hierarchical queues are enabled +- **Node Group Labeling**: Uses node labels to identify node groups +- **Strict Mode**: Configurable strict mode for affinity enforcement + +#### Configuration + +The Node Group plugin can be configured with the following arguments: + +```yaml +actions: "reclaim, allocate, backfill, preempt" +tiers: +- plugins: + - name: nodegroup + arguments: + strict: true +``` + +##### Configuration Parameters + +- **strict** (bool): Enable strict mode. In strict mode, nodes without node group labels are rejected if the queue has affinity rules, and nodes with node group labels are rejected if the queue has no affinity rules. Default is `true`. + +#### Node Group Labeling + +Nodes must be labeled with the node group name using the `volcano.sh/nodegroup-name` label: + +```yaml +apiVersion: v1 +kind: Node +metadata: + name: node-1 + labels: + volcano.sh/nodegroup-name: "group-a" +spec: + # node spec +``` + +#### Queue Configuration + +Queues can specify node group affinity rules in their spec: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: queue-example +spec: + weight: 1 + affinity: + nodeGroupAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "group-a" + - "group-b" + preferredDuringSchedulingIgnoredDuringExecution: + - "group-c" + nodeGroupAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "group-d" + preferredDuringSchedulingIgnoredDuringExecution: + - "group-e" +``` + +##### Queue Affinity Fields + +- **nodeGroupAffinity.requiredDuringSchedulingIgnoredDuringExecution**: Required node groups. Tasks must be scheduled on nodes in one of these groups. +- **nodeGroupAffinity.preferredDuringSchedulingIgnoredDuringExecution**: Preferred node groups. Tasks prefer to be scheduled on nodes in these groups. +- **nodeGroupAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution**: Required anti-affinity groups. Tasks must not be scheduled on nodes in these groups. 
+- **nodeGroupAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution**: Preferred anti-affinity groups. Tasks prefer not to be scheduled on nodes in these groups. + +#### Hierarchical Queue Support + +When hierarchical queues are enabled, queues without explicit affinity rules inherit affinity rules from their nearest ancestor queue that has affinity rules defined: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: parent-queue +spec: + weight: 1 + affinity: + nodeGroupAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "group-a" +--- +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: child-queue +spec: + parent: parent-queue + weight: 1 + # Child queue inherits affinity rules from parent-queue +``` + +#### Scoring + +The plugin provides node scoring based on affinity rules: + +- **Required Affinity**: +100 points +- **Preferred Affinity**: +50 points +- **Preferred Anti-affinity**: -1 points + +#### Scenario + +The Node Group plugin is suitable for: + +- **Resource Isolation**: Isolating workloads to specific node groups for security or compliance reasons +- **Workload Distribution**: Distributing workloads across different node groups +- **Hardware-specific Scheduling**: Scheduling workloads on nodes with specific hardware characteristics +- **Multi-tenant Isolation**: Ensuring tenant workloads run on designated node groups +- **Geographic Distribution**: Scheduling workloads based on geographic location of node groups + +#### Examples + +##### Example 1: Required Affinity + +Configure a queue to require nodes from specific groups: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: gpu-queue +spec: + weight: 1 + affinity: + nodeGroupAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "gpu-group-1" + - "gpu-group-2" +``` + +##### Example 2: Preferred Affinity + +Configure a queue to prefer nodes from specific groups: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: cpu-queue +spec: + weight: 1 + affinity: + nodeGroupAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - "cpu-group-1" + - "cpu-group-2" +``` + +##### Example 3: Anti-affinity + +Configure a queue to avoid certain node groups: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: production-queue +spec: + weight: 1 + affinity: + nodeGroupAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "development-group" + preferredDuringSchedulingIgnoredDuringExecution: + - "test-group" +``` + +##### Example 4: Combined Affinity and Anti-affinity + +Configure a queue with both affinity and anti-affinity rules: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: mixed-queue +spec: + weight: 1 + affinity: + nodeGroupAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "group-a" + preferredDuringSchedulingIgnoredDuringExecution: + - "group-b" + nodeGroupAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "group-c" + preferredDuringSchedulingIgnoredDuringExecution: + - "group-d" +``` + +##### Example 5: Non-strict Mode + +Configure the plugin in non-strict mode: + +```yaml +- name: nodegroup + arguments: + strict: false +``` + +In non-strict mode, nodes without node group labels are allowed if the queue has no affinity rules. 
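+
+As a worked illustration of the scoring rules above: for the `cpu-queue` in Example 2, a node labeled `cpu-group-1` or `cpu-group-2` receives the +50 preferred-affinity bonus, while for the `gpu-queue` in Example 1 a node in `gpu-group-1` or `gpu-group-2` satisfies the required rule and scores +100. In strict mode, nodes without a `volcano.sh/nodegroup-name` label are rejected for both queues because each defines affinity rules.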
+ +#### Notes + +- Nodes must be labeled with `volcano.sh/nodegroup-name` to participate in node group scheduling +- Required affinity rules are hard constraints and must be satisfied for scheduling +- Preferred affinity rules are soft constraints and affect scoring +- Anti-affinity rules take precedence over affinity rules +- In hierarchical queues, child queues inherit affinity rules from their nearest ancestor with affinity rules +- When hierarchical queues are enabled, set `enableHierarchy: true` in the plugin configuration diff --git a/content/en/docs/plugins/resource-strategy-fit.md b/content/en/docs/plugins/resource-strategy-fit.md new file mode 100644 index 00000000..bc2f51e4 --- /dev/null +++ b/content/en/docs/plugins/resource-strategy-fit.md @@ -0,0 +1,174 @@ ++++ +title = "Resource Strategy Fit Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Resource Strategy Fit" +[menu.plugins] + weight = 1 ++++ + +### Resource Strategy Fit + +#### Overview + +The Resource Strategy Fit plugin provides flexible resource allocation strategies for scheduling tasks onto nodes. It supports multiple scoring strategies including MostAllocated and LeastAllocated for different resource types, enabling administrators to configure custom resource allocation policies. The plugin also supports additional features like SRA (Smart Resource Allocation) and Proportional resource allocation. + +#### Features + +- **Flexible Resource Scoring**: Supports `MostAllocated` and `LeastAllocated` scoring strategies for different resource types +- **Customizable Weights**: Configure weights for each resource type to control their impact on scoring +- **Pod-level Scoring**: Supports pod-level scoring strategy configuration through annotations +- **Wildcard Pattern Matching**: Supports wildcard patterns for resource matching (e.g., `nvidia.com/gpu/*`) +- **SRA Support**: Optional Smart Resource Allocation (SRA) for enhanced resource allocation +- **Proportional Allocation**: Optional proportional resource allocation policy + +#### Configuration + +The Resource Strategy Fit plugin can be configured with the following arguments: + +```yaml +actions: "enqueue, allocate, backfill, reclaim, preempt" +tiers: +- plugins: + - name: resource-strategy-fit + arguments: + resourceStrategyFitWeight: 10 + resources: + nvidia.com/gpu: + type: MostAllocated + weight: 2 + cpu: + type: LeastAllocated + weight: 1 + memory: + type: LeastAllocated + weight: 1 + sra: + enable: true + resources: nvidia.com/gpu + weight: 10 + resourceWeight: + nvidia.com/gpu: 1 + proportional: + enable: false + resources: nvidia.com/gpu + resourceProportion: + nvidia.com/gpu.cpu: 4 + nvidia.com/gpu.memory: 8 +``` + +##### Configuration Parameters + +- **resourceStrategyFitWeight** (int): Global weight for the resource strategy fit plugin. Default is 10. 
+- **resources** (map): Resource-specific configuration with the following fields: + - **type**: Scoring strategy type (`MostAllocated` or `LeastAllocated`) + - **weight**: Weight for this resource in scoring calculation +- **sra** (optional): SRA configuration: + - **enable**: Enable/disable SRA + - **resources**: Comma-separated list of resources for SRA + - **weight**: Weight for SRA scoring + - **resourceWeight**: Per-resource weights for SRA +- **proportional** (optional): Proportional allocation configuration: + - **enable**: Enable/disable proportional allocation + - **resources**: Comma-separated list of resources + - **resourceProportion**: Proportional ratios for resource combinations + +##### Scoring Strategies + +- **MostAllocated**: Prefers nodes with higher resource utilization. Useful for binpacking scenarios where you want to fill nodes before using new ones. +- **LeastAllocated**: Prefers nodes with lower resource utilization. Useful for spreading workloads across nodes to improve availability. + +##### Pod-level Configuration + +Pods can specify their own scoring strategy using annotations: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + annotations: + volcano.sh/resource-strategy-scoring-type: MostAllocated + volcano.sh/resource-strategy-weight: '{"nvidia.com/gpu": 2, "cpu": 1}' +spec: + containers: + - name: container + resources: + requests: + nvidia.com/gpu: 1 + cpu: "2" +``` + +#### Scenario + +The Resource Strategy Fit plugin is suitable for: + +- **Mixed Workloads**: Clusters with diverse workload types requiring different resource allocation strategies +- **GPU Clusters**: GPU-intensive workloads where GPUs should be allocated using MostAllocated strategy +- **High Availability**: Workloads requiring distribution across nodes using LeastAllocated strategy +- **Custom Allocation Policies**: Organizations with specific resource allocation requirements + +#### Examples + +##### Example 1: GPU Binpacking + +Configure the plugin to use MostAllocated for GPUs to pack GPU workloads on fewer nodes: + +```yaml +- name: resource-strategy-fit + arguments: + resourceStrategyFitWeight: 10 + resources: + nvidia.com/gpu: + type: MostAllocated + weight: 5 + cpu: + type: LeastAllocated + weight: 1 + memory: + type: LeastAllocated + weight: 1 +``` + +##### Example 2: Workload Distribution + +Configure the plugin to distribute workloads evenly across nodes: + +```yaml +- name: resource-strategy-fit + arguments: + resourceStrategyFitWeight: 10 + resources: + cpu: + type: LeastAllocated + weight: 3 + memory: + type: LeastAllocated + weight: 2 +``` + +##### Example 3: With SRA + +Enable SRA for enhanced GPU allocation: + +```yaml +- name: resource-strategy-fit + arguments: + resourceStrategyFitWeight: 10 + resources: + nvidia.com/gpu: + type: MostAllocated + weight: 2 + sra: + enable: true + resources: nvidia.com/gpu + weight: 10 + resourceWeight: + nvidia.com/gpu: 1 +``` diff --git a/content/en/docs/plugins/resourcequota.md b/content/en/docs/plugins/resourcequota.md new file mode 100644 index 00000000..07706a78 --- /dev/null +++ b/content/en/docs/plugins/resourcequota.md @@ -0,0 +1,212 @@ ++++ +title = "Resource Quota Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. 
+linktitle = "Resource Quota" +[menu.plugins] + weight = 7 ++++ + +### Resource Quota + +#### Overview + +The Resource Quota plugin enforces Kubernetes ResourceQuota constraints during job enqueue. It ensures that jobs can only be enqueued if their minimum resource requirements do not exceed the available quota in their namespace. The plugin integrates with Kubernetes ResourceQuota objects to provide namespace-level resource limits and isolation. + +#### Features + +- **ResourceQuota Enforcement**: Enforces Kubernetes ResourceQuota constraints during job enqueue +- **Namespace-level Isolation**: Provides resource isolation at the namespace level +- **Pending Resource Tracking**: Tracks pending resources to prevent over-allocation +- **Event Recording**: Records PodGroup events when quota limits are exceeded +- **MinResources Validation**: Validates jobs against their minimum resource requirements + +#### Configuration + +The Resource Quota plugin requires no special configuration. It automatically works with existing Kubernetes ResourceQuota objects: + +```yaml +actions: "enqueue, allocate, backfill" +tiers: +- plugins: + - name: resourcequota +``` + +#### How It Works + +1. **Job Enqueue**: When a job is enqueued, the plugin checks if the job's minimum resource requirements fit within the namespace's ResourceQuota. +2. **Quota Validation**: For each ResourceQuota in the namespace, the plugin: + - Checks if the job's minimum resources plus already used resources plus pending resources exceed the quota hard limits + - If quota is exceeded, the job is rejected from enqueue +3. **Pending Resource Tracking**: The plugin tracks pending resources (jobs that have been accepted for enqueue but not yet allocated) to prevent over-allocation. +4. **Event Recording**: When a job is rejected due to quota limits, the plugin records a PodGroup event with details about the insufficient resources. 
+ +#### ResourceQuota Configuration + +ResourceQuota objects must be created in the target namespace: + +```yaml +apiVersion: v1 +kind: ResourceQuota +metadata: + name: compute-quota + namespace: default +spec: + hard: + requests.cpu: "100" + requests.memory: 200Gi + requests.nvidia.com/gpu: "8" + limits.cpu: "200" + limits.memory: 400Gi + limits.nvidia.com/gpu: "16" + pods: "50" +``` + +#### Job Configuration + +Jobs must specify minimum resources for the quota check to work: + +```yaml +apiVersion: batch.volcano.sh/v1alpha1 +kind: Job +metadata: + name: example-job + namespace: default +spec: + minAvailable: 3 + schedulerName: volcano + queue: default + minResources: + requests: + cpu: "6" + memory: 12Gi + nvidia.com/gpu: "1" + tasks: + - replicas: 3 + name: "task" + template: + spec: + containers: + - name: container + resources: + requests: + cpu: "2" + memory: 4Gi + nvidia.com/gpu: "1" + limits: + cpu: "4" + memory: 8Gi + nvidia.com/gpu: "1" +``` + +#### Scenario + +The Resource Quota plugin is suitable for: + +- **Multi-tenant Clusters**: Enforcing resource limits per namespace/tenant +- **Resource Isolation**: Preventing one namespace from consuming all cluster resources +- **Cost Control**: Limiting resource consumption to control costs +- **Capacity Planning**: Ensuring resource allocation stays within planned capacity +- **Fair Resource Sharing**: Ensuring fair distribution of resources across namespaces + +#### Examples + +##### Example 1: Basic ResourceQuota + +Create a ResourceQuota to limit CPU and memory: + +```yaml +apiVersion: v1 +kind: ResourceQuota +metadata: + name: team-a-quota + namespace: team-a +spec: + hard: + requests.cpu: "50" + requests.memory: 100Gi + limits.cpu: "100" + limits.memory: 200Gi + pods: "20" +``` + +##### Example 2: GPU ResourceQuota + +Create a ResourceQuota with GPU limits: + +```yaml +apiVersion: v1 +kind: ResourceQuota +metadata: + name: gpu-quota + namespace: ml-team +spec: + hard: + requests.cpu: "100" + requests.memory: 200Gi + requests.nvidia.com/gpu: "16" + limits.cpu: "200" + limits.memory: 400Gi + limits.nvidia.com/gpu: "32" +``` + +##### Example 3: Multiple ResourceQuotas + +A namespace can have multiple ResourceQuotas: + +```yaml +# CPU and memory quota +apiVersion: v1 +kind: ResourceQuota +metadata: + name: compute-quota + namespace: default +spec: + hard: + requests.cpu: "100" + requests.memory: 200Gi + +--- +# GPU quota +apiVersion: v1 +kind: ResourceQuota +metadata: + name: gpu-quota + namespace: default +spec: + hard: + requests.nvidia.com/gpu: "8" +``` + +##### Example 4: Pod Limits + +Create a ResourceQuota that limits the number of pods: + +```yaml +apiVersion: v1 +kind: ResourceQuota +metadata: + name: pod-limit-quota + namespace: default +spec: + hard: + pods: "100" +``` + +#### Notes + +- ResourceQuota objects must exist in the namespace before jobs are enqueued +- Jobs must specify `minResources` for the quota check to work +- The plugin checks quota during job enqueue, not during task allocation +- Pending resources are tracked to prevent over-allocation +- If a namespace has no ResourceQuota, jobs can be enqueued without quota checks +- The plugin supports all resource types supported by Kubernetes ResourceQuota +- ResourceQuota scope constraints are not currently supported +- The plugin integrates with Volcano's job enqueue mechanism to provide early quota validation diff --git a/content/en/docs/plugins/usage.md b/content/en/docs/plugins/usage.md new file mode 100644 index 00000000..0877bf13 --- /dev/null +++ 
b/content/en/docs/plugins/usage.md @@ -0,0 +1,177 @@ ++++ +title = "Usage Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Usage" +[menu.plugins] + weight = 6 ++++ + +### Usage + +#### Overview + +The Usage plugin provides CPU and memory usage-based scheduling. It filters nodes based on resource usage thresholds and scores nodes based on their current resource utilization. The plugin helps prevent scheduling on overloaded nodes and prefers nodes with lower resource usage, improving overall cluster utilization and workload performance. + +#### Features + +- **Usage-based Filtering**: Filter nodes based on CPU and memory usage thresholds +- **Usage-based Scoring**: Score nodes based on current resource utilization +- **Configurable Thresholds**: Set custom thresholds for CPU and memory usage +- **Weighted Scoring**: Configurable weights for usage, CPU, and memory in scoring +- **Predicate Control**: Optional enable/disable predicate filtering +- **Metrics Integration**: Uses node resource usage metrics for decision making + +#### Configuration + +The Usage plugin can be configured with the following arguments: + +```yaml +actions: "enqueue, allocate, backfill" +tiers: +- plugins: + - name: usage + enablePredicate: true + arguments: + usage.weight: 5 + cpu.weight: 1 + memory.weight: 1 + thresholds: + cpu: 80 + mem: 80 +``` + +##### Configuration Parameters + +- **enablePredicate** (bool): Enable/disable predicate filtering. When set to `false`, new pod scheduling is not disabled when the node load reaches the threshold. Default is `true`. +- **usage.weight** (int): Global weight for the usage plugin scoring. Default is `5`. +- **cpu.weight** (int): Weight for CPU usage in scoring calculation. Default is `1`. +- **memory.weight** (int): Weight for memory usage in scoring calculation. Default is `1`. +- **thresholds.cpu** (float): CPU usage threshold percentage. Nodes exceeding this threshold will be filtered out (if predicate is enabled). Default is `80`. +- **thresholds.mem** (float): Memory usage threshold percentage. Nodes exceeding this threshold will be filtered out (if predicate is enabled). Default is `80`. + +#### How It Works + +1. **Metrics Collection**: The plugin uses node resource usage metrics provided by the metrics collector. +2. **Predicate Phase**: If enabled, nodes with CPU or memory usage exceeding the configured thresholds are filtered out. +3. **Scoring Phase**: Nodes are scored based on their current resource utilization. Lower usage results in higher scores. +4. **Scoring Formula**: The score is calculated as: + - CPU score: `(100 - cpuUsage) / 100 * cpuWeight` + - Memory score: `(100 - memoryUsage) / 100 * memoryWeight` + - Combined score: `(cpuScore + memoryScore) / (cpuWeight + memoryWeight) * usageWeight * MaxNodeScore` + +#### Metrics Requirements + +The Usage plugin requires node resource usage metrics to be available. Metrics must be updated within the last 5 minutes to be considered valid. 
If metrics are not available or are stale, the plugin will: + +- **Predicate**: Allow scheduling (pass the filter) +- **Scoring**: Return a score of 0 + +#### Scenario + +The Usage plugin is suitable for: + +- **Load Balancing**: Distributing workloads across nodes to balance resource utilization +- **Overload Prevention**: Preventing scheduling on overloaded nodes +- **Performance Optimization**: Preferring nodes with lower resource usage for better performance +- **Cost Optimization**: Improving resource utilization across the cluster +- **Workload Distribution**: Ensuring even distribution of workloads based on resource consumption + +#### Examples + +##### Example 1: Basic Usage Configuration + +Configure the plugin with default thresholds: + +```yaml +- name: usage + enablePredicate: true + arguments: + usage.weight: 5 + cpu.weight: 1 + memory.weight: 1 + thresholds: + cpu: 80 + mem: 80 +``` + +##### Example 2: Conservative Thresholds + +Configure stricter thresholds to prevent overload: + +```yaml +- name: usage + enablePredicate: true + arguments: + usage.weight: 10 + cpu.weight: 2 + memory.weight: 2 + thresholds: + cpu: 70 + mem: 70 +``` + +##### Example 3: Scoring Only (No Predicate) + +Disable predicate filtering and use only scoring: + +```yaml +- name: usage + enablePredicate: false + arguments: + usage.weight: 5 + cpu.weight: 1 + memory.weight: 1 + thresholds: + cpu: 80 + mem: 80 +``` + +##### Example 4: CPU-focused Configuration + +Prioritize CPU usage over memory: + +```yaml +- name: usage + enablePredicate: true + arguments: + usage.weight: 10 + cpu.weight: 3 + memory.weight: 1 + thresholds: + cpu: 75 + mem: 85 +``` + +##### Example 5: Memory-focused Configuration + +Prioritize memory usage over CPU: + +```yaml +- name: usage + enablePredicate: true + arguments: + usage.weight: 10 + cpu.weight: 1 + memory.weight: 3 + thresholds: + cpu: 85 + mem: 75 +``` + +#### Notes + +- The plugin requires node resource usage metrics to be available +- Metrics must be updated within the last 5 minutes to be considered valid +- Threshold values are percentages (0-100) +- Weights determine the relative importance of different resources in scoring +- When predicate is disabled, the plugin only affects node scoring, not filtering +- The plugin uses average usage metrics over a configured period +- If metrics are not available, the plugin allows scheduling to proceed
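+
+As a rough worked example of the scoring formula above: with `cpu.weight: 1` and `memory.weight: 1`, a node at 60% CPU and 40% memory usage gets a combined ratio of ((100 - 60)/100 + (100 - 40)/100) / 2 = 0.5, while a node at 20% CPU and 20% memory usage gets 0.8; each ratio is then multiplied by `usage.weight` and `MaxNodeScore`, so the less loaded node ends up with the higher score.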