Skip to content

Conversation

@shiverse94
Copy link

  • Please check if the PR fulfills these requirements
  • The commit message follows our guidelines
  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    /kind documentation

  • What this PR does / why we need it:
    Adds documentation for 7 missing scheduler plugins that exist in the volcano-sh/volcano repository but were not documented on the website. This aligns the website documentation with available plugins.

  • Which issue(s) this PR fixes:
    Docs(Plugins): Synchronize plugin documentation with the main volcano repository #427

Addresses missing plugin documentation to improve user understanding and usage of available Volcano scheduler plugins.

// @JesseStutler

… deviceshare, extender, nodegroup, usage, and resourcequota plugins

Signed-off-by: Shive <shiverse94@gmail.com>
@volcano-sh-bot volcano-sh-bot added the kind/documentation Categorizes issue or PR as related to documentation. label Jan 18, 2026
@volcano-sh-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kevin-wangzefeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist
Copy link

Summary of Changes

Hello @shiverse94, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Volcano scheduler documentation by introducing comprehensive guides for seven previously undocumented plugins. These additions aim to bridge the gap between existing scheduler functionalities and their official explanations, thereby improving user understanding and enabling more effective utilization of Volcano's advanced scheduling capabilities.

Highlights

  • Capacity Plugin Documentation: Added new documentation for the Capacity plugin, which manages queue resource allocation with capacity limits, guarantees, and hierarchical structures.
  • Device Share Plugin Documentation: New documentation introduced for the Device Share plugin, detailing its features for sharing and allocating GPU and NPU resources, including vGPU and Ascend NPU support.
  • Extender Plugin Documentation: Comprehensive documentation for the Extender plugin has been added, explaining its functionality to delegate scheduling decisions to external HTTP services and its various scheduling hooks.
  • Node Group Plugin Documentation: Documentation for the Node Group plugin was added, covering queue-level node group affinity and anti-affinity, hierarchical inheritance, and node labeling.
  • Resource Strategy Fit Plugin Documentation: New documentation for the Resource Strategy Fit plugin, describing its flexible resource allocation strategies, including MostAllocated and LeastAllocated scoring, SRA, and proportional allocation.
  • Resource Quota Plugin Documentation: Documentation for the Resource Quota plugin was added, detailing its enforcement of Kubernetes ResourceQuota constraints during job enqueue and namespace-level isolation.
  • Usage Plugin Documentation: New documentation for the Usage plugin, which provides CPU and memory usage-based scheduling, including filtering, scoring, and configurable thresholds.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@volcano-sh-bot volcano-sh-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jan 18, 2026
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds much-needed documentation for seven scheduler plugins. The new documentation is well-structured and detailed, which will significantly help users understand and configure these plugins. I've provided a few suggestions to improve clarity and consistency in some of the new documents. Overall, this is a great contribution to improving Volcano's usability.


#### Features

- **Queue Capacity Management**: Enforces queue capacity limits based on configured capability

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better clarity and consistency with other parts of the documentation that refer to configuration fields, consider formatting capability as code to indicate it's a specific field name.

Suggested change
- **Queue Capacity Management**: Enforces queue capacity limits based on configured capability
- **Queue Capacity Management**: Enforces queue capacity limits based on configured `capability`

- **extender.allocateFuncVerb** (string): Verb for AllocateFunc event handler
- **extender.deallocateFuncVerb** (string): Verb for DeallocateFunc event handler
- **extender.ignorable** (bool): Whether the extender can ignore unexpected errors
- **extender.managedResources** (list): List of resources managed by the extender (comma-separated or list format)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The description for extender.managedResources says it supports a "comma-separated or list format". The YAML example shows a list. To avoid confusion, could you please clarify if a comma-separated string is also a valid value for this argument in the YAML configuration? If it is, providing an example for that format would be helpful.


##### Configuration Parameters

- **strict** (bool): Enable strict mode. In strict mode, nodes without node group labels are rejected if the queue has affinity rules, and nodes with node group labels are rejected if the queue has no affinity rules. Default is `true`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The explanation for strict mode could be clearer. The phrase "nodes with node group labels are rejected if the queue has no affinity rules" is a bit counter-intuitive. It would be helpful to add more context or a rationale for this behavior, for example, by explaining that this ensures that general-purpose queues do not land on specialized node groups by default.

- Preferred affinity rules are soft constraints and affect scoring
- Anti-affinity rules take precedence over affinity rules
- In hierarchical queues, child queues inherit affinity rules from their nearest ancestor with affinity rules
- When hierarchical queues are enabled, set `enableHierarchy: true` in the plugin configuration

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This note about enableHierarchy: true might be misleading, as this setting is part of the capacity plugin, not the nodegroup plugin. To avoid confusion, I suggest clarifying that hierarchical queue support for node group affinity inheritance depends on the capacity plugin being configured with enableHierarchy: true.

Suggested change
- When hierarchical queues are enabled, set `enableHierarchy: true` in the plugin configuration
- In hierarchical queues, child queues inherit affinity rules. This feature requires the `capacity` plugin to be enabled with `enableHierarchy: true` in the scheduler configuration.

- **weight**: Weight for this resource in scoring calculation
- **sra** (optional): SRA configuration:
- **enable**: Enable/disable SRA
- **resources**: Comma-separated list of resources for SRA

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The description says "Comma-separated list of resources", but the configuration example on line 56 only shows a single resource (resources: nvidia.com/gpu). This could be confusing. To make it clearer for users how to specify multiple resources, please consider updating the example to show a comma-separated list, for example: resources: "nvidia.com/gpu,amd.com/gpu". This also applies to proportional.resources described on line 81.

4. **Scoring Formula**: The score is calculated as:
- CPU score: `(100 - cpuUsage) / 100 * cpuWeight`
- Memory score: `(100 - memoryUsage) / 100 * memoryWeight`
- Combined score: `(cpuScore + memoryScore) / (cpuWeight + memoryWeight) * usageWeight * MaxNodeScore`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The scoring formula uses MaxNodeScore, but this variable isn't defined in the document. It would be helpful to briefly explain what MaxNodeScore represents (e.g., that it's an internal scheduler constant and its value, if relevant).

- Threshold values are percentages (0-100)
- Weights determine the relative importance of different resources in scoring
- When predicate is disabled, the plugin only affects node scoring, not filtering
- The plugin uses average usage metrics over a configured period

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There seems to be a slight inconsistency in the description of the metrics used. The "Overview" and "How It Works" sections refer to "current resource utilization", while this note mentions "average usage metrics over a configured period". To avoid confusion, could you clarify the relationship between these? For example, is the "current" utilization actually a recent average?

Signed-off-by: Shive <shiverse94@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/documentation Categorizes issue or PR as related to documentation. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants