feat(validator): align validation Job scheduling with recipe/bundle configurability #443
Description
Problem
AICR recipes and bundles expose a rich set of scheduling options (node selectors, tolerations, affinities, Helm values) that let users control where and how workloads run. Validation Jobs, however, do not honor the same configuration inputs. Specifically, --node-selector is parsed by the CLI but forwarded only to the snapshot agent, never to validator Jobs.
This creates friction for users operating in constrained or non-standard environments (private networking, specific node placement, etc.), who must spend significant effort working around validation assumptions that they cannot configure.
Current state
| Scheduling field | Recipe/bundles | Validator Jobs |
|---|---|---|
| nodeSelector | Supported via nodeScheduling paths | Not supported |
| tolerations | Supported via nodeScheduling paths | Supported (CLI --toleration) |
| affinity | Supported via overlay overrides | Hardcoded: prefer CPU nodes |
The --node-selector flag exists on aicr validate but is categorized under "Agent Deployment" and only flows to the snapshot agent — not to validator Jobs.
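To make the gap concrete, here is a minimal Go sketch of how repeated key=value flag values could be turned into a label map that both the snapshot agent and validator Jobs consume. The helper name parseNodeSelectors and its error wording are assumptions for illustration only; they are not taken from the AICR codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// parseNodeSelectors is a hypothetical helper (not AICR's actual code).
// It converts repeated "key=value" flag values into a label map.
// A nil or empty input yields a nil map so callers keep default scheduling.
func parseNodeSelectors(pairs []string) (map[string]string, error) {
	if len(pairs) == 0 {
		return nil, nil
	}
	out := make(map[string]string, len(pairs))
	for _, p := range pairs {
		// Split on the first "=" only, so values may themselves contain "=".
		key, value, ok := strings.Cut(p, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid node selector %q: want key=value", p)
		}
		out[key] = strings.TrimSpace(value)
	}
	return out, nil
}

func main() {
	sel, err := parseNodeSelectors([]string{"pool=validation", "disk=ssd"})
	if err != nil {
		panic(err)
	}
	fmt.Println(sel["pool"], sel["disk"])
}
```

Returning nil for an empty input keeps the "no flag given" case distinguishable from a parsed selector, which matters for the unchanged-default requirement.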
Design principles
- Validators should accept all scheduling flags that AICR supports
- Even if a given validator does not actively use a flag, it must not reject or ignore it silently
- Validators remain lightweight and phase-specific — no tight coupling to global config
Acceptance criteria
- --node-selector key=value constrains validator Jobs to matching nodes
- --toleration continues to work (no regression)
- Default behavior (no node selector) is unchanged
- CLI help shows scheduling flags clearly
- Tests cover nodeSelector application and empty/nil behavior
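The nodeSelector application and empty/nil criteria could be met with logic along the following lines. This is only a sketch under stated assumptions: PodSpec here is a minimal local stand-in for the Kubernetes pod spec (which carries a NodeSelector field of type map[string]string), and applyNodeSelector is a hypothetical name, not an existing AICR function.

```go
package main

import "fmt"

// PodSpec is a minimal stand-in for the Kubernetes pod spec.
// Only the field relevant to this sketch is modeled.
type PodSpec struct {
	NodeSelector map[string]string
}

// applyNodeSelector is a hypothetical helper (an assumption, not AICR code).
// It merges the user-supplied selector into the validator Job's pod spec.
// A nil or empty selector leaves the spec untouched, preserving defaults.
func applyNodeSelector(spec *PodSpec, selector map[string]string) {
	if spec == nil || len(selector) == 0 {
		return
	}
	if spec.NodeSelector == nil {
		spec.NodeSelector = make(map[string]string, len(selector))
	}
	// Copy entries so the Job spec does not alias the caller's map.
	for k, v := range selector {
		spec.NodeSelector[k] = v
	}
}

func main() {
	var spec PodSpec

	// No selector given: the spec stays at its default.
	applyNodeSelector(&spec, nil)
	fmt.Println(spec.NodeSelector == nil)

	// Selector given: the Job is constrained to matching nodes.
	applyNodeSelector(&spec, map[string]string{"pool": "validation"})
	fmt.Println(spec.NodeSelector["pool"])
}
```

Treating nil and empty maps identically keeps the default path free of special cases, so a test only needs to assert that the pod spec is unchanged in both.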