feat(validator): align validation Job scheduling with recipe/bundle configurability #443

@atif1996

Description

Problem

AICR recipes and bundles expose a rich set of scheduling options (node selectors, tolerations, affinities, Helm values) that let users control where and how workloads run. However, validation Jobs do not honor the same configuration inputs. Specifically, --node-selector is parsed by the CLI but forwarded only to the snapshot agent, never to validator Jobs.

This creates friction for users operating in constrained or non-standard environments (private networking, specific node placement, etc.), where significant effort goes into working around validation assumptions due to missing configurability.

Current state

| Scheduling field | Recipe/bundles | Validator Jobs |
| --- | --- | --- |
| nodeSelector | Supported via nodeScheduling paths | Not supported |
| tolerations | Supported via nodeScheduling paths | Supported (CLI --toleration) |
| affinity | Supported via overlay overrides | Hardcoded: prefer CPU nodes |

The --node-selector flag exists on aicr validate but is categorized under "Agent Deployment" and only flows to the snapshot agent — not to validator Jobs.

Design principles

  • Validators should accept all scheduling flags that AICR supports
  • Even if a given validator does not actively use a flag, it must neither reject the flag nor silently ignore it
  • Validators remain lightweight and phase-specific — no tight coupling to global config
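
One way to read the second principle is that each validator declares which scheduling fields it consumes, and anything the user set but the validator does not consume is surfaced (for example as a warning) instead of being dropped. The sketch below is only an illustration of that idea: `Scheduling`, `unused_fields`, and the field names are hypothetical and are not taken from the AICR codebase.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Scheduling:
    """Hypothetical container for scheduling inputs passed to every validator."""
    node_selector: Dict[str, str] = field(default_factory=dict)
    tolerations: List[dict] = field(default_factory=list)


def unused_fields(scheduling: Scheduling, consumed: FrozenSet[str]) -> List[str]:
    """Return the names of fields the user set that this validator does not consume.

    The caller can log these names so a flag is never ignored silently,
    while the validator still accepts the full set of flags.
    """
    provided = {name for name, value in vars(scheduling).items() if value}
    return sorted(provided - consumed)
```

A validator that only understands tolerations would call `unused_fields(scheduling, frozenset({"tolerations"}))` and report any returned names, which keeps the validator decoupled from global config while still accepting every flag.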

Acceptance criteria

  • --node-selector key=value constrains validator Jobs to matching nodes
  • --toleration continues to work (no regression)
  • Default behavior (no node selector) is unchanged
  • CLI help shows scheduling flags clearly
  • Tests cover nodeSelector application and empty/nil behavior
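
As a rough illustration of the first, third, and last criteria, the sketch below parses repeated `key=value` flag values and sets `spec.template.spec.nodeSelector` on a Job manifest represented as a plain dict. The helper names and the `pool=validation` label are hypothetical, and the actual AICR implementation may be structured quite differently; the only part taken from Kubernetes itself is the location of `nodeSelector` in the Job's pod template.

```python
import copy
from typing import Dict, Iterable, Optional


def parse_node_selectors(values: Optional[Iterable[str]]) -> Dict[str, str]:
    """Parse repeated key=value flag values into a label map (hypothetical helper)."""
    selectors: Dict[str, str] = {}
    for raw in values or []:
        key, sep, value = raw.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"invalid node selector {raw!r}, expected key=value")
        selectors[key] = value.strip()
    return selectors


def apply_node_selector(job: dict, selectors: Optional[Dict[str, str]]) -> dict:
    """Return a copy of a Job manifest with nodeSelector set on its pod template.

    An empty or None selector map returns an unchanged copy, so the default
    behavior (no node selector) stays the same.
    """
    result = copy.deepcopy(job)
    if not selectors:
        return result
    pod_spec = (
        result.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    pod_spec["nodeSelector"] = dict(selectors)
    return result
```

Returning a copy instead of mutating the input makes the empty/nil case easy to test: the output can be compared directly against the original manifest.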
