
ART-14334: Add scan-operator job#2448

Open
adobes1 wants to merge 8 commits into openshift-eng:main from adobes1:add-scan-operator

Conversation


@adobes1 adobes1 commented Feb 5, 2026

Add a periodic job to scan for missing operator bundle/FBC builds and trigger them if missing.

Scheduler (pyartcd/pipelines/scheduled/schedule_scan_operator.py):

  • Runs every 30 minutes via Jenkins scheduling
  • Checks locks and build permissions before scheduling
  • Skips scan if SCAN_OPERATOR or BUILD_KONFLUX locks are held
  • Triggers scan-operator job for all applicable versions
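The lock-gating above can be sketched roughly as follows. This is an illustrative stand-in, not the actual pyartcd API: `FakeLockManager`, the lock-name format, and `run_for`'s signature are all assumptions.

```python
import asyncio


class FakeLockManager:
    """Illustrative stand-in for pyartcd's LockManager (the real one talks to Redis)."""

    def __init__(self, held):
        self.held = set(held)

    async def is_locked(self, name: str) -> bool:
        return name in self.held


async def run_for(version: str, lock_manager, trigger) -> bool:
    """Skip the scan when a SCAN_OPERATOR or BUILD_KONFLUX lock is held; else trigger it.

    `trigger` stands in for jenkins.start_scan_operator. Returns True if scheduled.
    """
    for lock in (f'scan-operator-{version}', f'build-konflux-{version}'):
        if await lock_manager.is_locked(lock):
            return False
    trigger(version=version)
    return True


triggered = []
mgr = FakeLockManager(held={'build-konflux-4.20'})
ok_421 = asyncio.run(run_for('4.21', mgr, lambda **kw: triggered.append(kw['version'])))
ok_420 = asyncio.run(run_for('4.20', mgr, lambda **kw: triggered.append(kw['version'])))
# 4.21 is scheduled; 4.20 is skipped because its build lock is held
```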

New Pipeline (pyartcd/pipelines/scan_operator.py):

  • Scans all operators
  • Checks for missing bundle builds for successful operator builds
  • Checks for missing FBC builds for operator builds which have successful bundle builds
  • Triggers bundle or FBC builds where necessary
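The per-operator decision can be sketched as below; the record shape and NVRs are invented for illustration and do not match the pipeline's actual data model.

```python
def classify(operator: dict):
    """Which follow-up build, if any, does this operator need?"""
    if not operator.get('bundle_nvr'):
        return 'bundle'  # successful operator build but no bundle build yet
    if not operator.get('fbc_nvr'):
        return 'fbc'     # bundle exists but no FBC build yet
    return None          # nothing to do


ops = [
    {'nvr': 'foo-operator-v4.21-1', 'bundle_nvr': None, 'fbc_nvr': None},
    {'nvr': 'bar-operator-v4.21-2', 'bundle_nvr': 'bar-bundle-v4.21-2', 'fbc_nvr': None},
    {'nvr': 'baz-operator-v4.21-3', 'bundle_nvr': 'baz-bundle-v4.21-3', 'fbc_nvr': 'baz-fbc-v4.21-3'},
]
needs = {op['nvr']: classify(op) for op in ops}
```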

Doozer Enhancement (doozer/cli/olm_bundle.py):

  • Added --output-format flag to olm-bundle:list-olm-operators command
  • Supports component (default, backward compatible) and distgit-key formats
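The real command is a click CLI; as a minimal self-contained sketch of the flag's contract (argparse used here for brevity, operator data invented):

```python
import argparse


def list_olm_operators(argv):
    """Return operator identifiers in the requested format (default: component)."""
    parser = argparse.ArgumentParser(prog='list-olm-operators')
    parser.add_argument('--output-format', choices=['component', 'distgit-key'],
                        default='component')
    args = parser.parse_args(argv)
    operators = [  # made-up sample data; doozer derives these from group metadata
        {'component': 'foo-operator-container', 'distgit-key': 'foo-operator'},
        {'component': 'bar-operator-container', 'distgit-key': 'bar-operator'},
    ]
    return [op[args.output_format] for op in operators]
```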

Test runs (with --dry-run):

Required for openshift-eng/aos-cd-jobs#4571

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 5, 2026

openshift-ci-robot commented Feb 5, 2026

@adobes1: This pull request references ART-14334 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


In response to this:

Add a periodic job to scan for missing operator bundle/FBC builds and trigger them if missing.

Scheduler (pyartcd/pipelines/scheduled/schedule_scan_operator.py):

  • Runs every 30 minutes via jenkins scheduling
  • Checks locks and build permissions before scheduling
  • Skips scan if SCAN_OPERATOR or BUILD_KONFLUX locks are held
  • Triggers scan-operator job for all applicable versions

New Pipeline (pyartcd/pipelines/scan_operator.py):

  • Scans all operators
  • Checks for missing bundle builds for successful operator builds
  • Checks for missing FBC builds for operator builds which have successful bundle builds
  • Trigger bundle or FBC builds where necessary

Doozer Enhancement (doozer/cli/olm_bundle.py):

  • Added --output-format flag to olm-bundle:list-olm-operators command
  • Supports component (default, backward compatible) and distgit-key formats

Test runs (with --dry-run):

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


openshift-ci bot commented Feb 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vfreex for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai bot commented Feb 5, 2026

Walkthrough

Adds a ScanOperatorPipeline and CLI/scheduled runners to detect missing operator bundles and FBCs and trigger Jenkins builds with locking; adds Jenkins job and lock policy; exposes new CLI entries; and extends the doozer OLM listing CLI with a --output-format option.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Doozer OLM CLI | `doozer/doozerlib/cli/olm_bundle.py` | Added `--output-format` click option (`component` or `distgit-key`, default `component`); updated `list_olm_operators` signature and output branching; updated command docstring/examples. |
| Scan Operator Pipeline | `pyartcd/pyartcd/pipelines/scan_operator.py` | New `ScanOperatorPipeline` class and `scan_operator` CLI: initializes Konflux clients, validates params, loads operator names, fetches latest builds, checks for missing bundles/FBCs, and triggers bulk Jenkins builds (supports dry-run and locking). |
| Scheduled Runner | `pyartcd/pyartcd/pipelines/scheduled/schedule_scan_operator.py` | New scheduled runner: `run_for(version, runtime, lock_manager)` checks locks/build permission and calls `jenkins.start_scan_operator`; `schedule_scan_operator` CLI concurrently schedules across versions and manages the `LockManager` lifecycle. |
| Jenkins Integration | `pyartcd/pyartcd/jenkins.py` | Added `Jobs.SCAN_OPERATOR` enum entry and a `start_scan_operator(version: str, **kwargs)` helper to trigger the scan-operator Jenkins job. |
| Locks & Policies | `pyartcd/pyartcd/locks.py` | Added `Lock.SCAN_OPERATOR` enum member and a `LOCK_POLICY` entry for it (`retry_count=36000`, `retry_delay_min=0.1`, `lock_timeout=DEFAULT_LOCK_TIMEOUT`). |
| CLI Exports | `pyartcd/pyartcd/__main__.py` | Exported new CLI entry points `scan_operator` and `schedule_scan_operator` so the commands are available via the package CLI. |
| Minor cleanup | `pyartcd/pyartcd/pipelines/ocp4_scan_konflux.py` | Removed the unused local variable `rhcos_changes` and its assignments. |
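The new lock policy entry can be sketched as follows; the values come from the summary above, but the name template and `DEFAULT_LOCK_TIMEOUT` value are assumptions (the real definitions live in `pyartcd/pyartcd/locks.py`).

```python
from enum import Enum

DEFAULT_LOCK_TIMEOUT = 60 * 60 * 2  # assumed value; the real constant is defined in locks.py


class Lock(Enum):
    SCAN_OPERATOR = 'scan-operator-lock-{version}'  # name template is illustrative


LOCK_POLICY = {
    Lock.SCAN_OPERATOR: {
        'retry_count': 36000,    # values from the walkthrough above:
        'retry_delay_min': 0.1,  # retry every ~6 s, up to ~60 h in total
        'lock_timeout': DEFAULT_LOCK_TIMEOUT,
    },
}

policy = LOCK_POLICY[Lock.SCAN_OPERATOR]
```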

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant CLI as "scan-operator CLI"
    participant LockMgr as "Lock Manager"
    participant Pipeline as "ScanOperatorPipeline"
    participant Doozer as "Doozer"
    participant Konflux as "Konflux DB"
    participant Jenkins as "Jenkins"

    User->>CLI: invoke scan-operator
    CLI->>LockMgr: acquire locks (scan/build)
    LockMgr-->>CLI: locks acquired
    CLI->>Pipeline: create and run

    Pipeline->>Doozer: load_operator_names()
    Doozer-->>Pipeline: operator names

    Pipeline->>Konflux: get_latest_operator_builds()
    Konflux-->>Pipeline: build records

    loop per operator
        Pipeline->>Konflux: check_bundle_exists()
        Konflux-->>Pipeline: bundle present/absent
        alt bundle present
            Pipeline->>Konflux: check_fbc_exists()
            Konflux-->>Pipeline: fbc present/absent
        end
    end

    Pipeline->>Jenkins: trigger_bundle_builds()/trigger_fbc_builds()
    Jenkins-->>Pipeline: builds queued/started

    Pipeline-->>CLI: run complete
    CLI->>LockMgr: release locks
    LockMgr-->>CLI: locks released
    CLI-->>User: report results
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pyartcd/pyartcd/pipelines/scan_operator.py`:
- Around line 163-166: The local variable fbc_record returned from
check_fbc_exists is unused and causes a lint warning; either drop it or rename
it to _fbc_record. Update the call site inside the method that performs "if
bundle_exists and bundle:" where it currently does "fbc_exists, fbc_record =
await self.check_fbc_exists(operator, bundle)" — change it to "fbc_exists,
_fbc_record = await self.check_fbc_exists(operator, bundle)" or keep only the
first element (e.g., "fbc_exists = (await self.check_fbc_exists(operator,
bundle))[0]"), and leave the rest of the logic (appending to
self.operators_without_fbcs when not fbc_exists) unchanged.
- Around line 81-86: Wrap the call to
uses_konflux_imagestream_override(self.version) in a try/except to catch
ValueError so invalid version formats don't crash the run, logging the exception
details via self.logger.error; then change the info log emitted when the check
returns False to clearly distinguish an unsupported older but well-formed
version (e.g., "Version X is not supported; requires >= 4.12") from a malformed
version (which should be logged as an error with the exception). Reference
uses_konflux_imagestream_override and the surrounding logic in the
ScanOperator/scan flow to locate where to add the try/except and adjust the log
message.
🧹 Nitpick comments (2)
pyartcd/pyartcd/pipelines/scheduled/schedule_scan_operator.py (1)

32-35: Avoid blocking the event loop when triggering Jenkins builds.
jenkins.start_scan_operator is synchronous; calling it in an async coroutine can block other scheduled work. Consider offloading to a thread.

Suggested change
-    jenkins.start_scan_operator(version=version, block_until_building=False)
+    await asyncio.to_thread(
+        jenkins.start_scan_operator,
+        version=version,
+        block_until_building=False,
+    )
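For reference, `asyncio.to_thread` runs a synchronous callable in a worker thread so the event loop stays responsive; this toy version of the pattern uses a sleeping stand-in for the Jenkins trigger.

```python
import asyncio
import time


def start_scan_operator(version: str) -> str:
    """Stand-in for the blocking Jenkins trigger call."""
    time.sleep(0.01)  # simulate a slow HTTP round trip
    return f'queued scan-operator for {version}'


async def main() -> str:
    # The event loop can service other coroutines while the worker thread blocks.
    return await asyncio.to_thread(start_scan_operator, '4.21')


result = asyncio.run(main())
```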
pyartcd/pyartcd/pipelines/scan_operator.py (1)

255-284: Run Jenkins triggers off-thread and keep stack traces.
The Jenkins calls are synchronous and can block the event loop; also use logger.exception to preserve stack traces when catching broad exceptions.

Suggested change
-        try:
-            self.logger.info(f'Triggering bundle builds for {len(nvrs)} operators')
-            jenkins.start_olm_bundle_konflux(
-                build_version=self.version,
-                assembly=self.assembly,
-                operator_nvrs=nvrs,
-                group=self.group,
-            )
-        except Exception as e:
-            self.logger.error(f'Failed to trigger bundle builds: {e}')
+        try:
+            self.logger.info(f'Triggering bundle builds for {len(nvrs)} operators')
+            await asyncio.to_thread(
+                jenkins.start_olm_bundle_konflux,
+                build_version=self.version,
+                assembly=self.assembly,
+                operator_nvrs=nvrs,
+                group=self.group,
+            )
+        except Exception:
+            self.logger.exception('Failed to trigger bundle builds')
@@
-        try:
-            self.logger.info(f'Triggering FBC builds for {len(nvrs)} operators')
-            jenkins.start_build_fbc(
-                version=self.version,
-                assembly=self.assembly,
-                operator_nvrs=nvrs,
-                dry_run=False,
-                group=self.group,
-            )
-        except Exception as e:
-            self.logger.error(f'Failed to trigger FBC builds: {e}')
+        try:
+            self.logger.info(f'Triggering FBC builds for {len(nvrs)} operators')
+            await asyncio.to_thread(
+                jenkins.start_build_fbc,
+                version=self.version,
+                assembly=self.assembly,
+                operator_nvrs=nvrs,
+                dry_run=False,
+                group=self.group,
+            )
+        except Exception:
+            self.logger.exception('Failed to trigger FBC builds')
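The distinction matters because `Logger.exception` appends the active traceback to the record, which `logger.error(f'... {e}')` discards:

```python
import io
import logging

logger = logging.getLogger('scan-operator-demo')
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)

try:
    raise ValueError('boom')
except Exception:
    # Must be called from inside an exception handler; logs message + full traceback.
    logger.exception('Failed to trigger bundle builds')

output = stream.getvalue()
```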

@adobes1 adobes1 added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Feb 5, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pyartcd/pyartcd/pipelines/scan_operator.py`:
- Around line 245-283: The current try/except blocks in trigger_bundle_builds
and trigger_fbc_builds swallow all exceptions from
jenkins.start_olm_bundle_konflux and jenkins.start_build_fbc, hiding trigger
failures; change each handler to log the error and then re-raise the exception
(or remove the try/except entirely) so failures propagate and the scan job fails
when builds cannot be triggered, e.g., in trigger_bundle_builds (and
trigger_fbc_builds) log the failure with self.logger.error(...) and then raise
the caught exception.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pyartcd/pyartcd/pipelines/scan_operator.py`:
- Around line 72-121: The run method currently logs exceptions returned from
check_operator but doesn't fail the pipeline; update run to aggregate any
exceptions from results (from asyncio.gather of self.check_operator for
operator_builds) into the existing trigger_errors list (or a new scan_errors
list) and raise at the end if any exist, so failures in check_operator cause the
job to fail; specifically, after the for-loop that inspects results, append any
Exception instances (including their operator id like operator_builds[i].nvr) to
trigger_errors (or merge scan_errors into trigger_errors) and let the final
RuntimeError include both trigger and scan failures before exiting.

Comment on lines +72 to +121
```python
    async def run(self):
        """Main pipeline execution."""
        self.skipped = False
        self.logger.info(f'Scanning version {self.version}, assembly {self.assembly}')

        # Check if it's a valid version
        if not uses_konflux_imagestream_override(self.version):
            self.logger.info(f'Version {self.version} is not a valid version')
            return

        self.check_params()

        operator_names = await self.load_operator_names()
        self.logger.info(f'Found {len(operator_names)} operators in group {self.group}')

        operator_builds = await self.get_latest_operator_builds(operator_names)
        self.logger.info(f'Found {len(operator_builds)} operator builds with latest successful builds')

        if not operator_builds:
            self.logger.info('No operator builds found')
            return

        # Check if each operator has an associated bundle and fbc
        results = await asyncio.gather(*[self.check_operator(op) for op in operator_builds], return_exceptions=True)

        for i, result in enumerate(results):
            if isinstance(result, Exception):
                self.logger.error(f'Failed to process {operator_builds[i].nvr}: {result}')

        # Track failures to trigger bundle/fbc jobs
        trigger_errors = []

        if self.operators_without_bundles:
            try:
                self.trigger_bundle_builds(self.operators_without_bundles)
            except Exception as e:
                trigger_errors.append(e)
                self.logger.error(f'Failed to trigger bundle builds: {e}')

        if self.operators_without_fbcs:
            try:
                self.trigger_fbc_builds(self.operators_without_fbcs)
            except Exception as e:
                trigger_errors.append(e)
                self.logger.error(f'Failed to trigger FBC builds: {e}')

        # Raise at the end to mark job as failed
        if trigger_errors:
            raise RuntimeError(f"Failed to trigger bundle and/or FBC builds: {', '.join(map(str, trigger_errors))}")
```


⚠️ Potential issue | 🟠 Major

Fail the run when operator checks error out. Exceptions from check_operator are logged but do not currently fail the job, which can silently miss required triggers. Consider aggregating these errors and failing at the end alongside trigger failures.

💡 Proposed fix (aggregate scan errors)
-        results = await asyncio.gather(*[self.check_operator(op) for op in operator_builds], return_exceptions=True)
-
-        for i, result in enumerate(results):
-            if isinstance(result, Exception):
-                self.logger.error(f'Failed to process {operator_builds[i].nvr}: {result}')
-
-        # Track failures to trigger bundle/fbc jobs
-        trigger_errors = []
+        # Track failures from operator checks and triggers
+        trigger_errors = []
+        results = await asyncio.gather(*[self.check_operator(op) for op in operator_builds], return_exceptions=True)
+
+        for i, result in enumerate(results):
+            if isinstance(result, Exception):
+                trigger_errors.append(result)
+                self.logger.error(f'Failed to process {operator_builds[i].nvr}: {result}')
@@
-        if trigger_errors:
-            raise RuntimeError(f"Failed to trigger bundle and/or FBC builds: {', '.join(map(str, trigger_errors))}")
+        if trigger_errors:
+            raise RuntimeError(f"Failed to process operators and/or trigger builds: {', '.join(map(str, trigger_errors))}")
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 107-107: Do not catch blind exception: Exception

(BLE001)


[warning] 109-109: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


[warning] 114-114: Do not catch blind exception: Exception

(BLE001)


[warning] 116-116: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


[warning] 120-120: Avoid specifying long messages outside the exception class

(TRY003)



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pyartcd/pyartcd/pipelines/scan_operator.py`:
- Around line 311-312: The file fails CI due to formatting; reformat the
scan_operator function and the surrounding file with `ruff format` (or your
configured formatter) to satisfy linting. Specifically, run `ruff format` on
the module containing `async def scan_operator(runtime: Runtime, version: str,
assembly: str, data_path: str, data_gitref: str)` so the formatter rewrites
that signature and the surrounding whitespace to match project style, then
stage and commit the resulting changes.



openshift-ci bot commented Feb 5, 2026

@adobes1: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/security | 3518abe | link | false | `/test security` |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
