ART-14334: Add scan-operator job#2448
Conversation
@adobes1: This pull request references ART-14334 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Walkthrough

Adds a ScanOperatorPipeline and CLI/scheduled runners to detect missing operator bundles and FBCs and trigger Jenkins builds with locking; adds a Jenkins job and lock policy; exposes new CLI entries; and extends the doozer OLM listing CLI with a --output-format option.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant CLI as "scan-operator CLI"
    participant LockMgr as "Lock Manager"
    participant Pipeline as "ScanOperatorPipeline"
    participant Doozer as "Doozer"
    participant Konflux as "Konflux DB"
    participant Jenkins as "Jenkins"
    User->>CLI: invoke scan-operator
    CLI->>LockMgr: acquire locks (scan/build)
    LockMgr-->>CLI: locks acquired
    CLI->>Pipeline: create and run
    Pipeline->>Doozer: load_operator_names()
    Doozer-->>Pipeline: operator names
    Pipeline->>Konflux: get_latest_operator_builds()
    Konflux-->>Pipeline: build records
    loop per operator
        Pipeline->>Konflux: check_bundle_exists()
        Konflux-->>Pipeline: bundle present/absent
        alt bundle missing
            Pipeline->>Konflux: check_fbc_exists()
            Konflux-->>Pipeline: fbc present/absent
        end
    end
    Pipeline->>Jenkins: trigger_bundle_builds()/trigger_fbc_builds()
    Jenkins-->>Pipeline: builds queued/started
    Pipeline-->>CLI: run complete
    CLI->>LockMgr: release locks
    LockMgr-->>CLI: locks released
    CLI-->>User: report results
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@pyartcd/pyartcd/pipelines/scan_operator.py`:
- Around line 163-166: The local variable fbc_record returned from
check_fbc_exists is unused and causing a lint warning; either drop it or rename
it to _fbc_record. Update the call site inside the method that performs "if
bundle_exists and bundle:" where it currently does "fbc_exists, fbc_record =
await self.check_fbc_exists(operator, bundle)" — change to "fbc_exists,
_fbc_record = await self.check_fbc_exists(operator, bundle)" or just await and
unpack only fbc_exists (e.g., "fbc_exists, = await ...") and leave the rest of
the logic (appending to self.operators_without_fbcs when not fbc_exists)
unchanged.
- Around line 81-86: Wrap the call to
uses_konflux_imagestream_override(self.version) in a try/except to catch
ValueError so invalid version formats don't crash the run, logging the exception
details via self.logger.error; then change the info log emitted when the check
returns False to clearly distinguish an unsupported older but well-formed
version (e.g., "Version X is not supported; requires >= 4.12") from a malformed
version (which should be logged as an error with the exception). Reference
uses_konflux_imagestream_override and the surrounding logic in the
ScanOperator/scan flow to locate where to add the try/except and adjust the log
message.
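The fix the prompt describes might look like the following sketch; `uses_konflux_imagestream_override` here is a stand-in that mimics the real pyartcd helper's behavior (raising `ValueError` on malformed input), and the logger setup is illustrative:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("scan-operator")


def uses_konflux_imagestream_override(version: str) -> bool:
    # Stand-in for the real helper: raises ValueError on malformed input.
    major, minor = (int(p) for p in version.split(".")[:2])
    return (major, minor) >= (4, 12)


def version_is_supported(version: str) -> bool:
    # Distinguish a malformed version (logged as an error with the
    # exception) from an old but well-formed one (informational skip).
    try:
        supported = uses_konflux_imagestream_override(version)
    except ValueError:
        logger.exception("Malformed version %r; skipping scan", version)
        return False
    if not supported:
        logger.info("Version %s is not supported; requires >= 4.12", version)
    return supported


print(version_is_supported("4.22.0"))  # True
print(version_is_supported("4.10.0"))  # False (too old, logged as info)
print(version_is_supported("banana"))  # False (malformed, logged as error)
```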
🧹 Nitpick comments (2)

pyartcd/pyartcd/pipelines/scheduled/schedule_scan_operator.py (1)

32-35: Avoid blocking the event loop when triggering Jenkins builds.
`jenkins.start_scan_operator` is synchronous; calling it in an async coroutine can block other scheduled work. Consider offloading to a thread.

Suggested change:

```diff
-    jenkins.start_scan_operator(version=version, block_until_building=False)
+    await asyncio.to_thread(
+        jenkins.start_scan_operator,
+        version=version,
+        block_until_building=False,
+    )
```

pyartcd/pyartcd/pipelines/scan_operator.py (1)

255-284: Run Jenkins triggers off-thread and keep stack traces.
The Jenkins calls are synchronous and can block the event loop; also use `logger.exception` to preserve stack traces when catching broad exceptions.

Suggested change:

```diff
-        try:
-            self.logger.info(f'Triggering bundle builds for {len(nvrs)} operators')
-            jenkins.start_olm_bundle_konflux(
-                build_version=self.version,
-                assembly=self.assembly,
-                operator_nvrs=nvrs,
-                group=self.group,
-            )
-        except Exception as e:
-            self.logger.error(f'Failed to trigger bundle builds: {e}')
+        try:
+            self.logger.info(f'Triggering bundle builds for {len(nvrs)} operators')
+            await asyncio.to_thread(
+                jenkins.start_olm_bundle_konflux,
+                build_version=self.version,
+                assembly=self.assembly,
+                operator_nvrs=nvrs,
+                group=self.group,
+            )
+        except Exception:
+            self.logger.exception('Failed to trigger bundle builds')
@@
-        try:
-            self.logger.info(f'Triggering FBC builds for {len(nvrs)} operators')
-            jenkins.start_build_fbc(
-                version=self.version,
-                assembly=self.assembly,
-                operator_nvrs=nvrs,
-                dry_run=False,
-                group=self.group,
-            )
-        except Exception as e:
-            self.logger.error(f'Failed to trigger FBC builds: {e}')
+        try:
+            self.logger.info(f'Triggering FBC builds for {len(nvrs)} operators')
+            await asyncio.to_thread(
+                jenkins.start_build_fbc,
+                version=self.version,
+                assembly=self.assembly,
+                operator_nvrs=nvrs,
+                dry_run=False,
+                group=self.group,
+            )
+        except Exception:
+            self.logger.exception('Failed to trigger FBC builds')
```
This pipeline will run periodically, scan for missing bundle and FBC builds, and trigger them if necessary.
Also, make logging messages more useful.
Force-pushed from 4fc682b to 93e45c3.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pyartcd/pyartcd/pipelines/scan_operator.py`:
- Around line 245-283: The current try/except blocks in trigger_bundle_builds
and trigger_fbc_builds swallow all exceptions from
jenkins.start_olm_bundle_konflux and jenkins.start_build_fbc, hiding trigger
failures; change each handler to log the error and then re-raise the exception
(or remove the try/except entirely) so failures propagate and the scan job fails
when builds cannot be triggered, e.g., in trigger_bundle_builds (and
trigger_fbc_builds) log the failure with self.logger.error(...) and then raise
the caught exception.
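The log-then-re-raise pattern the comment asks for can be sketched as follows; `trigger_bundle_builds` and `broken_start` are illustrative stand-ins, not the real pyartcd or Jenkins APIs:

```python
import logging

logger = logging.getLogger("scan-operator")


def trigger_bundle_builds(nvrs, start_job):
    # Log the failure for the job console, then re-raise so the scan
    # job itself is marked failed instead of silently passing.
    try:
        start_job(operator_nvrs=nvrs)
    except Exception:
        logger.exception("Failed to trigger bundle builds for %d operators", len(nvrs))
        raise


def broken_start(**kwargs):
    # Fake Jenkins trigger that always fails.
    raise RuntimeError("jenkins unreachable")


try:
    trigger_bundle_builds(["op-1.0"], broken_start)
except RuntimeError as e:
    print(f"propagated: {e}")  # propagated: jenkins unreachable
```

The bare `raise` re-raises the original exception with its traceback intact, so callers (and CI) see the real failure.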
Force-pushed from fb11c53 to 7ae15db.
@adobes1: This pull request references ART-14334 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed from 744625f to 8d42329.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pyartcd/pyartcd/pipelines/scan_operator.py`:
- Around line 72-121: The run method currently logs exceptions returned from
check_operator but doesn't fail the pipeline; update run to aggregate any
exceptions from results (from asyncio.gather of self.check_operator for
operator_builds) into the existing trigger_errors list (or a new scan_errors
list) and raise at the end if any exist, so failures in check_operator cause the
job to fail; specifically, after the for-loop that inspects results, append any
Exception instances (including their operator id like operator_builds[i].nvr) to
trigger_errors (or merge scan_errors into trigger_errors) and let the final
RuntimeError include both trigger and scan failures before exiting.
```python
async def run(self):
    """Main pipeline execution."""
    self.skipped = False
    self.logger.info(f'Scanning version {self.version}, assembly {self.assembly}')

    # Check if it's a valid version
    if not uses_konflux_imagestream_override(self.version):
        self.logger.info(f'Version {self.version} is not a valid version')
        return

    self.check_params()

    operator_names = await self.load_operator_names()
    self.logger.info(f'Found {len(operator_names)} operators in group {self.group}')

    operator_builds = await self.get_latest_operator_builds(operator_names)
    self.logger.info(f'Found {len(operator_builds)} operator builds with latest successful builds')

    if not operator_builds:
        self.logger.info('No operator builds found')
        return

    # Check if each operator has an associated bundle and fbc
    results = await asyncio.gather(*[self.check_operator(op) for op in operator_builds], return_exceptions=True)

    for i, result in enumerate(results):
        if isinstance(result, Exception):
            self.logger.error(f'Failed to process {operator_builds[i].nvr}: {result}')

    # Track failures to trigger bundle/fbc jobs
    trigger_errors = []

    if self.operators_without_bundles:
        try:
            self.trigger_bundle_builds(self.operators_without_bundles)
        except Exception as e:
            trigger_errors.append(e)
            self.logger.error(f'Failed to trigger bundle builds: {e}')

    if self.operators_without_fbcs:
        try:
            self.trigger_fbc_builds(self.operators_without_fbcs)
        except Exception as e:
            trigger_errors.append(e)
            self.logger.error(f'Failed to trigger FBC builds: {e}')

    # Raise at the end to mark job as failed
    if trigger_errors:
        raise RuntimeError(f"Failed to trigger bundle and/or FBC builds: {', '.join(map(str, trigger_errors))}")
```
Fail the run when operator checks error out. Exceptions from check_operator are logged but do not currently fail the job, which can silently miss required triggers. Consider aggregating these errors and failing at the end alongside trigger failures.
💡 Proposed fix (aggregate scan errors)

```diff
-    results = await asyncio.gather(*[self.check_operator(op) for op in operator_builds], return_exceptions=True)
-
-    for i, result in enumerate(results):
-        if isinstance(result, Exception):
-            self.logger.error(f'Failed to process {operator_builds[i].nvr}: {result}')
-
-    # Track failures to trigger bundle/fbc jobs
-    trigger_errors = []
+    # Track failures from operator checks and triggers
+    trigger_errors = []
+    results = await asyncio.gather(*[self.check_operator(op) for op in operator_builds], return_exceptions=True)
+
+    for i, result in enumerate(results):
+        if isinstance(result, Exception):
+            trigger_errors.append(result)
+            self.logger.error(f'Failed to process {operator_builds[i].nvr}: {result}')
@@
-    if trigger_errors:
-        raise RuntimeError(f"Failed to trigger bundle and/or FBC builds: {', '.join(map(str, trigger_errors))}")
+    if trigger_errors:
+        raise RuntimeError(f"Failed to process operators and/or trigger builds: {', '.join(map(str, trigger_errors))}")
```

🧰 Tools

🪛 Ruff (0.14.14)
[warning] 107-107: Do not catch blind exception: Exception
(BLE001)
[warning] 109-109: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
[warning] 114-114: Do not catch blind exception: Exception
(BLE001)
[warning] 116-116: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
[warning] 120-120: Avoid specifying long messages outside the exception class
(TRY003)
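The suggested aggregation over `asyncio.gather(..., return_exceptions=True)` can be illustrated in isolation; `check_operator` here is a stand-in returning fake results:

```python
import asyncio


async def check_operator(nvr: str) -> str:
    # Stand-in for the real bundle/FBC lookup.
    if nvr.endswith("-bad"):
        raise ValueError(f"lookup failed for {nvr}")
    return f"{nvr}: ok"


async def scan(nvrs: list) -> list:
    # return_exceptions=True keeps one failing operator from cancelling
    # the rest; exceptions come back in-place in the results list.
    results = await asyncio.gather(*(check_operator(n) for n in nvrs), return_exceptions=True)
    scan_errors = [f"{nvrs[i]}: {r}" for i, r in enumerate(results) if isinstance(r, Exception)]
    if scan_errors:
        # Raising at the end marks the job failed while still having
        # processed every operator.
        raise RuntimeError("scan failures: " + "; ".join(scan_errors))
    return [r for r in results if isinstance(r, str)]


try:
    asyncio.run(scan(["op-a", "op-bad"]))
except RuntimeError as e:
    print(e)
```

All operators are checked before the run fails, so a single broken lookup no longer hides missing triggers for the others.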
Force-pushed from 8d42329 to 441b207.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pyartcd/pyartcd/pipelines/scan_operator.py`:
- Around line 311-312: The file fails CI due to formatting; run the code
formatter and reformat the scan_operator function and surrounding file with
`ruff format` (or your configured formatter) to satisfy linting. Specifically,
run `ruff format` on the module containing `async def scan_operator(runtime:
Runtime, version: str, assembly: str, data_path: str, data_gitref: str)` so the
formatter rewrites that signature and surrounding whitespace to match project
style, then stage and commit the resulting changes.
@adobes1: This pull request references ART-14334 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed from 441b207 to 3518abe.
@adobes1: The following test failed, say

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Add a periodic job to scan for missing operator bundle/FBC builds and trigger them if missing.
- Scheduler (`pyartcd/pipelines/scheduled/schedule_scan_operator.py`)
- New Pipeline (`pyartcd/pipelines/scan_operator.py`)
- Doozer Enhancement (`doozer/cli/olm_bundle.py`): adds an `--output-format` flag to the `olm-bundle:list-olm-operators` command, with `component` (default, backward compatible) and `distgit-key` formats

Test runs (with --dry-run):
Required for openshift-eng/aos-cd-jobs#4571
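The `--output-format` flag described above could be sketched like this; the example uses argparse for a dependency-free illustration (the real doozer CLI is built on click), and the operator metadata is hypothetical:

```python
import argparse


def list_olm_operators(argv=None):
    parser = argparse.ArgumentParser(prog="list-olm-operators")
    parser.add_argument(
        "--output-format",
        choices=["component", "distgit-key"],
        default="component",  # default keeps existing output backward compatible
    )
    args = parser.parse_args(argv)
    # Hypothetical operator metadata; the real command reads group config.
    operators = [{"component": "foo-operator-container", "distgit-key": "foo-operator"}]
    return [op[args.output_format] for op in operators]


print(list_olm_operators([]))                                  # ['foo-operator-container']
print(list_olm_operators(["--output-format", "distgit-key"]))  # ['foo-operator']
```

Keeping `component` as the default means existing callers see unchanged output, matching the backward-compatibility note in the description.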