[ART-14343] add support for external-images in art.yaml#2413
[ART-14343] add support for external-images in art.yaml#2413Ximinhan wants to merge 7 commits intoopenshift-eng:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughRestructures CSV update into a multi-step orchestration: resolve bundle/manifest paths, locate CSV and image-references, perform internal image ref replacements, apply art.yaml updates (with templating), and replace external image refs with digest-pinned pullspecs; Changes
Sequence Diagram(s)sequenceDiagram
participant Rebaser
participant RepoFS as Repository_FS
participant ArtYAML as art.yaml_Processor
participant Registry as Image_Registry
Rebaser->>RepoFS: _get_bundle_paths(csv_config) / _get_csv_file_and_refs(...)
RepoFS-->>Rebaser: csv_file path, image_refs, bundle/manifests dirs
Rebaser->>RepoFS: _replace_internal_image_refs(csv_file, image_refs, registry, image_map, external_names)
RepoFS-->>Rebaser: CSV + image_refs with internal refs updated (in-memory)
Rebaser->>ArtYAML: _apply_art_yaml_updates(metadata, dest_dir, csv_config, version, release)
ArtYAML-->>Rebaser: art_yaml_data (includes external-images)
Rebaser->>Registry: _replace_external_image_refs(csv_file, csv_config, dest_dir, art_yaml_data)
Registry-->>Rebaser: resolved digest-pinned external pullspecs
Rebaser->>RepoFS: write updated CSV and image-references (single write)
RepoFS-->>Rebaser: write confirmation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@doozer/doozerlib/backend/rebaser.py`:
- Around line 2380-2487: In _replace_external_image_refs, replace the call to
util.oc_image_info_for_arch_async__caching(target) with
util.oc_image_info_show_multiarch_async__caching(target) so you retrieve
manifest-list metadata (use the same async/await pattern), then extract the
manifest-list digest from image_info['listDigest'] and fall back to
image_info.get('contentDigest') or image_info.get('digest') if listDigest is
missing; use that digest when building resolved_target and keep the rest of the
CSV/ref replacement logic unchanged (references: function name
_replace_external_image_refs, variables target, image_info, digest,
resolved_target, refs_path).
| @start_as_current_span_async(TRACER, "rebase.replace_external_image_refs") | ||
| async def _replace_external_image_refs( | ||
| self, | ||
| csv_file: str, | ||
| csv_config: dict, | ||
| dest_dir: Path, | ||
| art_yaml_data: Optional[dict], | ||
| ) -> None: | ||
| """Replace external image references with digest-pinned pullspecs. | ||
|
|
||
| This function processes the 'external-images' section from art.yaml to resolve | ||
| external (non-ART-built) container images to their digest-pinned form. For each | ||
| external image, it: | ||
| 1. Resolves the target image to get its SHA256 digest | ||
| 2. Strips any tag from the target to get the base image reference | ||
| 3. Constructs the digest-pinned pullspec (e.g., registry.redhat.io/rhel9/postgresql-15@sha256:...) | ||
| 4. Replaces the source placeholder with the resolved pullspec in CSV and image-references files | ||
|
|
||
| The 'external-images' configuration in art.yaml should have entries like: | ||
| external-images: | ||
| - name: postgresql | ||
| source: RELATED_IMAGE_POSTGRESQL # placeholder in CSV | ||
| target: registry.redhat.io/rhel9/postgresql-15:latest # image to resolve | ||
|
|
||
| Args: | ||
| csv_file: Path to the ClusterServiceVersion YAML file. | ||
| csv_config: The 'update-csv' configuration from the image metadata. | ||
| dest_dir: The destination directory containing the operator bundle. | ||
| art_yaml_data: Parsed art.yaml data from _apply_art_yaml_updates, or None | ||
| if art.yaml doesn't exist. This avoids duplicate file I/O. | ||
| """ | ||
| if not art_yaml_data: | ||
| return | ||
|
|
||
| external_images = art_yaml_data.get('external-images', []) | ||
| if not external_images: | ||
| return | ||
|
|
||
| refs_path = self._find_image_refs_path(dest_dir, csv_config) | ||
|
|
||
| for ext_img in external_images: | ||
| name = ext_img.get('name') | ||
| source = ext_img.get('source') | ||
| target = ext_img.get('target') | ||
|
|
||
| if not source or not target: | ||
| self._logger.warning(f"External image configuration for {name} is missing source or target") | ||
| continue | ||
|
|
||
| self._logger.info(f"Resolving digest for external image {target}") | ||
| try: | ||
| image_info = await util.oc_image_info_for_arch_async__caching(target) | ||
| digest = image_info.get('digest') | ||
| except Exception as e: | ||
| self._logger.error(f"Failed to resolve digest for {target}: {e}") | ||
| raise | ||
|
|
||
| if not digest: | ||
| raise IOError(f"Could not resolve digest for {target}") | ||
|
|
||
| # Strip the tag from target to get the base image reference | ||
| # e.g., registry.redhat.io/rhel9/postgresql-15:latest -> registry.redhat.io/rhel9/postgresql-15 | ||
| # Handle edge cases like registry.example.com:5000/image:tag (port vs tag) | ||
| if '@' in target: | ||
| # Already has a digest - strip it to get the base | ||
| base_target = target.split('@')[0] | ||
| else: | ||
| # Check for tag: split by '/' and check if last segment has ':' | ||
| # This handles registry:port/image:tag correctly | ||
| parts = target.rsplit('/', 1) | ||
| if len(parts) == 2 and ':' in parts[1]: | ||
| # The last segment has a tag, strip it | ||
| base_target = parts[0] + '/' + parts[1].rsplit(':', 1)[0] | ||
| elif len(parts) == 1 and ':' in parts[0]: | ||
| # No '/' in target, check if it's just image:tag | ||
| # This is less common but handle it | ||
| base_target = parts[0].rsplit(':', 1)[0] | ||
| else: | ||
| # No tag | ||
| base_target = target | ||
|
|
||
| resolved_target = f"{base_target}@{digest}" | ||
|
|
||
| self._logger.info(f"Replacing {source} with {resolved_target} in CSV and {refs_path}") | ||
|
|
||
| # Update CSV | ||
| async with aiofiles.open(csv_file, 'r', encoding='utf-8') as f: | ||
| content = await f.read() | ||
|
|
||
| if source in content: | ||
| content = content.replace(source, resolved_target) | ||
| async with aiofiles.open(csv_file, 'w', encoding='utf-8') as f: | ||
| await f.write(content) | ||
| else: | ||
| self._logger.info(f"Source image {source} not found in CSV {csv_file}, nothing to replace") | ||
|
|
||
| # Update image-references | ||
| if refs_path: | ||
| async with aiofiles.open(refs_path, 'r', encoding='utf-8') as f: | ||
| content = await f.read() | ||
|
|
||
| if source in content: | ||
| content = content.replace(source, resolved_target) | ||
| async with aiofiles.open(refs_path, 'w', encoding='utf-8') as f: | ||
| await f.write(content) | ||
| else: | ||
| self._logger.info(f"Source image {source} not found in {refs_path}, nothing to replace") | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find oc_image_info_for_arch_async__caching definition and related digest fields
rg -n "def oc_image_info_for_arch_async__caching" -A 30 doozer/doozerlib/util.pyRepository: openshift-eng/art-tools
Length of output: 1510
🏁 Script executed:
#!/bin/bash
# Find oc_image_info_for_arch_async definition to see digest field handling
rg -n "def oc_image_info_for_arch_async" -A 50 doozer/doozerlib/util.py | head -80Repository: openshift-eng/art-tools
Length of output: 2914
🏁 Script executed:
#!/bin/bash
# Find oc_image_info_async definition to see what digest fields are returned
rg -n "def oc_image_info_async" -A 30 doozer/doozerlib/util.pyRepository: openshift-eng/art-tools
Length of output: 1218
🏁 Script executed:
#!/bin/bash
# Search for listDigest usage in the codebase
rg -n "listDigest" doozer/doozerlib/
# Also check how digest is used in related image handling
rg -n "image_info.*digest|digest.*image_info" -A 2 -B 2 doozer/doozerlib/ | head -60Repository: openshift-eng/art-tools
Length of output: 1581
🏁 Script executed:
#!/bin/bash
# Check how external images / manifest lists are handled in bundle.py and konflux_olm_bundler.py
rg -n "listDigest" -B 10 -A 5 doozer/doozerlib/olm/bundle.py | head -40
rg -n "listDigest" -B 10 -A 5 doozer/doozerlib/backend/konflux_olm_bundler.py | head -40Repository: openshift-eng/art-tools
Length of output: 1728
🏁 Script executed:
#!/bin/bash
# Check what oc image info returns with different flags
# Look for documentation on digest vs contentDigest vs listDigest
rg -n "contentDigest|listDigest|\.digest" doozer/doozerlib/util.py -B 2 -A 2
# Check usage of oc_image_info_show_multiarch_async (the multi-arch version)
rg -n "oc_image_info_show_multiarch_async" -B 3 -A 8 doozer/doozerlib/Repository: openshift-eng/art-tools
Length of output: 2487
🏁 Script executed:
#!/bin/bash
# Check what oc_image_info_show_multiarch_async returns and if it has listDigest
# Also check oc_image_info_for_arch__caching (sync version, not async)
rg -n "def oc_image_info_for_arch__caching" -A 15 doozer/doozerlib/util.py
# Check what digest fields the multiarch version would return
rg -n "show-multiarch" -B 3 -A 10 doozer/doozerlib/util.pyRepository: openshift-eng/art-tools
Length of output: 4933
🏁 Script executed:
#!/bin/bash
# Check the actual output structure of oc image info with --filter-by-os
# Look for any documentation or tests that show what fields are returned
rg -n "digest|listDigest|contentDigest" doozer/doozerlib/backend/rebaser.py | head -20
# Also check if there are any tests or examples of external image handling
rg -n "external.image|external-image" -B 5 -A 5 doozer/doozerlib/backend/rebaser.py | head -40Repository: openshift-eng/art-tools
Length of output: 3644
🏁 Script executed:
#!/bin/bash
# Verify what oc image info --filter-by-os returns vs --show-multiarch
# Check how digest vs listDigest vs contentDigest are documented
rg -n "contentDigest|listDigest" doozer/doozerlib/util.py -B 5 -A 5
# Also look for patterns where external images or multi-arch images are handled
rg -n "show.multiarch|filter.by.os" doozer/doozerlib/ -B 3 -A 3Repository: openshift-eng/art-tools
Length of output: 17739
🏁 Script executed:
#!/bin/bash
# Check what fields oc image info returns - look for actual usage/tests
# Search for examples showing the structure of image_info
rg -n "image_info\[" doozer/doozerlib/ | grep -E "digest|Digest" | head -20
# Also check for any tests that show the expected structure
fd -e "test*.py" -e "*test.py" doozer/ | head -5Repository: openshift-eng/art-tools
Length of output: 855
🏁 Script executed:
#!/bin/bash
# Verify: when using --filter-by-os, does oc image info return listDigest?
# Check what fields are actually available with arch filtering
rg -n "def oc_image_info_for_arch" -A 10 doozer/doozerlib/util.py | head -30
# Also verify what oc_image_info_show_multiarch returns for single-arch images
rg -n "oc_image_info_show_multiarch" -B 2 -A 10 doozer/doozerlib/build_info.py | head -40Repository: openshift-eng/art-tools
Length of output: 2207
🏁 Script executed:
#!/bin/bash
# Key question: what does oc image info return with --filter-by-os?
# Does it include listDigest or only digest/contentDigest?
# Let's check the actual oc command behavior by looking at how results are processed
rg -n "get\('digest'\)|get\('listDigest'\)|get\('contentDigest'\)" doozer/doozerlib/backend/rebaser.py -B 3 -A 3
# Also check if there are other places handling external images
rg -n "external.image|external-image" doozer/doozerlib/ -B 2 -A 2Repository: openshift-eng/art-tools
Length of output: 6063
🏁 Script executed:
#!/bin/bash
# Critical verification: when oc image info uses --filter-by-os, what digest fields are in the output?
# Check if listDigest appears even with arch filtering
# Also check how the codebase distinguishes between these approaches
# Look for what fields external images context expects
rg -n "contentDigest|listDigest" -B 5 -A 5 doozer/doozerlib/util.py | grep -A 10 "find_manifest_list_sha"
# Check the actual logic in bundle.py to understand the pattern better
rg -n "operator_image_ref_mode" -B 5 -A 10 doozer/doozerlib/olm/bundle.pyRepository: openshift-eng/art-tools
Length of output: 1160
🏁 Script executed:
#!/bin/bash
# Final verification: what field names does oc image info actually return?
# Check if 'digest' is a valid field or if we should use 'contentDigest'
rg -n "\.digest|contentDigest|listDigest" doozer/doozerlib/build_info.py -B 2 -A 2 | head -40
# Also check if there are any comments/docs explaining these fields
rg -n "digest" doozer/doozerlib/util.py | grep -i "content\|list\|field" | head -10Repository: openshift-eng/art-tools
Length of output: 292
🏁 Script executed:
#!/bin/bash
# Check what field name oc image info actually returns
# Look for how image_info fields are accessed in the codebase
rg -n "image_info\['digest" doozer/doozerlib/ -B 2 -A 2
rg -n "image_info\['contentDigest" doozer/doozerlib/ -B 2 -A 2
# Check if 'digest' vs 'contentDigest' is the actual issue
git log --oneline -S "contentDigest" -- doozer/doozerlib/ | head -5Repository: openshift-eng/art-tools
Length of output: 1446
🏁 Script executed:
#!/bin/bash
# Check the git commit that added external-images to understand the intent
git show 28c5471b --stat | head -40
# Also check if there's a reason for using oc_image_info_for_arch_async__caching vs show_multiarch
rg -n "oc_image_info_show_multiarch_async" doozer/doozerlib/backend/rebaser.pyRepository: openshift-eng/art-tools
Length of output: 2328
Use manifest-list digest for external images to preserve multi-arch compatibility.
If external images are multi-arch manifest lists, using oc_image_info_for_arch_async__caching (which filters to a single architecture via --filter-by-os) pins the digest to a specific arch rather than the manifest list. This breaks pulls on other architectures.
However, the suggested fix has an issue: when using --filter-by-os, the response does not include listDigest (see util.find_manifest_list_sha which explicitly checks for this field). The actual field returned for arch-filtered results appears to be contentDigest, not digest.
The correct approach is to call oc_image_info_show_multiarch_async__caching instead, which returns the full manifest-list metadata including listDigest, and extract that field.
🧰 Tools
🪛 Ruff (0.14.14)
2434-2434: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
2438-2438: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@doozer/doozerlib/backend/rebaser.py` around lines 2380 - 2487, In
_replace_external_image_refs, replace the call to
util.oc_image_info_for_arch_async__caching(target) with
util.oc_image_info_show_multiarch_async__caching(target) so you retrieve
manifest-list metadata (use the same async/await pattern), then extract the
manifest-list digest from image_info['listDigest'] and fall back to
image_info.get('contentDigest') or image_info.get('digest') if listDigest is
missing; use that digest when building resolved_target and keep the rest of the
CSV/ref replacement logic unchanged (references: function name
_replace_external_image_refs, variables target, image_info, digest,
resolved_target, refs_path).
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@doozer/doozerlib/backend/rebaser.py`:
- Around line 2439-2451: The code is using ext_img.get('search') and
ext_img.get('replace') which mismatches the docstring; update the loop in
rebaser.py to read ext_img.get('source') into the variable source and
ext_img.get('target') into target (keep the existing validation, logging via
self._logger, and subsequent call to
util.oc_image_info_for_arch_async__caching(target) unchanged) so field names
match the documented API.
🧹 Nitpick comments (2)
doozer/doozerlib/backend/rebaser.py (2)
2354-2355: Chain the exception for better error traceability.When re-raising with a different exception type, use
raise ... from exto preserve the original exception context. This helps with debugging by showing the full exception chain.♻️ Suggested improvement
except Exception as ex: # exception is low level, need to pull out the details and rethrow - raise IOError('Error processing art.yaml!\n{}\n\n{}'.format(str(ex), art_yaml_str)) + raise IOError('Error processing art.yaml!\n{}\n\n{}'.format(str(ex), art_yaml_str)) from ex
2526-2530: Consider logging when art.yaml parsing fails.Silently catching all exceptions and returning an empty set could hide configuration issues. While
_apply_art_yaml_updateswill catch errors later, logging here would help with debugging if external images are unexpectedly not being processed.🔧 Suggested improvement
try: with io.open(art_yaml_path, 'r', encoding="utf-8") as art_file: art_yaml_data = yaml.full_load(art_file.read()) - except Exception: + except Exception as e: + self._logger.debug(f"Could not parse art.yaml for external-images: {e}") return set()
doozer/doozerlib/backend/rebaser.py
Outdated
| for ext_img in external_images: | ||
| name = ext_img.get('name') | ||
| source = ext_img.get('search') | ||
| target = ext_img.get('replace') | ||
|
|
||
| if not source or not target: | ||
| self._logger.warning(f"External image configuration for {name} is missing search or replace") | ||
| continue | ||
|
|
||
| self._logger.info(f"Resolving digest for external image {target}") | ||
| try: | ||
| image_info = await util.oc_image_info_for_arch_async__caching(target) | ||
| digest = image_info.get('digest') |
There was a problem hiding this comment.
Field name mismatch between docstring and implementation.
The docstring describes source and target fields, but the code accesses search and replace:
source = ext_img.get('search') # Docstring says 'source'
target = ext_img.get('replace') # Docstring says 'target'Either update the docstring to match the actual field names (search/replace), or update the code to use the documented field names (source/target).
🔧 Option 1: Update docstring to match code
The 'external-images' configuration in art.yaml should have entries like:
external-images:
- name: postgresql
- source: RELATED_IMAGE_POSTGRESQL # placeholder in CSV
- target: registry.redhat.io/rhel9/postgresql-15:latest # image to resolve
+ search: RELATED_IMAGE_POSTGRESQL # placeholder in CSV
+ replace: registry.redhat.io/rhel9/postgresql-15:latest # image to resolve🤖 Prompt for AI Agents
In `@doozer/doozerlib/backend/rebaser.py` around lines 2439 - 2451, The code is
using ext_img.get('search') and ext_img.get('replace') which mismatches the
docstring; update the loop in rebaser.py to read ext_img.get('source') into the
variable source and ext_img.get('target') into target (keep the existing
validation, logging via self._logger, and subsequent call to
util.oc_image_info_for_arch_async__caching(target) unchanged) so field names
match the documented API.
|
@Ximinhan: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@Ximinhan I would fix the Docstring and examples which use incorrect field names source: and target: , should really be search: and replace: to avoid confusion between implementation and docstring |
This PR adds support for
external-imagesconfiguration inart.yamlfor operator bundles, enabling digest-pinned replacement of external (non-ART-built) container image references in CSV files.Key changes:
_replace_external_image_refs()function to resolve external images to their SHA256 digest and replace placeholders in CSV and image-references files_update_csv()to separate internal vs external image replacement logic_get_bundle_paths()and_find_image_refs_path()for better code organization_replace_internal_image_refs()to collect all replacements first and perform a single read-modify-write cycle (optimization)_apply_art_yaml_updates()return parsed art.yaml data to avoid duplicate file I/O when processing external-imagesNew art.yaml configuration format: