Skip to content

[ART-14343] add support for external-images in art.yaml#2413

Open
Ximinhan wants to merge 7 commits intoopenshift-eng:mainfrom
Ximinhan:art-14343
Open

[ART-14343] add support for external-images in art.yaml#2413
Ximinhan wants to merge 7 commits intoopenshift-eng:mainfrom
Ximinhan:art-14343

Conversation

@Ximinhan
Copy link
Contributor

@Ximinhan Ximinhan commented Jan 27, 2026

This PR adds support for external-images configuration in art.yaml for operator bundles, enabling digest-pinned replacement of external (non-ART-built) container image references in CSV files.

Key changes:

  • Add _replace_external_image_refs() function to resolve external images to their SHA256 digest and replace placeholders in CSV and image-references files
  • Refactor _update_csv() to separate internal vs external image replacement logic
  • Extract helper methods _get_bundle_paths() and _find_image_refs_path() for better code organization
  • Refactor _replace_internal_image_refs() to collect all replacements first and perform a single read-modify-write cycle (optimization)
  • Have _apply_art_yaml_updates() return parsed art.yaml data to avoid duplicate file I/O when processing external-images

New art.yaml configuration format:

external-images:
  - name: postgresql
    search: RELATED_IMAGE_POSTGRESQL  # placeholder in CSV to replace
    replace: registry.redhat.io/rhel9/postgresql-15:latest  # image to resolve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sosiouxme for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Restructures CSV update into a multi-step orchestration: resolve bundle/manifest paths, locate CSV and image-references, perform internal image ref replacements, apply art.yaml updates (with templating), and replace external image refs with digest-pinned pullspecs; _update_csv now sequences these steps.

Changes

Cohort / File(s) Summary
Rebaser CSV/image update logic
doozer/doozerlib/backend/rebaser.py
Rewrote _update_csv to orchestrate a multi-step update flow. Added helpers: _get_bundle_paths, _find_image_refs_path, _get_csv_file_and_refs, _replace_internal_image_refs, _apply_art_yaml_updates, _replace_external_image_refs, _get_external_image_names. Internal replacements skip images declared as external; art.yaml updates support templating and return parsed data for later steps. Typings extended (added Set).

Sequence Diagram(s)

sequenceDiagram
    participant Rebaser
    participant RepoFS as Repository_FS
    participant ArtYAML as art.yaml_Processor
    participant Registry as Image_Registry

    Rebaser->>RepoFS: _get_bundle_paths(csv_config) / _get_csv_file_and_refs(...)
    RepoFS-->>Rebaser: csv_file path, image_refs, bundle/manifests dirs
    Rebaser->>RepoFS: _replace_internal_image_refs(csv_file, image_refs, registry, image_map, external_names)
    RepoFS-->>Rebaser: CSV + image_refs with internal refs updated (in-memory)
    Rebaser->>ArtYAML: _apply_art_yaml_updates(metadata, dest_dir, csv_config, version, release)
    ArtYAML-->>Rebaser: art_yaml_data (includes external-images)
    Rebaser->>Registry: _replace_external_image_refs(csv_file, csv_config, dest_dir, art_yaml_data)
    Registry-->>Rebaser: resolved digest-pinned external pullspecs
    Rebaser->>RepoFS: write updated CSV and image-references (single write)
    RepoFS-->>Rebaser: write confirmation
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@doozer/doozerlib/backend/rebaser.py`:
- Around line 2380-2487: In _replace_external_image_refs, replace the call to
util.oc_image_info_for_arch_async__caching(target) with
util.oc_image_info_show_multiarch_async__caching(target) so you retrieve
manifest-list metadata (use the same async/await pattern), then extract the
manifest-list digest from image_info['listDigest'] and fall back to
image_info.get('contentDigest') or image_info.get('digest') if listDigest is
missing; use that digest when building resolved_target and keep the rest of the
CSV/ref replacement logic unchanged (references: function name
_replace_external_image_refs, variables target, image_info, digest,
resolved_target, refs_path).

Comment on lines 2380 to 2487
@start_as_current_span_async(TRACER, "rebase.replace_external_image_refs")
async def _replace_external_image_refs(
self,
csv_file: str,
csv_config: dict,
dest_dir: Path,
art_yaml_data: Optional[dict],
) -> None:
"""Replace external image references with digest-pinned pullspecs.

This function processes the 'external-images' section from art.yaml to resolve
external (non-ART-built) container images to their digest-pinned form. For each
external image, it:
1. Resolves the target image to get its SHA256 digest
2. Strips any tag from the target to get the base image reference
3. Constructs the digest-pinned pullspec (e.g., registry.redhat.io/rhel9/postgresql-15@sha256:...)
4. Replaces the source placeholder with the resolved pullspec in CSV and image-references files

The 'external-images' configuration in art.yaml should have entries like:
external-images:
- name: postgresql
source: RELATED_IMAGE_POSTGRESQL # placeholder in CSV
target: registry.redhat.io/rhel9/postgresql-15:latest # image to resolve

Args:
csv_file: Path to the ClusterServiceVersion YAML file.
csv_config: The 'update-csv' configuration from the image metadata.
dest_dir: The destination directory containing the operator bundle.
art_yaml_data: Parsed art.yaml data from _apply_art_yaml_updates, or None
if art.yaml doesn't exist. This avoids duplicate file I/O.
"""
if not art_yaml_data:
return

external_images = art_yaml_data.get('external-images', [])
if not external_images:
return

refs_path = self._find_image_refs_path(dest_dir, csv_config)

for ext_img in external_images:
name = ext_img.get('name')
source = ext_img.get('source')
target = ext_img.get('target')

if not source or not target:
self._logger.warning(f"External image configuration for {name} is missing source or target")
continue

self._logger.info(f"Resolving digest for external image {target}")
try:
image_info = await util.oc_image_info_for_arch_async__caching(target)
digest = image_info.get('digest')
except Exception as e:
self._logger.error(f"Failed to resolve digest for {target}: {e}")
raise

if not digest:
raise IOError(f"Could not resolve digest for {target}")

# Strip the tag from target to get the base image reference
# e.g., registry.redhat.io/rhel9/postgresql-15:latest -> registry.redhat.io/rhel9/postgresql-15
# Handle edge cases like registry.example.com:5000/image:tag (port vs tag)
if '@' in target:
# Already has a digest - strip it to get the base
base_target = target.split('@')[0]
else:
# Check for tag: split by '/' and check if last segment has ':'
# This handles registry:port/image:tag correctly
parts = target.rsplit('/', 1)
if len(parts) == 2 and ':' in parts[1]:
# The last segment has a tag, strip it
base_target = parts[0] + '/' + parts[1].rsplit(':', 1)[0]
elif len(parts) == 1 and ':' in parts[0]:
# No '/' in target, check if it's just image:tag
# This is less common but handle it
base_target = parts[0].rsplit(':', 1)[0]
else:
# No tag
base_target = target

resolved_target = f"{base_target}@{digest}"

self._logger.info(f"Replacing {source} with {resolved_target} in CSV and {refs_path}")

# Update CSV
async with aiofiles.open(csv_file, 'r', encoding='utf-8') as f:
content = await f.read()

if source in content:
content = content.replace(source, resolved_target)
async with aiofiles.open(csv_file, 'w', encoding='utf-8') as f:
await f.write(content)
else:
self._logger.info(f"Source image {source} not found in CSV {csv_file}, nothing to replace")

# Update image-references
if refs_path:
async with aiofiles.open(refs_path, 'r', encoding='utf-8') as f:
content = await f.read()

if source in content:
content = content.replace(source, resolved_target)
async with aiofiles.open(refs_path, 'w', encoding='utf-8') as f:
await f.write(content)
else:
self._logger.info(f"Source image {source} not found in {refs_path}, nothing to replace")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find oc_image_info_for_arch_async__caching definition and related digest fields
rg -n "def oc_image_info_for_arch_async__caching" -A 30 doozer/doozerlib/util.py

Repository: openshift-eng/art-tools

Length of output: 1510


🏁 Script executed:

#!/bin/bash
# Find oc_image_info_for_arch_async definition to see digest field handling
rg -n "def oc_image_info_for_arch_async" -A 50 doozer/doozerlib/util.py | head -80

Repository: openshift-eng/art-tools

Length of output: 2914


🏁 Script executed:

#!/bin/bash
# Find oc_image_info_async definition to see what digest fields are returned
rg -n "def oc_image_info_async" -A 30 doozer/doozerlib/util.py

Repository: openshift-eng/art-tools

Length of output: 1218


🏁 Script executed:

#!/bin/bash
# Search for listDigest usage in the codebase
rg -n "listDigest" doozer/doozerlib/
# Also check how digest is used in related image handling
rg -n "image_info.*digest|digest.*image_info" -A 2 -B 2 doozer/doozerlib/ | head -60

Repository: openshift-eng/art-tools

Length of output: 1581


🏁 Script executed:

#!/bin/bash
# Check how external images / manifest lists are handled in bundle.py and konflux_olm_bundler.py
rg -n "listDigest" -B 10 -A 5 doozer/doozerlib/olm/bundle.py | head -40
rg -n "listDigest" -B 10 -A 5 doozer/doozerlib/backend/konflux_olm_bundler.py | head -40

Repository: openshift-eng/art-tools

Length of output: 1728


🏁 Script executed:

#!/bin/bash
# Check what oc image info returns with different flags
# Look for documentation on digest vs contentDigest vs listDigest
rg -n "contentDigest|listDigest|\.digest" doozer/doozerlib/util.py -B 2 -A 2
# Check usage of oc_image_info_show_multiarch_async (the multi-arch version)
rg -n "oc_image_info_show_multiarch_async" -B 3 -A 8 doozer/doozerlib/

Repository: openshift-eng/art-tools

Length of output: 2487


🏁 Script executed:

#!/bin/bash
# Check what oc_image_info_show_multiarch_async returns and if it has listDigest
# Also check oc_image_info_for_arch__caching (sync version, not async)
rg -n "def oc_image_info_for_arch__caching" -A 15 doozer/doozerlib/util.py
# Check what digest fields the multiarch version would return
rg -n "show-multiarch" -B 3 -A 10 doozer/doozerlib/util.py

Repository: openshift-eng/art-tools

Length of output: 4933


🏁 Script executed:

#!/bin/bash
# Check the actual output structure of oc image info with --filter-by-os
# Look for any documentation or tests that show what fields are returned
rg -n "digest|listDigest|contentDigest" doozer/doozerlib/backend/rebaser.py | head -20
# Also check if there are any tests or examples of external image handling
rg -n "external.image|external-image" -B 5 -A 5 doozer/doozerlib/backend/rebaser.py | head -40

Repository: openshift-eng/art-tools

Length of output: 3644


🏁 Script executed:

#!/bin/bash
# Verify what oc image info --filter-by-os returns vs --show-multiarch
# Check how digest vs listDigest vs contentDigest are documented
rg -n "contentDigest|listDigest" doozer/doozerlib/util.py -B 5 -A 5
# Also look for patterns where external images or multi-arch images are handled
rg -n "show.multiarch|filter.by.os" doozer/doozerlib/ -B 3 -A 3

Repository: openshift-eng/art-tools

Length of output: 17739


🏁 Script executed:

#!/bin/bash
# Check what fields oc image info returns - look for actual usage/tests
# Search for examples showing the structure of image_info
rg -n "image_info\[" doozer/doozerlib/ | grep -E "digest|Digest" | head -20
# Also check for any tests that show the expected structure
fd -e "test*.py" -e "*test.py" doozer/ | head -5

Repository: openshift-eng/art-tools

Length of output: 855


🏁 Script executed:

#!/bin/bash
# Verify: when using --filter-by-os, does oc image info return listDigest?
# Check what fields are actually available with arch filtering
rg -n "def oc_image_info_for_arch" -A 10 doozer/doozerlib/util.py | head -30
# Also verify what oc_image_info_show_multiarch returns for single-arch images
rg -n "oc_image_info_show_multiarch" -B 2 -A 10 doozer/doozerlib/build_info.py | head -40

Repository: openshift-eng/art-tools

Length of output: 2207


🏁 Script executed:

#!/bin/bash
# Key question: what does oc image info return with --filter-by-os?
# Does it include listDigest or only digest/contentDigest?
# Let's check the actual oc command behavior by looking at how results are processed
rg -n "get\('digest'\)|get\('listDigest'\)|get\('contentDigest'\)" doozer/doozerlib/backend/rebaser.py -B 3 -A 3
# Also check if there are other places handling external images
rg -n "external.image|external-image" doozer/doozerlib/ -B 2 -A 2

Repository: openshift-eng/art-tools

Length of output: 6063


🏁 Script executed:

#!/bin/bash
# Critical verification: when oc image info uses --filter-by-os, what digest fields are in the output?
# Check if listDigest appears even with arch filtering
# Also check how the codebase distinguishes between these approaches

# Look for what fields external images context expects
rg -n "contentDigest|listDigest" -B 5 -A 5 doozer/doozerlib/util.py | grep -A 10 "find_manifest_list_sha"

# Check the actual logic in bundle.py to understand the pattern better
rg -n "operator_image_ref_mode" -B 5 -A 10 doozer/doozerlib/olm/bundle.py

Repository: openshift-eng/art-tools

Length of output: 1160


🏁 Script executed:

#!/bin/bash
# Final verification: what field names does oc image info actually return?
# Check if 'digest' is a valid field or if we should use 'contentDigest'
rg -n "\.digest|contentDigest|listDigest" doozer/doozerlib/build_info.py -B 2 -A 2 | head -40
# Also check if there are any comments/docs explaining these fields
rg -n "digest" doozer/doozerlib/util.py | grep -i "content\|list\|field" | head -10

Repository: openshift-eng/art-tools

Length of output: 292


🏁 Script executed:

#!/bin/bash
# Check what field name oc image info actually returns
# Look for how image_info fields are accessed in the codebase
rg -n "image_info\['digest" doozer/doozerlib/ -B 2 -A 2
rg -n "image_info\['contentDigest" doozer/doozerlib/ -B 2 -A 2
# Check if 'digest' vs 'contentDigest' is the actual issue
git log --oneline -S "contentDigest" -- doozer/doozerlib/ | head -5

Repository: openshift-eng/art-tools

Length of output: 1446


🏁 Script executed:

#!/bin/bash
# Check the git commit that added external-images to understand the intent
git show 28c5471b --stat | head -40
# Also check if there's a reason for using oc_image_info_for_arch_async__caching vs show_multiarch
rg -n "oc_image_info_show_multiarch_async" doozer/doozerlib/backend/rebaser.py

Repository: openshift-eng/art-tools

Length of output: 2328


Use manifest-list digest for external images to preserve multi-arch compatibility.

If external images are multi-arch manifest lists, using oc_image_info_for_arch_async__caching (which filters to a single architecture via --filter-by-os) pins the digest to a specific arch rather than the manifest list. This breaks pulls on other architectures.

However, the suggested fix has an issue: when using --filter-by-os, the response does not include listDigest (see util.find_manifest_list_sha which explicitly checks for this field). The actual field returned for arch-filtered results appears to be contentDigest, not digest.

The correct approach is to call oc_image_info_show_multiarch_async__caching instead, which returns the full manifest-list metadata including listDigest, and extract that field.

🧰 Tools
🪛 Ruff (0.14.14)

2434-2434: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


2438-2438: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@doozer/doozerlib/backend/rebaser.py` around lines 2380 - 2487, In
_replace_external_image_refs, replace the call to
util.oc_image_info_for_arch_async__caching(target) with
util.oc_image_info_show_multiarch_async__caching(target) so you retrieve
manifest-list metadata (use the same async/await pattern), then extract the
manifest-list digest from image_info['listDigest'] and fall back to
image_info.get('contentDigest') or image_info.get('digest') if listDigest is
missing; use that digest when building resolved_target and keep the rest of the
CSV/ref replacement logic unchanged (references: function name
_replace_external_image_refs, variables target, image_info, digest,
resolved_target, refs_path).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@doozer/doozerlib/backend/rebaser.py`:
- Around line 2439-2451: The code is using ext_img.get('search') and
ext_img.get('replace') which mismatches the docstring; update the loop in
rebaser.py to read ext_img.get('source') into the variable source and
ext_img.get('target') into target (keep the existing validation, logging via
self._logger, and subsequent call to
util.oc_image_info_for_arch_async__caching(target) unchanged) so field names
match the documented API.
🧹 Nitpick comments (2)
doozer/doozerlib/backend/rebaser.py (2)

2354-2355: Chain the exception for better error traceability.

When re-raising with a different exception type, use raise ... from ex to preserve the original exception context. This helps with debugging by showing the full exception chain.

♻️ Suggested improvement
         except Exception as ex:  # exception is low level, need to pull out the details and rethrow
-            raise IOError('Error processing art.yaml!\n{}\n\n{}'.format(str(ex), art_yaml_str))
+            raise IOError('Error processing art.yaml!\n{}\n\n{}'.format(str(ex), art_yaml_str)) from ex

2526-2530: Consider logging when art.yaml parsing fails.

Silently catching all exceptions and returning an empty set could hide configuration issues. While _apply_art_yaml_updates will catch errors later, logging here would help with debugging if external images are unexpectedly not being processed.

🔧 Suggested improvement
         try:
             with io.open(art_yaml_path, 'r', encoding="utf-8") as art_file:
                 art_yaml_data = yaml.full_load(art_file.read())
-        except Exception:
+        except Exception as e:
+            self._logger.debug(f"Could not parse art.yaml for external-images: {e}")
             return set()

Comment on lines 2439 to 2451
for ext_img in external_images:
name = ext_img.get('name')
source = ext_img.get('search')
target = ext_img.get('replace')

if not source or not target:
self._logger.warning(f"External image configuration for {name} is missing search or replace")
continue

self._logger.info(f"Resolving digest for external image {target}")
try:
image_info = await util.oc_image_info_for_arch_async__caching(target)
digest = image_info.get('digest')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Field name mismatch between docstring and implementation.

The docstring describes source and target fields, but the code accesses search and replace:

source = ext_img.get('search')   # Docstring says 'source'
target = ext_img.get('replace')  # Docstring says 'target'

Either update the docstring to match the actual field names (search/replace), or update the code to use the documented field names (source/target).

🔧 Option 1: Update docstring to match code
     The 'external-images' configuration in art.yaml should have entries like:
         external-images:
           - name: postgresql
-            source: RELATED_IMAGE_POSTGRESQL  # placeholder in CSV
-            target: registry.redhat.io/rhel9/postgresql-15:latest  # image to resolve
+            search: RELATED_IMAGE_POSTGRESQL  # placeholder in CSV
+            replace: registry.redhat.io/rhel9/postgresql-15:latest  # image to resolve
🤖 Prompt for AI Agents
In `@doozer/doozerlib/backend/rebaser.py` around lines 2439 - 2451, The code is
using ext_img.get('search') and ext_img.get('replace') which mismatches the
docstring; update the loop in rebaser.py to read ext_img.get('source') into the
variable source and ext_img.get('target') into target (keep the existing
validation, logging via self._logger, and subsequent call to
util.oc_image_info_for_arch_async__caching(target) unchanged) so field names
match the documented API.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 10, 2026

@Ximinhan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/security 3b9d7b8 link false /test security

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@fbladilo
Copy link
Contributor

fbladilo commented Feb 12, 2026

@Ximinhan I would fix the Docstring and examples which use incorrect field names source: and target: , should really be search: and replace: to avoid confusion between implementation and docstring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants