Skip to content

Add caching for oc image info#2472

Open
jupierce wants to merge 1 commit intoopenshift-eng:mainfrom
jupierce:oc_image_info_redis
Open

Add caching for oc image info#2472
jupierce wants to merge 1 commit intoopenshift-eng:mainfrom
jupierce:oc_image_info_redis

Conversation

@jupierce
Copy link
Contributor

Speed up jobs that repeatedly need image info by caching sha256 pullspec results in redis, if available.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sosiouxme for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a new Redis-backed oc image info caching module (sync and async) and replaces direct oc image info subprocess calls across multiple services with cache-backed helpers; updates error handling to exception-based flows, introduces retries, in-process LRU caching, and test adjustments for the new APIs.

Changes

Cohort / File(s) Summary
New OC Image Info Caching Module
artcommon/artcommonlib/oc_image_info.py
Adds oc_image_info__cached (sync) and oc_image_info__cached_async (async), plus LRU variants. Enforces no -o/--output or -a/--registry-config in options, always appends -o json, detects sha256-pinned pullspecs, builds deterministic Redis keys with versioning, uses 60-day TTL, 3-attempt retries (10s), reads-before-executes, writes-on-miss, swallows/logs Redis errors, and propagates CLI failures.
ArtCommonLib integrations
artcommon/artcommonlib/rhcos.py, artcommon/artcommonlib/...
Replaces exectools/CLI calls with oc_image_info__cached; preserves JSON parsing for labels/digests and removes direct exectools dependency.
Doozer — core, CLI, OLM, utils
doozer/doozerlib/backend/konflux_image_builder.py, doozer/doozerlib/cli/get_nightlies.py, doozer/doozerlib/cli/scan_sources_konflux.py, doozer/doozerlib/olm/bundle.py, doozer/doozerlib/release_inspector.py, doozer/doozerlib/util.py
Switches to oc_image_info__cached/_async calls; adapts to exception-based error handling (ChildProcessError → mapped IOError/ValueError/None where applicable); maintains fallback behaviors and label/NVR extraction logic; updates option ordering where applicable.
Elliott & PyArtCD
elliott/elliottlib/util.py, pyartcd/pyartcd/pipelines/promote.py
Converts per-tag subprocess invocations to centralized async cached lookups (oc_image_info__cached_async), adds helper wrappers mapping ChildProcessError to runtime errors, preserves downstream parsing and error semantics with exception-based handling for "manifest unknown"/expired messages.
Tests Updated
doozer/tests/test_rhcos.py, doozer/tests/cli/test_scan_sources_konflux.py, doozer/tests/test_util.py
Mocks and patches updated to target oc_image_info__cached/_async; test stubs now return/expect JSON strings; call-argument ordering adjusted (options before pullspec) and assertions updated to check new call signatures and exception semantics.
Other callers updated
doozer/... (other small diffs), pyartcd/...
Multiple modules updated to import and use the new cached APIs, replacing direct exectools/command invocations and adjusting error handling accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Calling Code
    participant CacheFunc as oc_image_info__cached(_async)
    participant Redis
    participant OC as oc CLI

    Caller->>CacheFunc: oc_image_info__cached(_async)(pullspec, options)

    rect rgba(100, 200, 255, 0.5)
    Note over CacheFunc: Validate options (forbid -o/--output, -a/--registry-config)\nCheck sha256-pinned + Redis credentials
    end

    alt Cacheable
        CacheFunc->>Redis: GET cache_key
        Redis-->>CacheFunc: cached_json or miss
        alt Hit
            Note over CacheFunc: Return cached JSON
        else Miss
            rect rgba(150, 255, 150, 0.5)
            Note over CacheFunc: Execute oc with retries (3 attempts, 10s)
            end

            CacheFunc->>OC: oc image info -o json [options] pullspec
            OC-->>CacheFunc: stdout (image_info_json) or error

            alt Success
                CacheFunc->>Redis: SET cache_key image_info_json (TTL 60d)
                Redis-->>CacheFunc: OK or error (logged)
                CacheFunc-->>Caller: image_info_json
            else Failure
                CacheFunc-->>Caller: raises ChildProcessError
            end
        end
    else Not cacheable
        rect rgba(150, 255, 150, 0.5)
        Note over CacheFunc: Execute oc with retries (3 attempts, 10s)
        end
        CacheFunc->>OC: oc image info -o json [options] pullspec
        OC-->>CacheFunc: stdout (image_info_json) or error
        alt Success
            CacheFunc-->>Caller: image_info_json
        else Failure
            CacheFunc-->>Caller: raises ChildProcessError
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

No actionable comments were generated in the recent review. 🎉


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
doozer/doozerlib/util.py (1)

582-608: ⚠️ Potential issue | 🟠 Major

Consolidate retry logic to avoid cascading retry attempts.

Both oc_image_info/oc_image_info_async wrappers (lines 582, 693) and their underlying cache helpers (oc_image_info__redis_cache/oc_image_info__redis_cache_async in artcommonlib) have identical @retry(reraise=True, stop=stop_after_attempt(3), wait=wait_fixed(10)) decorators. This results in up to 9 cumulative retry attempts (3×3) with stacked delays on failures, causing unnecessary delays on transient registry errors.

Remove the @retry decorators from the wrapper functions and rely solely on the cache helper retry logic, or consolidate to a single retry layer at the appropriate boundary.

Example: remove outer retries
-@retry(reraise=True, stop=stop_after_attempt(3), wait=wait_fixed(10))
 def oc_image_info(
     pullspec: str,
     *options,
     registry_config: Optional[str] = None,
     strict: bool = True,
 ) -> dict | list[dict] | None:

-@retry(reraise=True, stop=stop_after_attempt(3), wait=wait_fixed(10))
 async def oc_image_info_async(
     pullspec: str,
     *options,
     registry_config: Optional[str] = None,
 ) -> Union[Dict, List]:

Also applies to: 693-714

🤖 Fix all issues with AI agents
In `@elliott/elliottlib/util.py`:
- Around line 689-699: The loop that pairs tags with image_info_results should
use strict zipping to fail fast if lengths diverge; update the zip call in the
block that follows the asyncio.gather (references: _get_image_info,
oc_image_info__redis_cache_async, image_info_results, tags) to use zip(tags,
image_info_results, strict=True) so any mismatch raises immediately instead of
silently truncating.
🧹 Nitpick comments (1)
artcommon/artcommonlib/oc_image_info.py (1)

91-98: Validate forbidden options even when cache is disabled.

_make_cache_key enforces the no--o/--registry-config rule, but it only runs when Redis is available and the pullspec is sha256-pinned. When cache is disabled, those forbidden options can slip through and potentially alter oc image info output unexpectedly. Consider validating options unconditionally.

♻️ Suggested approach
+def _validate_options(options: Tuple[str, ...]) -> None:
+    for opt in options:
+        for prefix in _FORBIDDEN_OPTION_PREFIXES:
+            if opt == prefix or opt.startswith(f"{prefix}=") or opt.startswith(f"{prefix} "):
+                raise ValueError(
+                    f"Option {opt!r} must not be passed in *options*. "
+                    f"'-o json' is added automatically; use the registry_config parameter for auth."
+                )
+
 def _cache_context(pullspec: str, options: Tuple[str, ...]):
     """Return ``(use_cache, cache_key)`` for the given request.
@@
-    use_cache = _is_cacheable(pullspec) and _redis_available()
+    _validate_options(options)
+    use_cache = _is_cacheable(pullspec) and _redis_available()
     cache_key = _make_cache_key(pullspec, options) if use_cache else ""
     return use_cache, cache_key

Comment on lines 689 to 699
async def _get_image_info(tag):
pullspec = tag['from']['name']
try:
return await oc_image_info__redis_cache_async(pullspec)
except ChildProcessError as e:
raise RuntimeError(f"Unable to run oc image info for {pullspec}: {e}") from e

image_info_results = await asyncio.gather(*[_get_image_info(tag) for tag in tags])

for image, stdout in zip(tags, image_info_results):
image_name = image['name']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Python version requirements in various config files
rg -n "python_requires|requires-python|python_version" -g 'pyproject.toml' -g 'setup.cfg' -g 'setup.py' -g 'tox.ini'

Repository: openshift-eng/art-tools

Length of output: 110


Use zip(..., strict=True) to prevent silent truncation if tags and results ever diverge.

While asyncio.gather should keep lengths aligned, adding strict=True ensures the loop fails fast if they ever mismatch due to future refactors. The project requires Python 3.11+, which fully supports this parameter.

✅ Recommended change
-    for image, stdout in zip(tags, image_info_results):
+    for image, stdout in zip(tags, image_info_results, strict=True):
🧰 Tools
🪛 Ruff (0.15.0)

[warning] 694-694: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 698-698: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
In `@elliott/elliottlib/util.py` around lines 689 - 699, The loop that pairs tags
with image_info_results should use strict zipping to fail fast if lengths
diverge; update the zip call in the block that follows the asyncio.gather
(references: _get_image_info, oc_image_info__redis_cache_async,
image_info_results, tags) to use zip(tags, image_info_results, strict=True) so
any mismatch raises immediately instead of silently truncating.

@jupierce jupierce force-pushed the oc_image_info_redis branch 2 times, most recently from 068bf5c to bf0f95c Compare February 13, 2026 16:47
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pyartcd/pyartcd/pipelines/promote.py`:
- Around line 1854-1865: In get_image_info, when catching ChildProcessError (and
the similar handler around lines 1893-1905), preserve exception context by
re-raising the new IOError with the original exception as its __cause__ (use
"raise IOError(f'Image {pullspec} is not found.') from e") instead of raising
without chaining; update both occurrences (the one in get_image_info and the
other handler) so the original ChildProcessError is attached to the IOError for
full traceback information.
🧹 Nitpick comments (3)
artcommon/artcommonlib/oc_image_info.py (1)

130-167: Validate forbidden options even when caching is disabled.
The forbidden-option check only runs when the pullspec is cacheable because _make_cache_key is skipped otherwise. A tag-based pullspec could pass -o / --registry-config and bypass the contract.

♻️ Suggested refactor
+def _validate_options(options: tuple[str, ...]) -> None:
+    for opt in options:
+        for prefix in _FORBIDDEN_OPTION_PREFIXES:
+            if opt == prefix or opt.startswith(f"{prefix}=") or opt.startswith(f"{prefix} "):
+                raise ValueError(
+                    f"Option {opt!r} must not be passed in *options*. "
+                    f"'-o json' is added automatically; use the registry_config parameter for auth."
+                )
+
 def _make_cache_key(pullspec: str, options: tuple[str, ...]) -> str:
-    for opt in options:
-        for prefix in _FORBIDDEN_OPTION_PREFIXES:
-            if opt == prefix or opt.startswith(f"{prefix}=") or opt.startswith(f"{prefix} "):
-                raise ValueError(
-                    f"Option {opt!r} must not be passed in *options*. "
-                    f"'-o json' is added automatically; use the registry_config parameter for auth."
-                )
+    _validate_options(options)
     key_input = f"{pullspec}\0{chr(0).join(sorted(options))}"
     digest = hashlib.sha256(key_input.encode()).hexdigest()
     return f"oc_image_info:{_CACHE_KEY_VERSION}:{digest}"
@@
 def oc_image_info__redis_cache(
@@
 ) -> str:
+    _validate_options(options)
     use_cache, cache_key = _cache_context(pullspec, options)
@@
 async def oc_image_info__redis_cache_async(
@@
 ) -> str:
+    _validate_options(options)
     use_cache, cache_key = _cache_context(pullspec, options)
doozer/doozerlib/util.py (1)

582-714: Avoid stacking retries in the oc_image_info helpers.
The cache helpers already retry; keeping the outer decorators can multiply attempts and delays on failure.

♻️ Possible simplification
-@retry(reraise=True, stop=stop_after_attempt(3), wait=wait_fixed(10))
 def oc_image_info(
@@
-@retry(reraise=True, stop=stop_after_attempt(3), wait=wait_fixed(10))
 async def oc_image_info_async(
doozer/doozerlib/release_inspector.py (1)

76-82: Consider normalizing --filter-by-os to os/arch format for consistency across the codebase.

While oc image info --filter-by-os accepts regex patterns (so bare s390x works and matches any platform containing s390x), the codebase is inconsistent. Some calls use the full linux/amd64 format while others use bare architectures like amd64. Normalizing to linux/{arch} when missing would align with established patterns elsewhere in the codebase and make the intent clearer.

Suggested improvement
-    if arch:
-        options.append(f"--filter-by-os={arch}")
+    if arch:
+        filter_value = arch if "/" in arch else f"linux/{arch}"
+        options.append(f"--filter-by-os={filter_value}")

Comment on lines 1854 to 1865
async def get_image_info(pullspec: str, raise_if_not_found: bool = False):
# Get image manifest/manifest-list.
cmd = f'oc image info --show-multiarch -o json {pullspec}'
env = os.environ.copy()
rc, stdout, stderr = await exectools.cmd_gather_async(cmd, check=False, env=env)
if rc != 0:
if "not found: manifest unknown" in stderr or "was deleted or has expired" in stderr:
try:
stdout = await oc_image_info__redis_cache_async(pullspec, '--show-multiarch')
except ChildProcessError as e:
err_msg = str(e)
if "not found: manifest unknown" in err_msg or "was deleted or has expired" in err_msg:
# image doesn't exist
if raise_if_not_found:
raise IOError(f"Image {pullspec} is not found.")
return None
raise ChildProcessError(f"Error running {cmd}: exit_code={rc}, stdout={stdout}, stderr={stderr}")
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Preserve exception context when mapping to IOError.
Chaining keeps the original failure details in tracebacks.

🛠️ Suggested fix
-                    raise IOError(f"Image {pullspec} is not found.")
+                    raise IOError(f"Image {pullspec} is not found.") from e
@@
-                    raise IOError(f"Image {pullspec} is not found.")
+                    raise IOError(f"Image {pullspec} is not found.") from e

Also applies to: 1893-1905

🧰 Tools
🪛 Ruff (0.15.0)

[warning] 1863-1863: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 1863-1863: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@pyartcd/pyartcd/pipelines/promote.py` around lines 1854 - 1865, In
get_image_info, when catching ChildProcessError (and the similar handler around
lines 1893-1905), preserve exception context by re-raising the new IOError with
the original exception as its __cause__ (use "raise IOError(f'Image {pullspec}
is not found.') from e") instead of raising without chaining; update both
occurrences (the one in get_image_info and the other handler) so the original
ChildProcessError is attached to the IOError for full traceback information.

@jupierce jupierce force-pushed the oc_image_info_redis branch from bf0f95c to 7f29df9 Compare February 13, 2026 16:53
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
doozer/doozerlib/cli/get_nightlies.py (1)

1-12: ⚠️ Potential issue | 🟡 Minor

Run ruff format for this file to fix CI.
The pipeline reports a ruff format failure here.

elliott/elliottlib/util.py (1)

1-5: ⚠️ Potential issue | 🟡 Minor

Run ruff format for this file (CI is failing).

The pipeline reports a ruff format failure for this file.

🤖 Fix all issues with AI agents
In `@artcommon/artcommonlib/oc_image_info.py`:
- Around line 1-34: Run the ruff auto-formatter on artcommonlib/oc_image_info.py
to fix the CI formatting error: execute the ruff format/--fix command (or run
your project's pre-commit hooks) and commit the resulting changes; ensure the
module docstring, imports (hashlib, logging, os, typing...), and top-level
constants (_CACHE_EXPIRY_SECONDS, _CACHE_KEY_VERSION) are formatted per ruff
rules and that the file ends with a single trailing newline.

In `@artcommon/artcommonlib/rhcos.py`:
- Around line 1-7: This file fails ruff formatting; run the auto-formatter
(e.g., ruff format or ruff --fix) on artcommon/artcommonlib/rhcos.py to correct
import ordering/whitespace and any other style issues reported by ruff, then
stage the updated file; focus on the top-level imports shown (json, logutil,
go_arch_for_brew_arch, ListModel/Model, oc_image_info__redis_cache,
GroupRuntime) to ensure they are properly ordered and spaced per ruff/PEP8 rules
so CI passes.

@jupierce jupierce force-pushed the oc_image_info_redis branch from 7f29df9 to 0e37491 Compare February 13, 2026 18:17
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 13, 2026

@jupierce: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/security 0e37491 link false /test security

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant