Skip to content

Introduce cache provider#704

Open
mithunbharadwaj wants to merge 33 commits intomasterfrom
feature/Cache-provider
Open

Introduce cache provider#704
mithunbharadwaj wants to merge 33 commits intomasterfrom
feature/Cache-provider

Conversation

@mithunbharadwaj
Copy link
Copy Markdown
Collaborator

@mithunbharadwaj mithunbharadwaj commented Jul 31, 2025

Summary by CodeRabbit

  • New Features

    • Read-only, cache-backed library layer that serves content from a local on-disk snapshot.
    • Cache-first behavior for registry-style URIs so the local snapshot is consulted before live providers.
  • Improvements

    • Consistent provider registration across URI schemes and merged, layered listing that preserves per-item layer info.
    • Blank/comment entries are ignored and unknown-provider errors are clearer.

@mithunbharadwaj mithunbharadwaj self-assigned this Jul 31, 2025
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jul 31, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a read-only CacheLibraryProvider that serves from a configured on-disk cache snapshot (layered by sha256(uri)) and integrates it as a cache-first wrapper for libsreg+ URIs. LibraryService also normalizes provider construction, ignores blank/comment entries, and refines internal listing and error handling.

Changes

Cohort / File(s) Summary
Cache Library Provider
asab/library/providers/cache.py
New CacheLibraryProvider (reads from filesystem only). Computes SHA256(uri) to locate snapshot under [library:cache].dir/@global/<hash>, validates cache root (critical + exit) and snapshot (warning), initializes with file:// cache path, subscribes to library.cache.ready / library.cache.not_ready, and overrides read/list/find/subscribe to raise KeyError when cache not live.
Library service provider integration
asab/library/service.py
For libsreg+ paths, registers a CacheLibraryProvider wrapper then the live LibsRegLibraryProvider (cache wrapper always appended). Normalizes provider construction for zk:///zookeeper://, file:///.///, azure+https://, git+ by appending concrete providers to self.Libraries. Ignores blank/#/; entries. Unknown-provider now logs “Incorrect provider for ''” and exits SystemExit(1). _list doc and behavior clarified to merge items across providers and preserve per-item layer info; added KeyError and broad-exception handling. Minor doc/format tweaks.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant LibraryService
    participant CacheProvider
    participant LiveProvider

    Client->>LibraryService: request (read/list/find/subscribe)
    LibraryService->>CacheProvider: forward request
    alt Cache live (snapshot present)
        CacheProvider-->>LibraryService: return cached result
        LibraryService-->>Client: return result
    else Cache not live (raises KeyError)
        LibraryService->>LiveProvider: forward request to live provider
        LiveProvider-->>LibraryService: return live result
        LibraryService-->>Client: return result
    end
    Note over CacheProvider: subscribes to `library.cache.ready` / `library.cache.not_ready`\nand toggles internal readiness state
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I nibble on hashes beneath a mossy tree,
Snapshots stacked where carrots used to be,
Read-only hops keep the cache in sight,
I wait for "cache ready" to spring into flight,
— a rabbit in the repo 🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Introduce cache provider' accurately reflects the main change: adding a new CacheLibraryProvider class and integrating it into the library service.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/Cache-provider

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
asab/library/providers/cacahe.py (1)

1-100: Critical: Fix filename typo.

The filename contains a typo: cacahe.py should be cache.py. This will cause import errors and module discovery issues.

Rename the file from asab/library/providers/cacahe.py to asab/library/providers/cache.py.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f63d0ce and 6578a93.

📒 Files selected for processing (1)
  • asab/library/providers/cacahe.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: CI
asab/library/providers/cacahe.py

[error] 11-11: flake8: E302 expected 2 blank lines, found 1

🔇 Additional comments (4)
asab/library/providers/cacahe.py (4)

1-10: LGTM! Import statements are well-organized and necessary.

All imports are appropriately used throughout the implementation and follow standard conventions.


11-56: Excellent documentation and class design!

The comprehensive docstring clearly explains the purpose, usage patterns, on-disk structure, and integration considerations. The inheritance from FileSystemLibraryProvider is well-justified for leveraging existing filesystem operations while providing cache-specific behavior.


57-76: Well-implemented initialization with robust error handling.

The SHA256 hashing provides stable layer identification, configuration handling is appropriate, and the graceful handling of missing cache directories with warnings is good. The set_ready=False parameter correctly indicates this provider relies on pre-populated cache content.


78-86: LGTM! Clean delegation to parent with clear documentation.

The method correctly delegates to the parent FileSystemLibraryProvider.read() which will return None for missing files, aligning perfectly with the documented cache-only behavior.

Comment on lines +88 to +99
async def list(self, path):
"""
Attempt to list directory from cache; return empty list on cache miss.

:param path: Absolute directory path (e.g. '/Templates/')
:returns: List of LibraryItem objects
"""
try:
return await super().list(path)
except Exception:
L.debug("Cache list miss or error: %s", path)
return []
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix formatting issue to resolve pipeline failure.

The logic is sound - gracefully handling cache misses by returning an empty list and using appropriate debug logging. However, there's a formatting issue causing the pipeline failure.

Apply this diff to fix the formatting issue:

         except Exception:
             L.debug("Cache list miss or error: %s", path)
             return []
+

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In asab/library/providers/cacahe.py around lines 88 to 99, fix the formatting
issue by ensuring proper indentation and spacing in the async list method. Make
sure the docstring is correctly indented, the try-except block is properly
aligned, and there are no missing or extra blank lines causing syntax or style
errors that lead to pipeline failure.

try:
return await super().list(path)
except Exception:
L.warning("Cache list miss or error: %s", path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use .format(...)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

@ateska
Copy link
Copy Markdown
Contributor

ateska commented Jul 31, 2025

In general, I don't see a way how this is integrated with existing provider (wrapper?)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
asab/library/providers/cache.py (1)

11-100: Integrate CacheLibraryProvider in LibraryService._create_library

The CacheLibraryProvider class is ready but never wired into _create_library, so libsreg+ URIs still use LibsRegLibraryProvider. Please add the cache provider as the first handler for libsreg+ paths:

• In asab/library/service.py inside _create_library, replace or precede the existing libsreg+ branch with CacheLibraryProvider.
• Ensure any cache miss behavior is clear (no fallback vs. chained underlying fetch).
• Update documentation (docs/reference/services/library.md) to mention the new cache‐backed provider.
• Add an integration test that verifies _create_library picks CacheLibraryProvider for libsreg+ URIs.

Suggested diff snippet:

--- a/asab/library/service.py
+++ b/asab/library/service.py
@@ def _create_library(self, path, layer):
-        elif path.startswith('libsreg+'):
-            from .providers.libsreg import LibsRegLibraryProvider
-            library_provider = LibsRegLibraryProvider(self, path, layer)
+        elif path.startswith('libsreg+'):
+            from .providers.cache import CacheLibraryProvider
+            # Serve exclusively from on‐disk cache for libsreg+ URIs
+            library_provider = CacheLibraryProvider(self, path, layer)
🧹 Nitpick comments (3)
asab/library/providers/cache.py (3)

11-56: Improve docstring accuracy and organization.

The docstring is comprehensive but has some issues:

  1. Inconsistent platform reference: Line 49 mentions "GitLab Merge Request" but this appears to be a GitHub repository.
  2. Potentially inaccurate usage example: The async context manager usage in lines 28-33 may not match the actual API.
  3. Mixed concerns: The docstring combines class documentation, usage examples, and implementation notes.

Consider restructuring the docstring:

 class CacheLibraryProvider(FileSystemLibraryProvider):
     """
     A read-only cache-backed wrapper that serves exclusively from the on-disk cache.
     No fallback to upstream providers; cache misses return None or empty list.
 
+    Cache Structure:
     Under [library:cache].dir, each layer is exposed via:
       @global/<layer_hash>/...
     where layer_hash = sha256(uri).
-
-    Usage (registering provider in LibraryService._create_library):
-        provider = CacheLibraryProvider(library, uri, layer)
-
-    Consuming cached layers:
-    1. Ensure LibraryCacheService has populated the cache and published 'library.cache.ready'.
-    2. LibraryService will wrap each 'libsreg+' URI with this CacheLibraryProvider.
-    3. To read an item:
-       ```python
-       async with library.open('/path/to/item') as stream:
-           if stream is None:
-               # cache miss or not found
-               return None
-           data = stream.read()
-       ```
-    4. To list contents of a directory:
-       ```python
-       items = await library.list('/path/to/directory/')
-       for item in items:
-           print(item.name)
-       ```
-
-    Layer directories on disk:
-      [cache_root]/@cache/<unique_id>/            # immutable snapshots
-      [cache_root]/@cache/<unique_id>/.uri         # original URI
-      [cache_root]/@cache/<unique_id>/.unique_id   # snapshot hash
-      [cache_root]/@global/<layer_hash>/           # symlink to active snapshot
-      [cache_root]/@global/<layer_hash>/.uri
-      [cache_root]/@global/<layer_hash>/.unique_id
-
-    Notes for GitLab Merge Request:
-    - This change introduces a CacheLibraryProvider to serve from disk-only cache.
-    - Ensure `[library:cache].dir` is documented in configuration reference.
-    - Update LibraryService to wrap `libsreg+` URIs with this provider instead of direct network fetch.
-    - Add integration test to verify cache lookup and symlink resolution.
-    - Confirm backward compatibility: providers other than `libsreg+` remain unaffected.
-    - Mention atomic symlink swap behavior in release notes.
     """

Move usage examples and implementation notes to separate documentation or comments.


79-86: Consider if method override is necessary.

This method simply calls the parent implementation without any additional logic. If the parent FileSystemLibraryProvider.read() already returns None for missing files, this override may be unnecessary.

Consider either:

  1. Remove the override if no cache-specific logic is needed
  2. Add cache-specific logging to distinguish cache misses from other errors
 async def read(self, path):
     """
     Attempt to read from cache; return None on cache miss.

     :param path: Absolute path within the library (e.g. '/Templates/config.json')
     :returns: File-like stream or None
     """
-    return await super().read(path)
+    result = await super().read(path)
+    if result is None:
+        L.debug("Cache miss for read: %s", path)
+    return result

88-99: Improve exception handling specificity.

The method correctly handles cache misses by returning an empty list, but the broad Exception catch could mask programming errors.

Consider catching more specific exceptions:

 async def list(self, path):
     """
     Attempt to list directory from cache; return empty list on cache miss.

     :param path: Absolute directory path (e.g. '/Templates/')
     :returns: List of LibraryItem objects
     """
     try:
         return await super().list(path)
-    except Exception:
+    except (FileNotFoundError, NotADirectoryError, PermissionError, OSError):
         L.warning("Cache list miss or error: %s", path)
         return []

This approach catches filesystem-related errors while allowing programming errors to propagate for debugging.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11ccb21 and 2cf5951.

📒 Files selected for processing (1)
  • asab/library/providers/cache.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: CI
asab/library/providers/cache.py

[error] 11-11: flake8: E302 expected 2 blank lines, found 1

🔇 Additional comments (1)
asab/library/providers/cache.py (1)

1-10: LGTM!

The imports are well-organized and all appear necessary for the implementation. The logger setup follows standard conventions.

Comment on lines +57 to +76
def __init__(self, library, uri, layer):
# Compute stable layer key
self.uri = uri
self.layer_hash = hashlib.sha256(uri.encode('utf-8')).hexdigest()

# Locate configured cache root
cache_root = Config.get('library:cache', 'dir', fallback=None)
if not cache_root:
raise exceptions.LibraryConfigurationError(
"Missing [library:cache].dir configuration"
)

# Build path to on-disk cache for this layer
self.cache_dir = os.path.join(cache_root, '@global', self.layer_hash)
if not os.path.isdir(self.cache_dir):
L.warning("Cache directory not found for %s: %s", uri, self.cache_dir)

# Initialize filesystem provider against cache path, skip ready event
cache_uri = 'file://' + self.cache_dir.rstrip('/')
super().__init__(library, cache_uri, layer, set_ready=False)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Improve error handling and validate parent constructor usage.

The constructor logic is sound, but consider these improvements:

  1. Potential parent constructor failure: If the cache directory doesn't exist, the parent FileSystemLibraryProvider might fail during initialization.

  2. Verify set_ready parameter: The set_ready=False parameter appears to be a custom addition - ensure this is supported by the parent class.

Consider more robust error handling:

     def __init__(self, library, uri, layer):
         # Compute stable layer key
         self.uri = uri
         self.layer_hash = hashlib.sha256(uri.encode('utf-8')).hexdigest()
 
         # Locate configured cache root
         cache_root = Config.get('library:cache', 'dir', fallback=None)
         if not cache_root:
             raise exceptions.LibraryConfigurationError(
                 "Missing [library:cache].dir configuration"
             )
 
         # Build path to on-disk cache for this layer
         self.cache_dir = os.path.join(cache_root, '@global', self.layer_hash)
         if not os.path.isdir(self.cache_dir):
             L.warning("Cache directory not found for %s: %s", uri, self.cache_dir)
-
+            # Consider creating empty directory or raising exception based on requirements
+
         # Initialize filesystem provider against cache path, skip ready event
         cache_uri = 'file://' + self.cache_dir.rstrip('/')
         super().__init__(library, cache_uri, layer, set_ready=False)

Verify that the parent class supports the set_ready parameter:


🏁 Script executed:

#!/bin/bash
# Search for FileSystemLibraryProvider constructor to verify set_ready parameter
ast-grep --pattern 'class FileSystemLibraryProvider {
  $$$
  def __init__($_, $$$) {
    $$$
  }
  $$$
}'

Length of output: 108


🏁 Script executed:

#!/bin/bash
# Locate the definition of FileSystemLibraryProvider
rg -n "class FileSystemLibraryProvider" .

Length of output: 138


🏁 Script executed:

#!/bin/bash
# Display FileSystemLibraryProvider constructor signature and context
rg -n "class FileSystemLibraryProvider" -A20 -B2 asab/library/providers/filesystem.py

Length of output: 797


Ensure the cache directory is created to avoid downstream failures

The set_ready=False parameter is supported by FileSystemLibraryProvider.__init__; no change needed there. However, if the cache directory doesn’t exist, initializing the provider will succeed but any file operations will error at runtime. To harden this:

• File: asab/library/providers/cache.py
• Location: inside __init__, right after the L.warning(...) call

Suggested diff:

     if not os.path.isdir(self.cache_dir):
         L.warning("Cache directory not found for %s: %s", uri, self.cache_dir)
+        try:
+            os.makedirs(self.cache_dir, exist_ok=True)
+            L.info("Created cache directory: %s", self.cache_dir)
+        except OSError as e:
+            raise exceptions.LibraryConfigurationError(
+                f"Unable to create cache directory {self.cache_dir}: {e}"
+            )

This guarantees the directory exists before passing its URI to the parent provider and surfaces any creation errors immediately.

🤖 Prompt for AI Agents
In asab/library/providers/cache.py within the __init__ method around lines 57 to
76, the cache directory may not exist, causing runtime errors during file
operations. After the L.warning call that logs the missing directory, add code
to create the cache directory using os.makedirs with exist_ok=True to ensure it
exists before initializing the parent provider. This will prevent downstream
failures by guaranteeing the directory is present or raising an error
immediately if creation fails.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
asab/library/service.py (1)

134-144: Consider documenting the cache-fallback behavior.

The dual-provider registration creates a cache-first, fallback-to-network behavior that should be documented for users configuring libsreg+ URIs.

Consider adding inline documentation to clarify the provider ordering:

 		elif path.startswith("libsreg+"):
-			# 1) register the on-disk cache provider first
+			# Register cache-first architecture for libsreg+ URIs:
+			# 1) Cache provider serves from local disk (fast, cache hits)
 			from .providers.cache import CacheLibraryProvider
 			cache = CacheLibraryProvider(self, path, layer)
 			self.Libraries.append(cache)
 
-			# 2) then register the real registry provider as a fallback
+			# 2) Registry provider handles cache misses (network fallback)
 			from .providers.libsreg import LibsRegLibraryProvider
 			real = LibsRegLibraryProvider(self, path, layer)
 			self.Libraries.append(real)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cf5951 and ea17922.

📒 Files selected for processing (1)
  • asab/library/service.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
asab/library/service.py (2)
asab/library/providers/cache.py (1)
  • CacheLibraryProvider (11-99)
asab/library/providers/libsreg.py (1)
  • LibsRegLibraryProvider (26-274)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
asab/library/providers/cache.py (2)

9-11: Fix formatting issue: Add missing blank line.

The flake8 E302 error persists - there should be 2 blank lines before the class definition.

 L = logging.getLogger(__name__)
+

 class CacheLibraryProvider(FileSystemLibraryProvider):

72-74: Create cache directory to prevent runtime failures.

The cache directory should be created if it doesn't exist to avoid downstream failures during file operations.

     if not os.path.isdir(self.cache_dir):
         L.warning("Cache directory not found for %s: %s", uri, self.cache_dir)
+        try:
+            os.makedirs(self.cache_dir, exist_ok=True)
+            L.info("Created cache directory: %s", self.cache_dir)
+        except OSError as e:
+            raise exceptions.LibraryConfigurationError(
+                f"Unable to create cache directory {self.cache_dir}: {e}"
+            )
🧹 Nitpick comments (1)
asab/library/providers/cache.py (1)

80-87: Consider removing unnecessary method override.

This method simply delegates to the parent implementation without adding any cache-specific logic. Since FileSystemLibraryProvider.read() likely already returns None for missing files, this override may be redundant.

However, keeping it for explicit documentation of cache behavior is acceptable if desired.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea17922 and a90c667.

📒 Files selected for processing (1)
  • asab/library/providers/cache.py (1 hunks)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
asab/library/providers/cache.py (1)

74-77: Add exception handling consistent with read method.

The list method follows the same pattern as read but also needs exception handling for cache operation failures.

Apply this diff to add exception handling:

 async def list(self, path):
     if not os.path.isdir(self.cache_dir):
         return await self._real.list(path)
-    return await super().list(path)
+    try:
+        return await super().list(path)
+    except (FileNotFoundError, PermissionError, OSError) as e:
+        L.warning("Cache list failed for %s: %s, falling back to real provider", path, e)
+        return await self._real.list(path)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60635d5 and 5fafca1.

📒 Files selected for processing (2)
  • asab/library/providers/cache.py (1 hunks)
  • asab/library/service.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • asab/library/service.py
🔇 Additional comments (3)
asab/library/providers/cache.py (3)

1-13: LGTM! Clean import structure and proper module organization.

The imports are well-organized and all necessary for the cache provider functionality. The module structure follows good Python conventions.


65-66: LGTM! Proper async event handling.

The cache ready event handler correctly awaits the _set_ready() call to update provider readiness state.


15-32: Integration Verified: CacheLibraryProvider correctly wraps LibsRegLibraryProvider

The LibraryService implementation for libsreg+ URIs confirms that:

  • A LibsRegLibraryProvider instance is created as real.
  • A CacheLibraryProvider is instantiated with real_provider=real.
  • Only the cache wrapper is registered in self.Libraries, preserving fallback behavior.

No further action required.

Comment on lines +34 to +63
def __init__(self, library, uri, layer, real_provider):
# Keep for fallback
self._real = real_provider
self.uri = uri
self.layer_hash = hashlib.sha256(uri.encode("utf-8")).hexdigest()

# 1) Validate cache root
cache_root = Config.get("library:cache", "dir", fallback=None)
if not cache_root:
raise Exception(
"Missing [library:cache].dir configuration"
)
if not os.path.isdir(cache_root):
L.critical("Cache root %s not found, exiting.", cache_root)
raise SystemExit("Missing cache root")

# 2) Build the cache URI
self.cache_dir = os.path.join(cache_root, "@global", self.layer_hash)
if not os.path.isdir(self.cache_dir):
L.warning("Cache directory not found for %s: %s", uri, self.cache_dir)

cache_uri = "file://" + self.cache_dir.rstrip("/")
# Initialize the filesystem provider *only* against the cache
super().__init__(library, cache_uri, layer, set_ready=False)

# 3) Re-publish readiness when new snapshots arrive
library.App.PubSub.subscribe(
"library.cache.ready",
self._on_cache_ready
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve exception handling in constructor.

The constructor has good structure but uses overly broad exception handling:

  1. Generic Exception usage (lines 43-45): Use a specific library exception instead of generic Exception.
  2. SystemExit too aggressive (line 48): Using SystemExit in a library component can terminate the entire application. Consider throwing a library-specific exception instead.

Apply this diff to improve exception handling:

+from asab.library import exceptions
+
 def __init__(self, library, uri, layer, real_provider):
     # Keep for fallback
     self._real = real_provider
     self.uri = uri
     self.layer_hash = hashlib.sha256(uri.encode("utf-8")).hexdigest()

     # 1) Validate cache root
     cache_root = Config.get("library:cache", "dir", fallback=None)
     if not cache_root:
-        raise Exception(
+        raise exceptions.LibraryConfigurationError(
             "Missing [library:cache].dir configuration"
         )
     if not os.path.isdir(cache_root):
         L.critical("Cache root %s not found, exiting.", cache_root)
-        raise SystemExit("Missing cache root")
+        raise exceptions.LibraryConfigurationError(f"Cache root directory not found: {cache_root}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self, library, uri, layer, real_provider):
# Keep for fallback
self._real = real_provider
self.uri = uri
self.layer_hash = hashlib.sha256(uri.encode("utf-8")).hexdigest()
# 1) Validate cache root
cache_root = Config.get("library:cache", "dir", fallback=None)
if not cache_root:
raise Exception(
"Missing [library:cache].dir configuration"
)
if not os.path.isdir(cache_root):
L.critical("Cache root %s not found, exiting.", cache_root)
raise SystemExit("Missing cache root")
# 2) Build the cache URI
self.cache_dir = os.path.join(cache_root, "@global", self.layer_hash)
if not os.path.isdir(self.cache_dir):
L.warning("Cache directory not found for %s: %s", uri, self.cache_dir)
cache_uri = "file://" + self.cache_dir.rstrip("/")
# Initialize the filesystem provider *only* against the cache
super().__init__(library, cache_uri, layer, set_ready=False)
# 3) Re-publish readiness when new snapshots arrive
library.App.PubSub.subscribe(
"library.cache.ready",
self._on_cache_ready
)
from asab.library import exceptions
def __init__(self, library, uri, layer, real_provider):
# Keep for fallback
self._real = real_provider
self.uri = uri
self.layer_hash = hashlib.sha256(uri.encode("utf-8")).hexdigest()
# 1) Validate cache root
cache_root = Config.get("library:cache", "dir", fallback=None)
if not cache_root:
- raise Exception(
- "Missing [library:cache].dir configuration"
- )
+ raise exceptions.LibraryConfigurationError(
+ "Missing [library:cache].dir configuration"
+ )
if not os.path.isdir(cache_root):
L.critical("Cache root %s not found, exiting.", cache_root)
- raise SystemExit("Missing cache root")
+ raise exceptions.LibraryConfigurationError(
+ f"Cache root directory not found: {cache_root}"
+ )
# 2) Build the cache URI
self.cache_dir = os.path.join(cache_root, "@global", self.layer_hash)
if not os.path.isdir(self.cache_dir):
L.warning("Cache directory not found for %s: %s", uri, self.cache_dir)
cache_uri = "file://" + self.cache_dir.rstrip("/")
# Initialize the filesystem provider *only* against the cache
super().__init__(library, cache_uri, layer, set_ready=False)
# 3) Re-publish readiness when new snapshots arrive
library.App.PubSub.subscribe(
"library.cache.ready",
self._on_cache_ready
)
🤖 Prompt for AI Agents
In asab/library/providers/cache.py from lines 34 to 63, replace the generic
Exception raised when the cache root configuration is missing with a specific
custom exception defined for the library to improve clarity and error handling.
Also, instead of raising SystemExit when the cache root directory is not found,
raise a library-specific exception to avoid terminating the entire application
abruptly. Define or use existing custom exceptions in the library for these
error cases and raise them accordingly in the constructor.

Comment on lines +68 to +72
async def read(self, path):
if not os.path.isdir(self.cache_dir):
# ← cache folder doesn’t exist → fall back
return await self._real.read(path)
return await super().read(path)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add exception handling for cache read failures.

The fallback logic for missing cache directory is good, but consider handling cases where the cache directory exists but read operations fail (corruption, permissions, etc.).

Apply this diff to add proper exception handling:

 async def read(self, path):
     if not os.path.isdir(self.cache_dir):
         # ← cache folder doesn't exist → fall back
         return await self._real.read(path)
-    return await super().read(path)
+    try:
+        return await super().read(path)
+    except (FileNotFoundError, PermissionError, OSError) as e:
+        L.warning("Cache read failed for %s: %s, falling back to real provider", path, e)
+        return await self._real.read(path)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def read(self, path):
if not os.path.isdir(self.cache_dir):
# ← cache folder doesn’t exist → fall back
return await self._real.read(path)
return await super().read(path)
async def read(self, path):
if not os.path.isdir(self.cache_dir):
# ← cache folder doesn’t exist → fall back
return await self._real.read(path)
try:
return await super().read(path)
except (FileNotFoundError, PermissionError, OSError) as e:
L.warning("Cache read failed for %s: %s, falling back to real provider", path, e)
return await self._real.read(path)
🤖 Prompt for AI Agents
In asab/library/providers/cache.py around lines 68 to 72, the read method lacks
exception handling for cases where the cache directory exists but read
operations fail due to issues like corruption or permissions. Wrap the call to
super().read(path) in a try-except block to catch exceptions, and in the except
block, fall back to returning await self._real.read(path) to ensure robustness.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
asab/library/providers/cache.py (1)

25-50: Address exception handling issues flagged in previous reviews.

The constructor still uses generic RuntimeError and aggressive SystemExit despite previous review feedback. These issues should be resolved:

  1. Line 34: Use a library-specific exception instead of generic RuntimeError
  2. Line 37: SystemExit in a library component can terminate the entire application - use a library-specific exception instead

Apply this diff to fix the exception handling:

+from asab.library import exceptions
+
 def __init__(self, library, uri, layer, real_provider):
     # Keep for fallback
     self._real = real_provider
     # Compute layer‐key
     self.layer_hash = hashlib.sha256(uri.encode("utf-8")).hexdigest()

     # Resolve cache root
     cache_root = Config.get("library:cache", "dir", fallback=None)
     if not cache_root:
-        raise RuntimeError("Missing [library:cache].dir configuration")
+        raise exceptions.LibraryConfigurationError("Missing [library:cache].dir configuration")
     if not os.path.isdir(cache_root):
         L.critical("Cache root '{0}' not found, exiting.".format(cache_root))
-        raise SystemExit(1)
+        raise exceptions.LibraryConfigurationError(f"Cache root directory not found: {cache_root}")
🧹 Nitpick comments (1)
asab/library/providers/cache.py (1)

58-77: Excellent integration pattern that addresses PR concerns.

These override methods clearly demonstrate how the cache provider integrates with and wraps around the existing provider system, directly addressing the integration concerns raised in the PR objectives. The consistent fallback pattern is well-implemented.

Consider adding error handling for cache read failures to improve robustness:

 async def read(self, path):
     if self._cache_live():
-        return await super().read(path)
+        try:
+            return await super().read(path)
+        except (FileNotFoundError, PermissionError, OSError) as e:
+            L.warning("Cache read failed for %s: %s, falling back to real provider", path, e)
+            return await self._real.read(path)
     return await self._real.read(path)

Apply similar error handling to list, find, and subscribe methods for consistency.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc368ce and 5554df3.

📒 Files selected for processing (1)
  • asab/library/providers/cache.py (1 hunks)
🔇 Additional comments (4)
asab/library/providers/cache.py (4)

1-13: LGTM!

The imports are well-organized and appropriate for the functionality. The logging setup follows standard conventions.


15-24: LGTM!

The class design properly implements the wrapper pattern around the existing provider system. The docstring clearly explains the fallback behavior, addressing integration concerns.


51-52: LGTM!

Clean implementation of the event-driven readiness pattern that integrates well with the existing PubSub system.


54-56: LGTM!

Good design pattern to centralize the cache availability check. This makes the fallback logic consistent and maintainable.

@mithunbharadwaj mithunbharadwaj requested a review from ateska August 5, 2025 12:54
Copy link
Copy Markdown
Contributor

@ateska ateska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.CachedLibraries design concept is incorrect, rework is neede.


super().__init__(app, service_name)
self.Libraries: list[LibraryProviderABC] = []
self.CacheLibraries: list[LibraryProviderABC] = [] # cache-first
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.Libraries: list[LibraryProviderABC] = [] is enough - you don't need self.CacheLibraries[] array ... it will introduce inconsistencies in layers.

library_provider = None
# Handle libsreg+ URIs (which may be comma-separated)
if path.startswith('libsreg+'):
from .providers.cache import CacheLibraryProvider
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from .providers.libsreg import LibsRegLibraryProvider
realp = LibsRegLibraryProvider(self, path, layer)
cachep = CacheLibraryProvider(self, path, layer, realp)
self.Libraries.append(realp)

Copy link
Copy Markdown
Collaborator Author

@mithunbharadwaj mithunbharadwaj Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ateska We don't we append cachep to self.Libraries?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes,self.Libraries.append(cachep)

"""
_validate_path_item(path)

# 1) cache‐first phase
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is unnecessary.

return itemio

return None
# 2) cache-first
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtto

for library in self.Libraries:
itemio = await library.read(path)

# 2) cache-first
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtto

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
asab/library/service.py (1)

611-611: Fix typo in docstring.

There's a typo in the path parameter documentation.

-		path (str): The absolute pƒath of the file to retrieve metadata for.
+		path (str): The absolute path of the file to retrieve metadata for.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e7d311 and cd662e8.

📒 Files selected for processing (1)
  • asab/library/service.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
asab/library/service.py (5)
asab/library/providers/cache.py (1)
  • CacheLibraryProvider (11-96)
asab/library/providers/libsreg.py (1)
  • LibsRegLibraryProvider (26-274)
asab/library/providers/zookeeper.py (1)
  • ZooKeeperLibraryProvider (24-543)
asab/library/providers/filesystem.py (1)
  • FileSystemLibraryProvider (26-243)
asab/library/providers/git.py (1)
  • GitLibraryProvider (24-233)
🔇 Additional comments (3)
asab/library/service.py (3)

117-126: Cache provider integration addresses previous concerns.

The new libsreg+ implementation correctly registers both providers in self.Libraries and uses early return to avoid duplicate registration. This addresses the commenter's question about integration - the cache provider acts as a read-through cache layer that falls back to the real LibsRegLibraryProvider when content isn't available in cache.


128-158: Consistent provider registration pattern implemented.

All provider types now follow the same pattern: create provider instance, append to self.Libraries, with proper handling of blank/comment lines. The error message improvement and SystemExit(1) provide clearer feedback for configuration issues.


354-358: Layer propagation verified across providers

Both CacheLibraryProvider and LibsRegLibraryProvider accept the same layer parameter in their constructors and pass it unchanged to FileSystemLibraryProvider (and ultimately to LibraryProviderABC), which sets self.Layer = layer. As a result, provider.Layer is consistent across cache and real (libsreg+) providers. No further action is required.

Mithun Shivashankar added 2 commits August 12, 2025 08:53
@ateska ateska assigned ateska and unassigned ateska Aug 20, 2025
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
asab/library/providers/cache.py (1)

1-97: Cache provider ordering is correct — but cache readiness can block Library startup

  • Verified: asab/library/service.py::_create_library registers CacheLibraryProvider before the real provider (cache becomes Libraries[0]).
  • Problem: asab/library/service.py calls disabled_file = await self.Libraries[0].read('/.disabled.yaml') without handling KeyError — if Libraries[0] is the cache wrapper and no snapshot exists, CacheLibraryProvider.read() raises KeyError and _read_disabled can fail at startup.
  • Problem: asab/library/providers/cache.py currently calls library.App.TaskService.schedule(self._set_ready()) unconditionally in init, marking the cache IsReady even when the snapshot is missing.
  • Fixes (required):
    • In asab/library/service.py::_read_disabled, catch KeyError from Libraries[0].read(...) and treat it as "no disabled file" (empty / None) so a missing cache does not break startup.
    • In asab/library/providers/cache.py, do not mark the cache ready unconditionally — either remove the unconditional schedule(self._set_ready()) or guard it with if self._cache_live(): and rely on the "library.cache.ready" event to set readiness when a snapshot becomes available.
♻️ Duplicate comments (3)
asab/library/providers/cache.py (3)

5-7: Use library-specific exceptions; avoid generic RuntimeError.

Also add missing import.

 from asab.config import Config
 from asab.library.providers.filesystem import FileSystemLibraryProvider
+from asab.library import exceptions

25-28: Configuration error should raise a library exception, not RuntimeError.

-		if not cache_root:
-			raise RuntimeError("Missing [library:cache].dir configuration")
+		if not cache_root:
+			raise exceptions.LibraryConfigurationError("Missing [library:cache].dir configuration")

54-57: Don’t terminate the app from a library; raise a configuration error instead.

-		# 5) Sanity‐check cache_root exists
+		# 5) Sanity-check cache_root exists
 		if not os.path.isdir(cache_root):
 			L.critical("Cache root '{}' not found, exiting.".format(cache_root))
-			raise SystemExit(1)
+			raise exceptions.LibraryConfigurationError(f"Cache root directory not found: {cache_root}")
🧹 Nitpick comments (3)
asab/library/providers/cache.py (3)

72-74: Fix callback signature; satisfy Ruff ARG002 and PubSub API.

PubSub passes message_type first; keep args for future use.

-	async def _on_cache_ready(self, *args):
+	async def _on_cache_ready(self, message_type, *_args, **_kwargs):
 		await self._set_ready()

60-64: Optional: include struct_data in logs for better observability.

Add uri/hash to ease correlation.

-			L.warning(
-				"No cache snapshot for URI '{}' at '{}'.".format(uri, self.cache_dir)
-			)
+			L.warning("No cache snapshot.", struct_data={'uri': uri, 'cache_dir': self.cache_dir, 'layer_hash': self.layer_hash})

66-69: Integration clarity (addresses reviewer’s question).

Please add a brief comment explaining that this provider is inserted before the live provider (e.g., libsreg+) so misses fall through without exceptions.

-		# 7) Delegate to FileSystemLibraryProvider
+		# 7) Delegate to FileSystemLibraryProvider
+		# Note: LibraryService should register this provider before the live provider.
+		# When the cache is absent, methods return None/[] so the next provider is tried.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb5f1d9 and 5012fbc.

📒 Files selected for processing (1)
  • asab/library/providers/cache.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asab/library/providers/cache.py (4)
asab/library/providers/filesystem.py (1)
  • FileSystemLibraryProvider (26-243)
asab/task.py (1)
  • TaskService (14-167)
asab/library/service.py (4)
  • _set_ready (175-187)
  • subscribe (715-759)
  • read (215-251)
  • find (193-213)
asab/pubsub.py (1)
  • PubSub (11-243)
🪛 Ruff (0.12.2)
asab/library/providers/cache.py

27-27: Avoid specifying long messages outside the exception class

(TRY003)


54-54: Comment contains ambiguous (HYPHEN). Did you mean - (HYPHEN-MINUS)?

(RUF003)


72-72: Unused method argument: args

(ARG002)

Comment on lines +66 to +70
cache_uri = "file://{}".format(self.cache_dir.rstrip("/"))
super().__init__(library, cache_uri, layer, set_ready=False)
library.App.TaskService.schedule(self._set_ready())
library.App.PubSub.subscribe("library.cache.ready", self._on_cache_ready)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

🧩 Analysis chain

Event name/style likely inconsistent; verify published topic.

Project convention uses “!” (e.g., "Library.ready!"). Please confirm actual published name/case before merging.


🏁 Script executed:

#!/bin/bash
# Verify which cache-ready event is published
rg -nC2 -P "PubSub\.publish\((?:\"|')([A-Za-z\.\-_]+!?)(?:\"|')" -g '!**/node_modules/**' -S
rg -n "cache.ready" -S

Length of output: 335


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) PubSub.publish / PubSub.subscribe lines with string literal:"
rg -n -S -C1 "PubSub\.(publish|subscribe)\s*\(\s*['\"][^'\"]+['\"]" || true

echo
echo "2) Any .publish/.subscribe with string literal (broader):"
rg -n -S -C1 "\.(publish|subscribe)\s*\(\s*['\"][^'\"]+['\"]" || true

echo
echo "3) String literals containing an exclamation mark:"
rg -n -S -C1 "['\"][^'\"]*![^'\"]*['\"]" || true

echo
echo "4) String literals starting with library. or Library.:"
rg -n -S -C1 "['\"](library|Library)\.[^'\"]*['\"]" || true

echo
echo "5) Occurrences of 'cache.ready' (with or without '!'):"
rg -n -S -C1 "cache\.ready!?|cache\.ready" || true

Length of output: 42741


Fix PubSub topic — name/casing inconsistent and no matching publisher found

asab/library/providers/cache.py (lines ~66–70) subscribes to "library.cache.ready" (lowercase, no "!"); repository convention is CamelCase + "!" (e.g., "Library.ready!") and there is no publisher for "library.cache.ready". Rename the topic to "Library.cache.ready!" and update/add the publisher to emit that exact topic, or change the publisher to emit "library.cache.ready" to match the current subscribe.

🤖 Prompt for AI Agents
In asab/library/providers/cache.py around lines 66 to 70 the PubSub topic
subscribed to ("library.cache.ready") does not follow the repository convention
(CamelCase + "!" ) and no publisher emits that exact topic; change the subscribe
call to use "Library.cache.ready!" to match convention, and locate the code that
currently publishes the ready event (or any publisher that should signal cache
readiness) and update it to emit "Library.cache.ready!" too so both publisher
and subscriber use the identical topic string; ensure you preserve existing
casing and the trailing exclamation mark and run tests to verify the event flow.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (4)
asab/library/providers/cache.py (4)

12-17: ⚠️ Potential issue | 🟡 Minor

Docstring still documents KeyError fallback — update once I/O methods are fixed.

Line 17 states callers should expect a KeyError. The intended contract (per prior review) is raise-free fallbacks (None/[]/no-op). This docstring should be updated in tandem with the I/O-method fix noted below.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@asab/library/providers/cache.py` around lines 12 - 17, Update the module
docstring to remove the statement that callers should expect a KeyError and
instead document the new raise-free contract for the cache wrapper:
read()/find() return None when missing, list() returns an empty list, and
subscribe() is a no-op (or returns an empty/unsubscribed iterator) to trigger
fallback without exceptions; refer to the read(), list(), find(), and
subscribe() methods in the asab.library.providers.cache module when making the
wording change so the docstring matches the implemented behavior.

49-50: ⚠️ Potential issue | 🟠 Major

PubSub topic names are lowercase — verify against project naming convention and confirm a publisher exists.

The subscribed topics "library.cache.ready!" and "library.cache.not_ready!" are lowercase. Prior codebase analysis found the convention is CamelCase + "!" (e.g., "Library.ready!"), and no publisher for these exact topic strings was found in the repository. If no publisher emits these topics, the event handlers _on_cache_ready and _on_cache_not_ready will never fire and the provider will never transition to the ready state.

#!/bin/bash
# Verify publisher/subscriber parity and naming convention for library.cache events

echo "=== Subscriptions to library.cache topics ==="
rg -n -S "library\.cache\.(ready|not_ready)" --type py -C2

echo ""
echo "=== PubSub.publish calls — check casing convention ==="
rg -n -S "PubSub\.publish\s*\(" --type py -C1

echo ""
echo "=== Existing Library.* ready event names for convention reference ==="
rg -n -S "Library\.[A-Za-z]+\.?ready" --type py -C1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@asab/library/providers/cache.py` around lines 49 - 50, The subscription uses
lowercase topic strings ("library.cache.ready!" and "library.cache.not_ready!")
that likely don't match the project's CamelCase PubSub convention and therefore
never receive events; update the topic names in the subscribes inside the
provider (the two calls to library.App.PubSub.subscribe) to match the canonical
publisher topic (or add/adjust the publisher) so _on_cache_ready and
_on_cache_not_ready can fire—for example, change the subscribed topic names to
the project's CamelCase event names used elsewhere (or create a PubSub.publish
that emits the exact lowercase names if the convention allows); ensure the
strings you use exactly match the emitter so the provider transitions to ready.

68-86: ⚠️ Potential issue | 🔴 Critical

KeyError from cache-miss breaks all provider I/O — return safe fallbacks instead.

If LibraryService iterates providers without a surrounding try/except, a KeyError here (e.g., on a cold start before any cache snapshot exists) aborts iteration and the real LibsRegLibraryProvider is never reached. Every read/list/find/subscribe call for a libsreg+ URI would fail hard rather than falling through to the live provider.

The correct contract for a cache-miss is a raise-free fallback so the service layer can continue to the next provider:

🔧 Proposed fix
     async def read(self, path):
         if not self._cache_live():
-            raise KeyError("No cache for '{}'".format(path))
+            return None
         return await super().read(path)

     async def list(self, path):
         if not self._cache_live():
-            raise KeyError("No cache for '{}'".format(path))
+            return []
         return await super().list(path)

     async def find(self, path):
         if not self._cache_live():
-            raise KeyError("No cache for '{}'".format(path))
+            return []
         return await super().find(path)

     async def subscribe(self, path, target=None):
         if not self._cache_live():
-            raise KeyError("No cache for '{}'".format(path))
-        return await super().subscribe(path, target)
+            return
+        return await super().subscribe(path, target)

Confirm that LibraryService does not wrap provider I/O calls in a try/except KeyError before finalising the fix strategy:

#!/bin/bash
# Check whether LibraryService catches KeyError around provider method calls
rg -n -S "KeyError" --type py -C3

echo ""
# Inspect provider iteration in LibraryService read/list/find/subscribe
ast-grep --pattern 'async def read($$$) {
  $$$
}'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@asab/library/providers/cache.py` around lines 68 - 86, The cache provider
methods read, list, find and subscribe currently raise KeyError on a cache-miss
which aborts provider iteration; change each method to check _cache_live() and
return a non-exception fallback instead of raising: return None for
read/find/subscribe and return an empty list ([]) for list when not
_cache_live(), otherwise await and return super().*(); keep method signatures
unchanged and remove the KeyError raises so the LibraryService can fall through
to the live provider (refer to the read, list, find, subscribe methods in this
file).

27-27: ⚠️ Potential issue | 🟠 Major

Use a library-specific exception; avoid SystemExit in library code.

Two related constructor issues remain unresolved from prior review:

  1. Line 27RuntimeError is too generic and violates Ruff TRY003. Use a domain-specific exception (e.g., LibraryConfigurationError).
  2. Lines 35–37raise SystemExit(1) hard-kills the entire application process from inside a library component. Any caller that wraps provider construction (e.g., in a try/except) would have no chance to recover or log context. Raise a domain-specific exception instead so the decision to exit stays with the application layer.
🔧 Proposed fix
+from asab.library import exceptions

     cache_root = Config.get("library:cache", "dir", fallback=None)
     if not cache_root:
-        raise RuntimeError("Missing [library:cache].dir configuration")
+        raise exceptions.LibraryConfigurationError(
+            "[library:cache].dir is not configured"
+        )
     ...
     if not os.path.isdir(cache_root):
-        L.critical("Cache root '{}' not found, exiting.".format(cache_root))
-        raise SystemExit(1)
+        raise exceptions.LibraryConfigurationError(
+            "Cache root directory not found: {}".format(cache_root)
+        )

Also applies to: 35-37

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@asab/library/providers/cache.py` at line 27, Replace the generic RuntimeError
and any SystemExit usages in the cache provider constructor with a
library-specific exception: define or import LibraryConfigurationError and raise
that instead of RuntimeError("Missing [library:cache].dir configuration") and
raise LibraryConfigurationError with contextual messages instead of raise
SystemExit(1) in the constructor (look for the cache provider class/constructor,
e.g., CacheProvider.__init__ or similar); ensure the new exception is
documented/imported at top of providers/cache.py so callers can catch it without
the process being killed.
🧹 Nitpick comments (2)
asab/library/providers/cache.py (2)

63-66: _cache_live can be simplified — os.path.isdir already follows symlinks.

The islink + realpath branch is functionally identical to a plain os.path.isdir call because isdir follows symlink chains (including broken ones, which return False). The two-branch logic adds complexity with no benefit.

♻️ Proposed simplification
-    def _cache_live(self):
-        if os.path.islink(self.cache_dir):
-            return os.path.isdir(os.path.realpath(self.cache_dir))
-        return os.path.isdir(self.cache_dir)
+    def _cache_live(self):
+        return os.path.isdir(self.cache_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@asab/library/providers/cache.py` around lines 63 - 66, The _cache_live method
has redundant branching because os.path.isdir already follows symlinks; simplify
_cache_live (referencing method _cache_live and attribute self.cache_dir) by
removing the islink()/realpath() branch and directly return
os.path.isdir(self.cache_dir) so behavior is unchanged but code is clearer.

53-61: Replace unused *args with *_ to satisfy linter (Ruff ARG002).

Both _on_cache_ready and _on_cache_not_ready declare *args but never reference it.

♻️ Proposed fix
-    async def _on_cache_ready(self, *args):
+    async def _on_cache_ready(self, *_):
         live = self._cache_live()
         if live and not self.IsReady:
             await self._set_ready(True)

-    async def _on_cache_not_ready(self, *args):
+    async def _on_cache_not_ready(self, *_):
         live = self._cache_live()
         if (not live) and self.IsReady:
             await self._set_ready(False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@asab/library/providers/cache.py` around lines 53 - 61, The linter flags
unused varargs in the handlers; update the signatures of _on_cache_ready and
_on_cache_not_ready to accept unused varargs as "*_" instead of "*args" (i.e.,
change def async _on_cache_ready(self, *args) and async
_on_cache_not_ready(self, *args) to use "*_") so the functions' behavior remains
identical but Ruff ARG002 is satisfied; keep the internal logic (_cache_live(),
IsReady checks, and _set_ready calls) unchanged.
ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5012fbc and fd63bd0.

📒 Files selected for processing (1)
  • asab/library/providers/cache.py
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@asab/library/providers/cache.py`:
- Around line 12-17: Update the module docstring to remove the statement that
callers should expect a KeyError and instead document the new raise-free
contract for the cache wrapper: read()/find() return None when missing, list()
returns an empty list, and subscribe() is a no-op (or returns an
empty/unsubscribed iterator) to trigger fallback without exceptions; refer to
the read(), list(), find(), and subscribe() methods in the
asab.library.providers.cache module when making the wording change so the
docstring matches the implemented behavior.
- Around line 49-50: The subscription uses lowercase topic strings
("library.cache.ready!" and "library.cache.not_ready!") that likely don't match
the project's CamelCase PubSub convention and therefore never receive events;
update the topic names in the subscribes inside the provider (the two calls to
library.App.PubSub.subscribe) to match the canonical publisher topic (or
add/adjust the publisher) so _on_cache_ready and _on_cache_not_ready can
fire—for example, change the subscribed topic names to the project's CamelCase
event names used elsewhere (or create a PubSub.publish that emits the exact
lowercase names if the convention allows); ensure the strings you use exactly
match the emitter so the provider transitions to ready.
- Around line 68-86: The cache provider methods read, list, find and subscribe
currently raise KeyError on a cache-miss which aborts provider iteration; change
each method to check _cache_live() and return a non-exception fallback instead
of raising: return None for read/find/subscribe and return an empty list ([])
for list when not _cache_live(), otherwise await and return super().*(); keep
method signatures unchanged and remove the KeyError raises so the LibraryService
can fall through to the live provider (refer to the read, list, find, subscribe
methods in this file).
- Line 27: Replace the generic RuntimeError and any SystemExit usages in the
cache provider constructor with a library-specific exception: define or import
LibraryConfigurationError and raise that instead of RuntimeError("Missing
[library:cache].dir configuration") and raise LibraryConfigurationError with
contextual messages instead of raise SystemExit(1) in the constructor (look for
the cache provider class/constructor, e.g., CacheProvider.__init__ or similar);
ensure the new exception is documented/imported at top of providers/cache.py so
callers can catch it without the process being killed.

---

Nitpick comments:
In `@asab/library/providers/cache.py`:
- Around line 63-66: The _cache_live method has redundant branching because
os.path.isdir already follows symlinks; simplify _cache_live (referencing method
_cache_live and attribute self.cache_dir) by removing the islink()/realpath()
branch and directly return os.path.isdir(self.cache_dir) so behavior is
unchanged but code is clearer.
- Around line 53-61: The linter flags unused varargs in the handlers; update the
signatures of _on_cache_ready and _on_cache_not_ready to accept unused varargs
as "*_" instead of "*args" (i.e., change def async _on_cache_ready(self, *args)
and async _on_cache_not_ready(self, *args) to use "*_") so the functions'
behavior remains identical but Ruff ARG002 is satisfied; keep the internal logic
(_cache_live(), IsReady checks, and _set_ready calls) unchanged.

# always register the cache wrapper (even if no snapshot yet)
self.Libraries.append(cachep)
from .providers.libsreg import LibsRegLibraryProvider
realp = LibsRegLibraryProvider(self, path, layer)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like having two layers here:

  1. in case of some race condition or mistake, when the two layers are not the same, this might bring weird mistakes
  2. we want to cut off number of http requests from the whole installation and this just adds cache provider on the top of the libsreg provider - no cut in the network traffic

We agreed with Mithun to start with a simple check in the init time:
->If cache is not present (can be chceked in init time), start and use libsreg provider.

There can be scenarios that break that:

  • cache is gonna be ready 30s after init time and service never switches to cache.
  • cache can disappear in the runtime and service doesn't fallback to libsreg.

Let's start with something, make sure it survives an upgrade procedure and only after then solve these scenarios/conditions?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the upgrade,
new version of library will be downloaded to the cache as soon as possible. However, the services will be still running, requiring the old library, until they are upgraded.
We need to have a secure, well designed solution for this race condition

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the right approach is to use init only for initial state, then keep liveness checks and readiness updates at runtime, so we can fall back to libsreg when cache is unavailable and automatically use cache again when it becomes live.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants