Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds scoped reading (global, tenant, personal) to ZooKeeper and filesystem providers and refactors personal-scope discovery; reworks LibraryService export into a layered flow exporting global → tenant → per-credential personal items with per-credential auth and scope-aware tar packaging. Changes
Sequence Diagram(s)sequenceDiagram
participant LS as LibraryService
participant Prov as Provider (ZK/FS)
participant Auth as AuthzContext
participant Tenant as TenantContext
participant Tar as TarBuilder
LS->>LS: determine scopes (global, tenant, personal)
loop For each scope
LS->>Prov: _collect_export_items(path, tenant_id?, cred_id?)
Prov-->>LS: list of items
loop For each item
LS->>Auth: _export_authz_context(cred_id)
Auth-->>LS: authz active
LS->>Tenant: tenant_context(tenant_id)
Tenant-->>LS: tenant context active
LS->>Prov: _read_item_for_scope(name, scope, tenant_id, cred_id)
Prov-->>LS: stream or None
LS->>Tar: _add_stream_to_tar(tar_name, stream)
Tar-->>LS: added
LS->>Tenant: exit tenant context
LS->>Auth: exit authz context
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@asab/library/service.py`:
- Around line 943-972: The bare except in _collect_personal_credential_ids
silently swallows real provider errors; change the generic except Exception:
continue to catch the exception as a variable and log it before continuing.
Specifically, in the loop over self.Libraries where you call getattr(provider,
"_get_personal_scopes", None) and await get_scopes(...), replace the final
except Exception handler with "except Exception as e:" and call the service
logger (e.g., self.log.exception or self.logger.exception) including context
like provider.__class__.__name__ and tenant_id, then continue so failures are
recorded but other providers still run.
🧹 Nitpick comments (2)
asab/library/providers/zookeeper.py (1)
336-369:read_scopedsilently returnsNonefor unrecognized scope values.Line 359-360: An invalid
scopeargument (e.g., a typo like"globl") silently returnsNone, which is indistinguishable from "data not found." Consider raising aValueErrorinstead so callers discover misuse early.Proposed fix
else: - return None + raise ValueError("Unknown scope {!r}; expected 'global', 'tenant', or 'personal'".format(scope))asab/library/service.py (1)
1025-1029:assertin production code may be stripped by-Oflag.Line 1027:
assert item_name.startswith(base_path)will silently disappear when Python runs with optimizations enabled (python -O). If this invariant matters, use an explicitif/raise.Proposed fix
def _build_export_tar_name(self, item_name: str, base_path: str, remove_path: bool) -> str: if remove_path: - assert item_name.startswith(base_path) + if not item_name.startswith(base_path): + raise ValueError("Item name {!r} does not start with base path {!r}".format(item_name, base_path)) return item_name[len(base_path):] return item_name
| async def _read_item_for_scope( | ||
| self, | ||
| item_name: str, | ||
| scope: str, | ||
| tenant_id: typing.Optional[str], | ||
| cred_id: typing.Optional[str], | ||
| ) -> typing.Optional[typing.IO]: | ||
| with tenant_context(tenant_id): | ||
| with self._export_authz_context(cred_id): | ||
| for provider in self.Libraries: | ||
| read_scoped = getattr(provider, "read_scoped", None) | ||
|
|
||
| if scope == "global": | ||
| if read_scoped is not None: | ||
| data = await read_scoped(item_name, "global") | ||
| else: | ||
| data = await provider.read(item_name) | ||
| elif scope in {"tenant", "personal"}: | ||
| if read_scoped is None: | ||
| continue | ||
| data = await read_scoped(item_name, scope) | ||
| else: | ||
| continue | ||
|
|
||
| if data is not None: | ||
| return data | ||
|
|
||
| return None |
There was a problem hiding this comment.
For global scope, provider.read(item_name) may return a tenant/personal override instead of the global version.
Line 990–991: When read_scoped is not available on a provider, the fallback uses provider.read(item_name). The existing read() method in ZooKeeperLibraryProvider checks personal → tenant → global in precedence order. If a tenant or personal context is accidentally set (even though tenant_context(None) should clear it), this fallback would return the wrong scope's data. More importantly, for non-ZooKeeper providers that lack read_scoped, the fallback always returns the highest-precedence match, not necessarily the global one.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
asab/library/providers/filesystem.py (1)
135-135:⚠️ Potential issue | 🟡 MinorInconsistent type hint style within the file.
Line 135 uses lowercase
tuple[str, str], while other type hints in this file (lines 284, 291) usetyping.Tuple. For consistency, usetyping.List[typing.Tuple[str, str]].Fix for consistency
- async def _get_personal_scopes(self) -> typing.List[tuple[str, str]]: + async def _get_personal_scopes(self) -> typing.List[typing.Tuple[str, str]]:
🤖 Fix all issues with AI agents
In `@asab/library/providers/filesystem.py`:
- Around line 280-298: The read_personal_scopes method opens file handles
(io.FileIO) and appends them to results but does not close already-opened
handles if a later iteration raises, which can leak file descriptors; update
read_personal_scopes to guard each iteration by either (a) resolving
personal_path and checking os.path.isfile before opening the file, or (b)
wrapping the loop body in try/except/finally so that on any exception you
iterate over results and close any opened file objects, then re-raise the
exception; reference read_personal_scopes, _get_personal_scopes, _personal_path,
results, and io.FileIO when making the change.
- Around line 261-268: The read_scoped("tenant") branch should not fall back to
global files when no tenant is set: before calling build_path(path,
tenant_specific=True) in the read_scoped function, check whether the Tenant
context/identifier is present (the same check used by build_path to decide
tenant_specific behavior) and if absent return None immediately; only call
build_path and attempt os.path.isfile/return io.FileIO when a tenant is actually
available so the "single scope only" contract is preserved.
- Line 278: The function that currently ends with "return None" for unrecognized
scope values should instead raise a ValueError (matching the behavior of
_resolve_fs_path_from_info); replace the silent return with raising
ValueError(f"Unknown scope: {scope}") (or similar) so callers immediately see
misuse, and ensure the error message includes the invalid scope value and
context to aid debugging.
🧹 Nitpick comments (1)
asab/library/providers/filesystem.py (1)
267-268: Avoid catching blindException.The broad
except Exceptioncan mask unexpected errors (e.g.,PermissionError,OSError). The existingread()has the same pattern (line 234), but for a new method it's worth tightening.build_pathcan raiseLookupError(fromTenant.get()) orAssertionError—if the tenant guard above is applied, the remaining likely exceptions are filesystem-related.Suggested narrower catch
- except Exception: + except (FileNotFoundError, IsADirectoryError, OSError): return None
| return io.FileIO(personal_path, "rb") | ||
| return None | ||
|
|
||
| return None |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Unknown scope silently returns None—consider raising ValueError.
If an invalid scope value is passed (e.g., a typo like "tenent"), the method silently returns None, making the bug invisible to callers. A ValueError would surface misuse immediately, consistent with how _resolve_fs_path_from_info (line 133) raises on unknown scopes.
Suggested change
- return None
+ raise ValueError("Unknown scope: {!r}".format(scope))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return None | |
| raise ValueError("Unknown scope: {!r}".format(scope)) |
🤖 Prompt for AI Agents
In `@asab/library/providers/filesystem.py` at line 278, The function that
currently ends with "return None" for unrecognized scope values should instead
raise a ValueError (matching the behavior of _resolve_fs_path_from_info);
replace the silent return with raising ValueError(f"Unknown scope: {scope}") (or
similar) so callers immediately see misuse, and ensure the error message
includes the invalid scope value and context to aid debugging.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
asab/library/service.py (2)
861-861:Tenant.get(None)is a clean alternative to the try/except pattern used elsewhere.Consider adopting this pattern consistently across the codebase (e.g.,
_current_tenant_idhelpers in providers usetry: Tenant.get() / except LookupError).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@asab/library/service.py` at line 861, Replace try/except LookupError patterns in tenant lookup helpers (e.g., the _current_tenant_id functions in provider modules) with the cleaner Tenant.get(None) call: call Tenant.get(None), check for a None return and handle that case explicitly instead of catching LookupError; update any callers or logic that expected an exception to instead handle a None tenant_id value (use Tenant.get(None) in place of try: Tenant.get() / except LookupError blocks and adapt conditional logic accordingly).
908-940:_collect_export_itemslists items from all scopes, relying onread_scopedto filter at read time.Since
provider.list()merges personal + tenant + global items regardless of the context, the same item set is collected in all three export phases. The scope-specific filtering only happens in_read_item_for_scopewhenread_scopedreturnsNonefor the wrong scope. This is functionally correct but does redundant work—the same directory tree is listed and walked three times. For now this is acceptable, but worth noting if export performance becomes a concern.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@asab/library/service.py` around lines 908 - 940, _collect_export_items currently walks the entire merged namespace (using self._list with self.Libraries and item.providers), causing duplicate work across export scopes; update it to limit listings to the active scope instead: determine the proper provider set for the current export context (based on tenant_id/cred_id) and pass that provider list into the calls to self._list (both the initial self._list(path, providers=...) and the child_items = await self._list(item.name, providers=...)); you can add a small helper (e.g., _providers_for_export_scope) or inline the logic, and keep existing behavior of _read_item_for_scope/read_scoped unchanged. This reduces redundant directory walking while preserving per-scope filtering in _read_item_for_scope.asab/library/providers/filesystem.py (2)
283-309: File handle cleanup on failure is properly implemented; innerexceptduring cleanup should log.The
try/exceptguard around the loop correctly closes already-opened file handles if a later iteration fails — this addresses the earlier resource-leak concern. However, the innerexcept Exception: pass(lines 306–307) during cleanup silently discards close failures. While this is a common defensive pattern, logging atDEBUGlevel would help diagnose issues like broken file descriptors.Suggested improvement
except Exception: for _, _, file_handle in results: try: file_handle.close() except Exception: - pass + L.debug("Failed to close file handle during cleanup", exc_info=True) raise🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@asab/library/providers/filesystem.py` around lines 283 - 309, The cleanup inner except in read_personal_scopes silently swallows close errors; change it to log the exception at DEBUG so failures to close file handles are visible for diagnostics. In read_personal_scopes, replace the inner "except Exception: pass" that wraps file_handle.close() with a debug log call (e.g. self._logger.debug("Failed to close file handle for %s/%s", scope_tenant, scope_cred, exc_info=True) or similar) so any exceptions from file_handle.close() are recorded; reference read_personal_scopes, results, and file_handle to locate the code to modify.
261-271: Tenant scope guard looks good; consider narrowing the bareexcept Exception.The early
tenant_idcheck (line 263) correctly prevents the fallback-to-global bug. However, catching bareExceptionat line 270 masks unexpected errors (e.g.,PermissionError,OSErrorfrom a broken symlink). Consider catchingOSErrorinstead, which covers all filesystem-level failures without swallowing programming errors.Suggested fix
if scope == "tenant": tenant_id = self._current_tenant_id() if not tenant_id: return None try: tenant_path = self.build_path(path, tenant_specific=True, tenant=tenant_id) if os.path.isfile(tenant_path): return io.FileIO(tenant_path, "rb") - return None - except Exception: return None + except OSError: + return None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@asab/library/providers/filesystem.py` around lines 261 - 271, The bare except in the tenant-scoped branch of the reader (inside the scope == "tenant" block) swallows non-filesystem errors; change it to only catch filesystem-related errors (e.g., replace the generic "except Exception" with "except OSError as e") around the call to build_path and os.path.isfile so that file-system failures are handled but programming errors still propagate; keep the tenant_id guard and return None on OSError as currently intended and consider logging e if desired.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@asab/library/service.py`:
- Around line 847-905: The file descriptor leak occurs because streams returned
by _read_item_for_scope (which may be io.FileIO) are not closed after being
added to the tar with _add_stream_to_tar; update all three export phases in the
export routine to ensure each item_data is closed after consumption (use
try/finally or context manager around the add step), i.e. after calling
_add_stream_to_tar(tarobj, tar_name, item_data) always call item_data.close()
(or use with semantics) for the global, tenant and personal loops that build
tar_name and call _add_stream_to_tar so filesystem FDs are released.
---
Duplicate comments:
In `@asab/library/providers/filesystem.py`:
- Line 281: The function branch that currently returns None for unknown scopes
(replace the stray "return None" at the end of the scope-resolution function)
should instead raise a ValueError so typos like "tenent" fail fast; update the
scope-resolution function (the same logic pattern used in
_resolve_fs_path_from_info) to raise ValueError("Unknown scope: <scope>") or
similar, ensuring callers get an explicit exception rather than a silent None.
---
Nitpick comments:
In `@asab/library/providers/filesystem.py`:
- Around line 283-309: The cleanup inner except in read_personal_scopes silently
swallows close errors; change it to log the exception at DEBUG so failures to
close file handles are visible for diagnostics. In read_personal_scopes, replace
the inner "except Exception: pass" that wraps file_handle.close() with a debug
log call (e.g. self._logger.debug("Failed to close file handle for %s/%s",
scope_tenant, scope_cred, exc_info=True) or similar) so any exceptions from
file_handle.close() are recorded; reference read_personal_scopes, results, and
file_handle to locate the code to modify.
- Around line 261-271: The bare except in the tenant-scoped branch of the reader
(inside the scope == "tenant" block) swallows non-filesystem errors; change it
to only catch filesystem-related errors (e.g., replace the generic "except
Exception" with "except OSError as e") around the call to build_path and
os.path.isfile so that file-system failures are handled but programming errors
still propagate; keep the tenant_id guard and return None on OSError as
currently intended and consider logging e if desired.
In `@asab/library/service.py`:
- Line 861: Replace try/except LookupError patterns in tenant lookup helpers
(e.g., the _current_tenant_id functions in provider modules) with the cleaner
Tenant.get(None) call: call Tenant.get(None), check for a None return and handle
that case explicitly instead of catching LookupError; update any callers or
logic that expected an exception to instead handle a None tenant_id value (use
Tenant.get(None) in place of try: Tenant.get() / except LookupError blocks and
adapt conditional logic accordingly).
- Around line 908-940: _collect_export_items currently walks the entire merged
namespace (using self._list with self.Libraries and item.providers), causing
duplicate work across export scopes; update it to limit listings to the active
scope instead: determine the proper provider set for the current export context
(based on tenant_id/cred_id) and pass that provider list into the calls to
self._list (both the initial self._list(path, providers=...) and the child_items
= await self._list(item.name, providers=...)); you can add a small helper (e.g.,
_providers_for_export_scope) or inline the logic, and keep existing behavior of
_read_item_for_scope/read_scoped unchanged. This reduces redundant directory
walking while preserving per-scope filtering in _read_item_for_scope.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
asab/library/service.py (1)
951-952: Usecollections.dequefor the BFS queue.
list.pop(0)is O(n) per call. For large directory trees, this adds up to O(n²).collections.deque.popleft()is O(1).Proposed fix
+import collections + ... - recitems = list(items[:]) + recitems = collections.deque(items[:]) ... while len(recitems) > 0: - item = recitems.pop(0) + item = recitems.popleft()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@asab/library/service.py` around lines 951 - 952, The BFS queue currently uses a Python list and repeatedly calls recitems.pop(0), which is O(n); change recitems to a collections.deque and replace pop(0) with recitems.popleft() for O(1) dequeuing. Add the import from collections (from collections import deque), initialize recitems with deque(...) where it’s created, and update the loop in the function/method that contains the while len(recitems) > 0 / item = recitems.pop(0) lines to use recitems.popleft() instead; keep all other logic (appending new nodes) the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@asab/library/service.py`:
- Around line 1061-1065: Replace the runtime-conditional assert in
_build_export_tar_name with an explicit validation: check that
item_name.startswith(base_path) and raise a clear exception (e.g., ValueError)
if not, so the function fails deterministically even under python -O; keep the
existing return behavior (strip base_path when remove_path is True, else return
item_name).
---
Duplicate comments:
In `@asab/library/service.py`:
- Around line 1023-1027: The fallback path for scope == "global" can call
provider.read(item_name) which may return non-global data because some providers
ignore tenant_context; update the logic in the export/read loop (the branch that
checks scope == "global") to skip providers that do not implement read_scoped
instead of calling provider.read, or alternatively guard the fallback by
verifying the provider supports tenant_context-aware reads; specifically, change
the handling around the read_scoped lookup for providers (the read_scoped
variable and provider.read call) so that if read_scoped is None you do not call
provider.read(item_name) for global-scoped exports but skip that provider
(matching the tenant/personal branches), or add a clear check (e.g., provider
has attribute/method read_scoped) before falling back to provider.read while
ensuring tenant_context(None) is respected.
---
Nitpick comments:
In `@asab/library/service.py`:
- Around line 951-952: The BFS queue currently uses a Python list and repeatedly
calls recitems.pop(0), which is O(n); change recitems to a collections.deque and
replace pop(0) with recitems.popleft() for O(1) dequeuing. Add the import from
collections (from collections import deque), initialize recitems with deque(...)
where it’s created, and update the loop in the function/method that contains the
while len(recitems) > 0 / item = recitems.pop(0) lines to use recitems.popleft()
instead; keep all other logic (appending new nodes) the same.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
asab/library/service.py (2)
849-931: Consider extracting the repeated export-phase logic into a helper.The three export phases (global, tenant, personal) each duplicate the same read → dedup → close → add-to-tar → close pattern (~20 lines each). A small helper like
_export_items_to_tar(tarobj, items, scope, tenant_id, cred_id, tar_prefix, ...)would eliminate this repetition and reduce the surface for future divergence bugs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@asab/library/service.py` around lines 849 - 931, The three export phases duplicate the same read→dedup→add→close logic; refactor by adding a helper method (e.g. _export_items_to_tar) that takes tarobj, items, scope, tenant_id, cred_id, tar_prefix (and added_tar_names) and implements: loop items, call _read_item_for_scope(name, scope, tenant_id, cred_id), continue if None, build the tar_name using _build_export_tar_name plus tar_prefix when needed, skip if in added_tar_names, call _add_stream_to_tar inside try/finally ensuring item_data.close is always attempted, then add tar_name to added_tar_names; replace the three blocks that call _collect_export_items/_collect_personal_credential_ids and Tenant.get with calls to this helper.
951-952:list.pop(0)is O(n); usecollections.dequefor the BFS queue.
recitems.pop(0)shifts all remaining elements on every iteration, making the traversal O(n²) in the number of directories. Switching tocollections.dequeandpopleft()makes it O(n). The same pattern applies to the existinglistmethod at line 331.Proposed fix
+from collections import deque ... - recitems = list(items[:]) + recitems = deque(items[:]) ... while len(recitems) > 0: - item = recitems.pop(0) + item = recitems.popleft()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@asab/library/service.py` around lines 951 - 952, The BFS loop using recitems.pop(0) (variable recitems in the traversal code) makes the traversal O(n²); change recitems from a plain list to a collections.deque and replace recitems.pop(0) with recitems.popleft() (also add from collections import deque at the top). Apply the same change to the other occurrence in the list method (the list function referenced in the review) so both breadth-first queues use deque/popleft for O(n) behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@asab/library/service.py`:
- Around line 1023-1027: The fallback path currently calls
provider.read(item_name) which uses standard precedence and may return
non-global data; change the fallback to explicitly request global-only data by
calling provider.read(item_name, "global") (or the provider's equivalent
signature) when read_scoped is None so that within the scope == "global" branch
you always retrieve purely global content; update the code using the same
symbols shown (scope, read_scoped, provider.read, item_name).
---
Nitpick comments:
In `@asab/library/service.py`:
- Around line 849-931: The three export phases duplicate the same
read→dedup→add→close logic; refactor by adding a helper method (e.g.
_export_items_to_tar) that takes tarobj, items, scope, tenant_id, cred_id,
tar_prefix (and added_tar_names) and implements: loop items, call
_read_item_for_scope(name, scope, tenant_id, cred_id), continue if None, build
the tar_name using _build_export_tar_name plus tar_prefix when needed, skip if
in added_tar_names, call _add_stream_to_tar inside try/finally ensuring
item_data.close is always attempted, then add tar_name to added_tar_names;
replace the three blocks that call
_collect_export_items/_collect_personal_credential_ids and Tenant.get with calls
to this helper.
- Around line 951-952: The BFS loop using recitems.pop(0) (variable recitems in
the traversal code) makes the traversal O(n²); change recitems from a plain list
to a collections.deque and replace recitems.pop(0) with recitems.popleft() (also
add from collections import deque at the top). Apply the same change to the
other occurrence in the list method (the list function referenced in the review)
so both breadth-first queues use deque/popleft for O(n) behavior.
|
@mithunbharadwaj - I need to see this in action at review please |
Summary by CodeRabbit
New Features
Bug Fixes / Reliability
Behavior