
Conversation

@rfrenchseti (Collaborator) commented Jan 6, 2026

Fixed Issues

Summary of Changes

  • Changed the way metadata caching and time sensitivity work. Fixes Clean up handling of modification time #57.
    • When a cache has time_sensitive=True and retrieve() downloads a file, it will set the mtime of the local file to equal the mtime of the remote file. If time_sensitive=False, then the local file will get the mtime of when it was retrieved.
    • When a cache has time_sensitive=True, upload() will set the mtime of the remote file equal to the mtime of the local file. If time_sensitive=False, then the remote file will get the mtime of when it was uploaded. The mtime of the local file is no longer modified by upload(). (A usage sketch illustrating these semantics follows this list.)
    • modification_time has a new argument bypass_cache, which operates similarly to exists(bypass_cache). Here the cache being bypassed isn't the local file cache but the internal metadata cache. bypass_cache=True does nothing if cache_metadata=False; if cache_metadata=True, then modification_time consults the metadata cache when bypass_cache=False and queries the remote server for updated information when bypass_cache=True.
    • As before, when a cache has time_sensitive=True, retrieve() will check the mtime of the file and re-download it if the local copy is older than the remote copy. It does this using modification_time(), which will first check the metadata cache if enabled.
    • Note that if the remote file has been deleted, or for some reason the mtime can't be retrieved, then retrieve() will raise an exception even if there is a valid local copy of the file.
  • Added high-level tutorial in new "Introduction" doc section.
  • Added a wide variety of examples in new "Examples" doc section.
  • Exposed FCPath.filecache property. Fixes Add FCPath.filecache attribute #55.
  • Added new function FCPath.splitpath that splits a path at directories with a given name. Fixes Add a function to split an FCPath on a given directory name #44.
  • Made FCPath concatenation remove redundant / characters, as in FCPath('/a/b//', 'c///d'). Fixes FCPath does not fully normalize paths #53.
  • Changed set_easy_logger to use a decimal point in the seconds field. Fixes Easy logger uses commas for microseconds #51.
  • Python >= 3.10 is now required.
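
A short usage sketch of these semantics (the URL, file name, and exact call pattern are illustrative placeholders, not code from this PR):

from filecache import FileCache

fc = FileCache(time_sensitive=True, cache_metadata=True)

# Download: with time_sensitive=True the local copy receives the remote
# file's mtime, and a stale local copy triggers a re-download.
local_path = fc.retrieve('gs://my-bucket/data.dat')

# With cache_metadata=True this may be answered from the metadata cache...
mtime_cached = fc.modification_time('gs://my-bucket/data.dat')
# ...while bypass_cache=True forces a fresh lookup on the remote server.
mtime_fresh = fc.modification_time('gs://my-bucket/data.dat', bypass_cache=True)

# Upload: the remote mtime is set from the local file's mtime; the local
# file's mtime is never modified by upload().
fc.upload('gs://my-bucket/data.dat')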

Known Problems

  • None

Summary by CodeRabbit

  • New Features

    • Added bypass_cache for fresh remote mtime checks; preserve_mtime to retain mtimes on download/upload; FCPath.splitpath() and FCPath.filecache property.
  • Documentation

    • New introduction and examples docs; updated docs navigation and module anchor.
  • Bug Fixes

    • Local cache removed on unlink even if remote unlink fails; normalized repeated slashes and improved URL/path splitting.
  • Chores

    • Minimum Python requirement raised to 3.10.
  • Tests

    • Updated tests for unlink, mtime, splitpath and path-normalization behavior.


Summary by Bito

  • Improves metadata handling for file operations, focusing on preserving modification times during retrieval and upload.
  • Enhances the FCPath class with a new filecache property and modifies existing methods so that local and remote file modification times are synchronized according to the cache's time-sensitivity settings.
  • Reworks the file caching mechanism's handling of metadata caching and modification times.
  • Expands the documentation with new introduction and examples sections.


coderabbitai bot commented Jan 6, 2026

Walkthrough

Adds new documentation pages; normalizes duplicate slashes in URLs/paths; exposes FCPath.filecache and FCPath.splitpath; implements preserve_mtime propagation and bypass_cache handling across FileCache flows and sources; adjusts retrieve/upload/unlink/modification-time logic and updates tests and CI Python requirement.

Changes

Cohort / File(s) Summary
Documentation
docs/examples.rst, docs/introduction.rst, docs/index.rst, docs/module.rst
Add comprehensive introduction and examples, adjust toctree ordering, and add a Sphinx anchor label.
FileCache core
filecache/file_cache.py
Add bypass_cache to modification_time and propagate through multi-file flows; normalize duplicate slashes in _split_url; forward preserve_mtime/time-sensitive semantics to retrieve/upload; ensure local-cache removal in unlink flows; refine logging timestamp format.
FileCacheSource implementations
filecache/file_cache_source.py, filecache/... (concrete sources: FileCacheSourceFile, FileCacheSourceHTTP, FileCacheSourceGS, FileCacheSourceS3, FileCacheSourceFake)
Add preserve_mtime parameter to retrieve/upload and parallel variants; propagate through thread/parallel helpers; read/write mtimes from metadata or Last-Modified and apply os.utime where applicable; update metadata keys and modification_time checks.
FCPath API & path handling
filecache/file_cache_path.py
Replace internal _filecache_to_use references with public filecache property; add FCPath.filecache and FCPath.splitpath(search_dir); normalize consecutive slashes in joins/splits; add bypass_cache passthrough on modification_time; adjust split/root edge cases.
Tests — FileCache
tests/test_file_cache.py
Add test for multiple-slash normalization; update unlink tests to expect local cache removal on unlink failures; refactor mtime tests to use per-test unique IDs and add bypass_cache checks and cache-refresh steps.
Tests — FCPath
tests/test_file_cache_path.py
Update expected split/parts semantics (URL/GS bases include trailing slash); add normalization/join tests and splitpath coverage; tighten pattern-match expectations.
CI / Packaging
.github/workflows/run-tests.yml, pyproject.toml
Remove Python 3.9 from test matrix and bump requires-python to >=3.10 (remove 3.9 classifier).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 69.12%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (4 passed)

  • Title check ✅ Passed: The title 'Doc improvements, better mtime handling' directly and clearly describes the main changes: enhanced documentation (introduction and examples sections) and improved modification time handling across the codebase.
  • Description check ✅ Passed: The pull request description follows the template structure with Fixed Issues section listing five issues (#44, #51, #53, #55, #57), a detailed Summary of Changes explaining each fix with technical depth, and a Known Problems section confirming none exist.
  • Linked Issues check ✅ Passed: All five linked issues are addressed: #44 adds FCPath.splitpath method, #51 changes logger timestamp formatting to use periods, #53 normalizes FCPath path slashes, #55 exposes FCPath.filecache property, and #57 implements comprehensive modification time handling with bypass_cache parameter and time-sensitive semantics.
  • Out of Scope Changes check ✅ Passed: Changes are limited to addressing the five linked issues. Python version minimum increase to 3.10 and test matrix updates are reasonable project-wide maintenance aligned with the PR's scope. No unrelated modifications detected.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Fix all issues with AI Agents 🤖
In @docs/examples.rst:
- Line 113: Remove the stray extra single-quote at the end of the f-string in
the print statement in docs/examples.rst so the string is properly closed and
the statement parses (change the print line that currently ends with two
consecutive single quotes to end with a single closing quote).

In @docs/introduction.rst:
- Around line 541-546: The "Best Practices" section contains a trailing empty
bullet ("-") that renders as a blank list item; remove the extraneous "-" or
replace it with the intended text so the list under the "Best Practices" heading
is complete and no empty bullet remains.
- Around line 63-67: Fix the reST role typo in the sentence describing
cache_root: the role is written with an extra backtick as
``:class:`FileCache`` instance``; change it to ``:class:`FileCache` instance``
so Sphinx recognizes the FileCache class role properly.
- Around line 355-376: Update the FileCache.modification_time docs to mention
metadata caching and bypass behavior: note that when cache_metadata=True
modification_time may return values from the metadata cache, and that passing
bypass_cache=True forces a fresh remote lookup (matching the tests and public
API). Reference the FileCache.modification_time method and the bypass_cache and
cache_metadata parameters in the short explanatory note so readers know cached
metadata can be used and how to override it.
- Around line 140-159: The documentation still claims upload may update the
local file mtime when the remote mtime cannot be set; update the “Preserving
File Timestamps” bullets to state that FileCache.upload never modifies the local
file’s mtime (referencing test_modification_time_caching), that when
time_sensitive=True the remote mtime is set from the local mtime and when
time_sensitive=False the remote mtime is set to the upload time, and clarify
that subsequent download behavior is governed by time_sensitive and
cache_metadata (and that methods like FileCache.retrieve,
FileCache.isdir/iterdir/iterdir_metadata populate/consult the metadata cache).

In @filecache/file_cache_source.py:
- Around line 1233-1245: The RuntimeError in FileCacheSourceGS.retrieve is using
plain string literals so "{self._src_prefix_}{sub_path}" is not interpolated;
change the raise to include the actual values (e.g., use an f-string or .format)
so the message references self._src_prefix_ and sub_path, e.g. raise
RuntimeError(f'Failed to retrieve modification time from Google Storage bucket
for {self._src_prefix_}{sub_path}'); update the raise in the block that computes
mtime_sec inside FileCacheSourceGS.retrieve accordingly.

In @tests/test_file_cache_path.py:
- Around line 1300-1311: Add two extra assertions to test_splitpath to cover
edge cases: one where the search_dir is the first non-anchor component (e.g.,
call FCPath('/c/b/c').splitpath('c') and assert the expected tuple) and one with
adjacent occurrences to ensure no empty segments (e.g., call
FCPath('/a/c/c/d').splitpath('c') and assert expected segments). Locate the
additions in the same test function (test_splitpath) and use FCPath.splitpath
for both local and optionally gs:// path variants to mirror existing tests.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
tests/test_file_cache.py (2)

2169-2197: Clean up unused loop variable p in cloud unlink multi test

The new expectations around test_cloud_unlink_multi (ensuring all local_paths are gone after each unlink attempt) look correct and match the updated unlink semantics. However, in the reinitialization loops you no longer use p:

for lp, p in zip(local_paths, paths):
    _copy_file(..., lp)

Ruff correctly flags p as unused (B007). These can be simplified to just iterate over local_paths (same in both re-copy blocks) without changing behavior.

Proposed loop simplification
-        for lp, p in zip(local_paths, paths):
-            _copy_file(EXPECTED_DIR / EXPECTED_FILENAMES[0], lp)
+        for lp in local_paths:
+            _copy_file(EXPECTED_DIR / EXPECTED_FILENAMES[0], lp)
...
-        for lp, p in zip(local_paths, paths):
-            _copy_file(EXPECTED_DIR / EXPECTED_FILENAMES[0], lp)
+        for lp in local_paths:
+            _copy_file(EXPECTED_DIR / EXPECTED_FILENAMES[0], lp)

2267-2294: Mirror loop simplification in prefix-based cloud unlink multi test

The new checks in test_cloud_unlink_multi_pfx (local files always removed after unlink, regardless of exception_on_fail/missing_ok) are consistent with the FileCache.unlink behavior. As above, the helper loops used to re-copy local files don’t use p:

for lp, p in zip(local_paths, paths):
    _copy_file(..., lp)

You can drop p from these loops to avoid the B007 warning and slightly simplify the tests.

Proposed loop simplification
-        for lp, p in zip(local_paths, paths):
-            _copy_file(EXPECTED_DIR / EXPECTED_FILENAMES[0], lp)
+        for lp in local_paths:
+            _copy_file(EXPECTED_DIR / EXPECTED_FILENAMES[0], lp)
...
-        for lp, p in zip(local_paths, paths):
-            _copy_file(EXPECTED_DIR / EXPECTED_FILENAMES[0], lp)
+        for lp in local_paths:
+            _copy_file(EXPECTED_DIR / EXPECTED_FILENAMES[0], lp)
filecache/file_cache_path.py (1)

658-667: filecache property cleanly exposes the associated FileCache

The new filecache property (lazy-creating a shared default when _filecache is None) gives FCPath users a straightforward way to access the underlying FileCache instance and replaces ad‑hoc internal helpers. This aligns with the PR goal of exposing FCPath.filecache and centralizes default-FileCache handling.

Minor style note: with from __future__ import annotations in use, the return annotation can be -> FileCache without quotes, but this is purely cosmetic.

filecache/file_cache.py (4)

829-905: retrieve() still raises when remote mtime lookup fails despite cached local file

With _time_sensitive=True, the single- and multi-file retrieve paths call modification_time using the default exception_on_fail=True:

  • Single: self.modification_time(url) in _retrieve_single (Line 1331).
  • Multi: self.modification_time(urls) in _retrieve_multi_unlocked and _retrieve_multi_locked (Lines 1398, 1517).

If the remote mtime cannot be obtained (network error, 404, transient backend failure), modification_time will raise, even when a local cached file exists. The PR description notes this raising behavior, but a more robust approach would be to fall back to the cached file and only refresh when a valid remote mtime indicates the remote is newer.

A concrete failure mode:

  • First run: file is downloaded and cached.
  • Later run with _time_sensitive=True: remote file has been deleted or mtime fetch fails; modification_time() raises FileNotFoundError/backend error and retrieve() fails instead of returning the cached copy.

You can fix this by calling modification_time with exception_on_fail=False in the time-sensitive paths and treating Exceptions as “no remote mtime available”:

Suggested fix: make retrieval tolerant of mtime failures
@@ def _retrieve_single(self,
-            if local_path.is_file():
-                source_time: float | None = None
-                if self._time_sensitive:
-                    source_time = cast(float | None, self.modification_time(url))
-                    if source_time is None:
-                        self._log_debug(f'No modification time available for {url} '
-                                        'even though a local copy exists')
+            if local_path.is_file():
+                source_time: float | None = None
+                if self._time_sensitive:
+                    mt = self.modification_time(
+                        url,
+                        exception_on_fail=False,
+                    )
+                    if isinstance(mt, Exception):
+                        self._log_debug(
+                            f'Unable to retrieve modification time for {url}: {mt!r}'
+                        )
+                    else:
+                        source_time = cast(float | None, mt)
+                        if source_time is None:
+                            self._log_debug(
+                                f'No modification time available for {url} '
+                                'even though a local copy exists'
+                            )
@@ def _retrieve_multi_unlocked(self,
-        if self._time_sensitive:
-            urls: list[str | Path] = [f'{source._src_prefix_}{sub_path}'
-                                      for source, sub_path in zip(sources, sub_paths)]
-            source_times = cast(list[Union[float, None]], self.modification_time(urls))
-        else:
-            source_times = [None] * len(sources)
+        if self._time_sensitive:
+            urls: list[str | Path] = [
+                f'{source._src_prefix_}{sub_path}'
+                for source, sub_path in zip(sources, sub_paths)
+            ]
+            mtimes = cast(
+                list[float | None | Exception],
+                self.modification_time(urls, exception_on_fail=False),
+            )
+            # Treat mtime lookup failures as "no remote mtime available".
+            source_times: list[float | None] = [
+                mt if isinstance(mt, (float, type(None))) else None for mt in mtimes
+            ]
+        else:
+            source_times = [None] * len(sources)
@@ def _retrieve_multi_locked(self,
-        if self._time_sensitive:
-            urls: list[str | Path] = [f'{source._src_prefix_}{sub_path}'
-                                      for source, sub_path in zip(sources, sub_paths)]
-            source_times = cast(list[Union[float, None]], self.modification_time(urls))
-        else:
-            source_times = [None] * len(sources)
+        if self._time_sensitive:
+            urls: list[str | Path] = [
+                f'{source._src_prefix_}{sub_path}'
+                for source, sub_path in zip(sources, sub_paths)
+            ]
+            mtimes = cast(
+                list[float | None | Exception],
+                self.modification_time(urls, exception_on_fail=False),
+            )
+            source_times: list[float | None] = [
+                mt if isinstance(mt, (float, type(None))) else None for mt in mtimes
+            ]
+        else:
+            source_times = [None] * len(sources)

This keeps the “refresh when remote is newer” logic but ensures that failures to read the remote mtime result in continued use of the cached file instead of exceptions.

Also applies to: 933-939, 1327-1363, 1396-1402, 1515-1519


1778-1782: upload() docstring is inconsistent with new mtime behavior

The upload() notes still say:

the modification time of the local file is set to the modification time of the remote file after the upload is complete.

But the implementation now:

  • Passes preserve_mtime=self._time_sensitive into source.upload / upload_multi, and
  • Never updates the local file’s mtime.

Backend implementations (GS/S3/Fake) set remote mtime from the local file when preserve_mtime=True and leave the local file untouched.

To avoid confusion for callers relying on time semantics, the docstring should be updated to describe the actual behavior, e.g.:

Suggested docstring update
-        Notes:
-            If `time_sensitive` is True for this :class:`FileCache` instance, then the
-            modification time of the local file is set to the modification time of the
-            remote file after the upload is complete. If `time_sensitive` is False, then
-            the modification time of the local file is not changed.
+        Notes:
+            If `time_sensitive` is True for this :class:`FileCache` instance, then the
+            modification time of the remote file is set to the modification time of the
+            local file. If `time_sensitive` is False, then the modification time of the
+            remote file is set by the backend at upload time, and the local file's
+            modification time is not changed.

Also applies to: 1812-1823, 1849-1880


2210-2212: Unlink docs no longer match behavior; _unlink_multi also mis-pairs URLs when logging failures

Two related points:

  1. Local cache removal semantics changed but docstring wasn’t updated

The unlink() notes say:

the locally cached version (if any) is only removed if the unlink of the remote location succeeded.

But the code now always unlinks the local cached file in a finally block:

finally:
    local_path.unlink(missing_ok=True)

and _unlink_multi similarly calls local_path.unlink(missing_ok=True) regardless of whether the remote unlink succeeded.

This new behavior (which is reasonable and matches the inline comment) should be reflected in the docstring to avoid misleading users.

  1. _unlink_multi logs and aggregates wrong URLs for failed unlinks

In _unlink_multi, for each prefix group you do:

rets = source.unlink_multi(source_sub_paths, missing_ok=missing_ok, nthreads=nthreads)
...
for ret2, local_path, url in zip(rets, source_local_paths, urls):
    if isinstance(ret2, Exception):
        self._log_debug(f'    Remote unlink failed: {url} {ret2}')
        files_not_exist.append(str(url))
    local_path.unlink(missing_ok=True)

Here, urls is the full list of URLs for all sources, but rets and source_local_paths are only for the current prefix. This means url is not guaranteed to correspond to the local_path/ret2 for that iteration, and files_not_exist may collect incorrect entries. The returned func_ret list uses source_idxes correctly, so the bug only affects logging and the error message content.

You can fix both the documentation and the mis-pairing like this:

Suggested docstring and `_unlink_multi` fixes
@@ def unlink(self,
-        Notes:
-            If a URL points to a remote location, the locally cached version (if any) is
-            only removed if the unlink of the remote location succeeded.
+        Notes:
+            If a URL points to a remote location, the locally cached version (if any) is
+            always removed, even if the unlink of the remote location fails. This avoids
+            stale cache entries when the remote unlink actually succeeded but reported an
+            error (for example, due to a transient connection issue).

@@ def _unlink_multi(self,
-            rets = source.unlink_multi(source_sub_paths, missing_ok=missing_ok,
-                                       nthreads=nthreads)
+            rets = source.unlink_multi(
+                source_sub_paths,
+                missing_ok=missing_ok,
+                nthreads=nthreads,
+            )
             assert len(source_idxes) == len(rets)
-            for ret2, local_path, url in zip(rets, source_local_paths, urls):
-                if isinstance(ret2, Exception):
-                    self._log_debug(f'    Remote unlink failed: {url} {ret2}')
-                    files_not_exist.append(str(url))
-                local_path.unlink(missing_ok=True)  # Remove from cache
+            for ret2, local_path, source_idx in zip(
+                rets, source_local_paths, source_idxes
+            ):
+                url = urls[source_idx]
+                if isinstance(ret2, Exception):
+                    self._log_debug(f'    Remote unlink failed: {url} {ret2}')
+                    files_not_exist.append(str(url))
+                # Always remove from cache regardless of remote result
+                local_path.unlink(missing_ok=True)

Also applies to: 2250-2257, 2307-2318


441-451: UNC path handling in _split_url collapses leading slashes

The code normalizes backslashes and removes consecutive slashes:

url = url.replace("\\", "/")
...
while '//' in url:
    url = url.replace('//', '/')

This collapses UNC-style paths like \\host\share\path (converted to //host/share/path) into /host/share/path, changing their meaning on Windows where UNC paths are network references.

If UNC paths are not intended to be supported, explicitly reject them with a ValueError. If they should be supported, either skip the while '//' in url cleanup for paths starting with //, or defer the cleanup until after the protocol/authority portion is parsed.
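
A minimal repro of the collapse (pure string operations, independent of filecache):

url = '\\\\host\\share\\path'.replace('\\', '/')  # -> '//host/share/path'
while '//' in url:
    url = url.replace('//', '/')
print(url)  # '/host/share/path': the UNC authority prefix is lost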

filecache/file_cache_source.py (1)

564-602: Local-file backend signatures are consistent; unused preserve_mtime is acceptable

FileCacheSourceFile.retrieve / upload now accept preserve_mtime but deliberately ignore it, which is fine because local paths are already “authoritative” and no remote metadata exists. If you want to silence Ruff’s ARG002, you could rename the parameter to _preserve_mtime or add a dummy _=preserve_mtime assignment, but behavior-wise this is correct.

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8dae7f0 and b5d2299.

📒 Files selected for processing (9)
  • docs/examples.rst
  • docs/index.rst
  • docs/introduction.rst
  • docs/module.rst
  • filecache/file_cache.py
  • filecache/file_cache_path.py
  • filecache/file_cache_source.py
  • tests/test_file_cache.py
  • tests/test_file_cache_path.py
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_file_cache_path.py (1)
filecache/file_cache_path.py (5)
  • FCPath (36-2063)
  • _split (232-245)
  • parts (467-473)
  • match (614-640)
  • splitpath (524-539)
filecache/file_cache_path.py (1)
filecache/file_cache.py (11)
  • get_local_path (565-654)
  • url_to_url (400-403)
  • url_to_path (406-409)
  • exists (656-774)
  • nthreads (394-397)
  • modification_time (829-931)
  • is_dir (1002-1098)
  • retrieve (1163-1304)
  • upload (1697-1823)
  • iterdir (2019-2068)
  • iterdir_metadata (2070-2129)
tests/test_file_cache.py (2)
filecache/file_cache_path.py (2)
  • exists (764-845)
  • unlink (1290-1376)
filecache/file_cache.py (2)
  • exists (656-774)
  • unlink (2131-2259)
filecache/file_cache.py (2)
filecache/file_cache_path.py (4)
  • is_file (1395-1398)
  • modification_time (847-918)
  • stat (1699-1711)
  • retrieve (982-1103)
filecache/file_cache_source.py (12)
  • modification_time (144-146)
  • modification_time (548-552)
  • modification_time (755-791)
  • modification_time (1109-1133)
  • modification_time (1436-1464)
  • modification_time (1803-1807)
  • retrieve (251-255)
  • retrieve (564-596)
  • retrieve (824-889)
  • retrieve (1177-1247)
  • retrieve (1515-1572)
  • retrieve (1819-1862)
🪛 Ruff (0.14.10)
filecache/file_cache_path.py

537-537: Consider [-1, *indices, len(parts)] instead of concatenation

Replace with [-1, *indices, len(parts)]

(RUF005)


658-658: Remove quotes from type annotation

Remove quotes

(UP037)

tests/test_file_cache.py

2180-2180: Loop control variable p not used within loop body

(B007)


2188-2188: Loop control variable p not used within loop body

(B007)


2277-2277: Loop control variable p not used within loop body

(B007)


2285-2285: Loop control variable p not used within loop body

(B007)

filecache/file_cache.py

937-937: Boolean-typed positional argument in function definition

(FBT001)


938-938: Boolean-typed positional argument in function definition

(FBT001)

filecache/file_cache_source.py

254-254: Boolean-typed positional argument in function definition

(FBT001)


254-254: Boolean default positional argument in function definition

(FBT002)


303-303: Boolean-typed positional argument in function definition

(FBT001)


310-310: Boolean-typed positional argument in function definition

(FBT001)


329-329: Boolean-typed positional argument in function definition

(FBT001)


329-329: Boolean default positional argument in function definition

(FBT002)


370-370: Boolean-typed positional argument in function definition

(FBT001)


377-377: Boolean-typed positional argument in function definition

(FBT001)


567-567: Boolean-typed positional argument in function definition

(FBT001)


567-567: Boolean default positional argument in function definition

(FBT002)


567-567: Unused method argument: preserve_mtime

(ARG002)


601-601: Boolean-typed positional argument in function definition

(FBT001)


601-601: Boolean default positional argument in function definition

(FBT002)


601-601: Unused method argument: preserve_mtime

(ARG002)


827-827: Boolean-typed positional argument in function definition

(FBT001)


827-827: Boolean default positional argument in function definition

(FBT002)


894-894: Boolean-typed positional argument in function definition

(FBT001)


894-894: Boolean default positional argument in function definition

(FBT002)


1180-1180: Boolean-typed positional argument in function definition

(FBT001)


1180-1180: Boolean default positional argument in function definition

(FBT002)


1243-1244: Avoid specifying long messages outside the exception class

(TRY003)


1252-1252: Boolean-typed positional argument in function definition

(FBT001)


1252-1252: Boolean default positional argument in function definition

(FBT002)


1454-1454: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


1518-1518: Boolean-typed positional argument in function definition

(FBT001)


1518-1518: Boolean default positional argument in function definition

(FBT002)


1577-1577: Boolean-typed positional argument in function definition

(FBT001)


1577-1577: Boolean default positional argument in function definition

(FBT002)


1597-1597: Avoid specifying long messages outside the exception class

(TRY003)


1822-1822: Boolean-typed positional argument in function definition

(FBT001)


1822-1822: Boolean default positional argument in function definition

(FBT002)


1867-1867: Boolean-typed positional argument in function definition

(FBT001)


1867-1867: Boolean default positional argument in function definition

(FBT002)

🔇 Additional comments (22)
docs/index.rst (1)

17-18: ✓ Navigation structure looks good.

The new toctree entries are properly formatted and follow a logical flow (introduction → examples → module).

docs/module.rst (1)

1-2: ✓ Sphinx anchor placement is correct.

The cross-reference label is properly positioned before the section heading, enabling other documentation to link to the module reference.

docs/examples.rst (1)

1-592: Comprehensive examples documentation successfully integrated.

The 21 examples provide clear, progressive coverage of FileCache and FCPath functionality, including time-sensitive caching, metadata operations, parallel downloads, URL/path translations, and directory traversal. All code blocks follow proper RST formatting and are syntactically correct (except line 113). The examples reinforce the new filecache property feature (#55) and support the feature set described in the PR objectives.

Note: The splitpath feature (#44) is not covered in examples; if it is a user-facing feature you'd like demonstrated, consider adding an example showing path splitting behavior.

tests/test_file_cache.py (2)

2139-2152: Cloud unlink test now correctly asserts local cache cleanup

The added assert not local_path.exists() after a failing unlink(..., missing_ok=False) matches the new behavior where FileCache.unlink always removes the local cached copy in a finally block, regardless of remote outcome. The subsequent checks (graceful handling with exception_on_fail=False and missing_ok=True) are consistent and exercise both error and non-error paths.


2790-2852: Modification-time caching test accurately captures new semantics

test_modification_time_caching now exercises:

  • Initial upload behavior: when time_sensitive is True, remote mtime matches the local file’s mtime; when False, it differs, and local mtime is never modified by upload().
  • Metadata caching: with cache_metadata=True, a plain modification_time() call continues to return the original value even after a concurrent writer updates the remote, while bypass_cache=True sees the new remote mtime.
  • Retrieve behavior: retrieve() only refreshes the local copy (content and mtime) when time_sensitive is True; otherwise the local file is left unchanged even if the remote has changed.
  • Fresh download after local deletion: for time_sensitive=True, local mtime equals the remote mtime; for False, it reflects the download time (st_mtime > mtime_new).

This matches the described mtime and cache semantics and should catch regressions in both the caching layer and the time-sensitive paths.

filecache/file_cache_path.py (10)

231-246: Path splitting/root handling looks correct and preserves existing semantics

The tweaks in _split() for the root == '/' and subpath == root case and the idx == 0 branch still return (drive + '/', ''), which matches the previous behavior for paths like '/' and 'gs://bucket/'. This keeps name/parent logic consistent for root paths while making the special-case more explicit.


270-282: Double-slash normalization in _join matches FCPath normalization goals

The new loop that collapses // sequences in subpath ensures that FCPath concatenation behaves more like pathlib.Path, normalizing redundant slashes (per issue #53). Because this operates only on subpath (after _split_parts has stripped schemes and UNC prefixes into drive/root), it won’t corrupt constructs like gs:// or UNC //host/share.

Behavior is therefore improved for local and remote paths without breaking scheme handling.
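
For instance, using the constructor example from the PR description (assuming the normalization lands as described):

from filecache import FCPath

p = FCPath('/a/b//', 'c///d')
assert str(p) == '/a/b/c/d'  # redundant slashes collapsed (issue #53)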


524-539: Confirm splitpath semantics vs requested API

splitpath() returns FCPath segments between occurrences of search_dir, explicitly excluding search_dir itself, and returns a single-element tuple (self,) when search_dir is not present. For multiple matches, you get multiple segments (e.g., 'a/s/b/s/c' → ('a', 'b', 'c') when searching for 's').

This is consistent with the docstring but differs from a more “prefix/suffix (including the split directory)” interpretation some callers might expect. In particular:

  • The search_dir element itself is dropped from all segments.
  • When search_dir appears in the middle of a URL-like path, later segments become relative paths, not full URLs.

Please double-check that this behavior matches what you want for issue #44’s “split an FCPath on a given directory name” use cases and the tests in tests/test_file_cache_path.py.
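
For reference, a standalone sketch of the described semantics over plain path parts (illustrative only; the real FCPath.splitpath also handles anchors and returns FCPath objects):

def split_on_dir(parts: list[str], search_dir: str) -> tuple[str, ...]:
    """Split path parts at each occurrence of search_dir, dropping it."""
    segments: list[str] = []
    current: list[str] = []
    for part in parts:
        if part == search_dir:
            segments.append('/'.join(current))  # close the current segment
            current = []
        else:
            current.append(part)
    segments.append('/'.join(current))
    # With no occurrence of search_dir this is a single-element tuple.
    # Note: adjacent occurrences would yield empty segments here, which is
    # exactly the edge case the review asks the tests to cover.
    return tuple(segments)

split_on_dir(['a', 's', 'b', 's', 'c'], 's')  # ('a', 'b', 'c')
split_on_dir(['a', 'b'], 's')                 # ('a/b',)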


753-763: Dispatching get_local_path via self.filecache is appropriate

Switching get_local_path() to call self.filecache.get_local_path(...) (with the FCPath’s stored anonymous, url_to_url, and url_to_path overrides) maintains the existing behavior while routing through the new filecache property. Combined with _make_paths_absolute, it continues to honor relative and absolute subpaths correctly.


833-846: exists correctly wires through bypass_cache and FCPath overrides

The updated exists() method:

  • Normalizes nthreads via _validate_nthreads.
  • Uses _make_paths_absolute for subpaths.
  • Forwards bypass_cache, nthreads, anonymous, and the effective URL translators to self.filecache.exists().

This matches the FileCache API (including bypass_cache) and preserves FCPath-level overrides.


847-918: modification_time now exposes bypass_cache and multi-path semantics cleanly

The FCPath modification_time() signature and implementation now mirror the FileCache API by:

  • Adding the bypass_cache keyword.
  • Supporting single and multi-path calls via _make_paths_absolute.
  • Passing through anonymous, nthreads, exception_on_fail, and url_to_url.

This aligns FCPath behavior with the underlying FileCache.modification_time (including metadata cache vs bypass_cache=True) and matches the new tests.


1078-1102: Download counter accounting for retrieve is correct

Tracking old_download_counter = self.filecache.download_counter before calling self.filecache.retrieve(...), then incrementing _download_counter by the delta in a finally block, correctly attributes only the downloads initiated through this FCPath instance—even when retrieve raises or uses multi-file operations.

This is a reasonable way to keep per-FCPath counters aligned with the shared FileCache counters.
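
The pattern in schematic, standalone form (names here are illustrative, not filecache internals):

class SharedCounter:
    def __init__(self) -> None:
        self.count = 0

def tracked_retrieve(shared: SharedCounter, per_path: list[int], fail: bool) -> None:
    old = shared.count
    try:
        shared.count += 1  # the shared counter advances during the retrieve
        if fail:
            raise RuntimeError('simulated download failure')
    finally:
        # Attribute only this call's downloads, even if the retrieve raised.
        per_path[0] += shared.count - old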


1182-1201: Upload counter accounting for upload mirrors the retrieve logic correctly

The analogous old_upload_counter/delta logic in upload() ensures _upload_counter captures uploads initiated via this FCPath, regardless of success/failure and including multi-file uploads, while delegating actual behavior to self.filecache.upload(...).

The symmetry with retrieve() is good for maintainability.


1448-1471: Iterdir wrappers now correctly use the shared FileCache and preserve metadata caching

Both iterdir() and iterdir_metadata() now delegate directly to self.filecache.iterdir[_metadata](self._path, url_to_url=...), and re-wrap returned paths as FCPath instances via copy_from=self. Given the FileCache implementation populates _metadata_cache_isdir and _metadata_cache_mtime as it iterates, this design keeps FCPath iteration consistent with cache behavior while exposing a path-oriented interface.


1707-1711: Improved stat error message for remote paths

The updated stat() implementation now explicitly tells callers to use modification_time() when they just need the mtime for remote files:

raise NotImplementedError(
    'stat on a remote file is not implemented; use modification_time() if you just want that info'
)

This is a clearer guidance than the previous generic message and aligns with how remote metadata is exposed elsewhere.

tests/test_file_cache_path.py (4)

88-96: HTTP _split expectations match updated path semantics

The new _split cases for http://domain.name correctly treat the authority plus trailing slash as the “head” and the remainder as the tail, which is consistent with the updated _split implementation and with how other schemes are tested. No issues here.


395-403: Updated .parts expectations for anchored paths look correct

Expecting the anchor component ('/', 'C:/', 'gs://bucket/') to appear as the first element in parts is consistent with the new FCPath.parts implementation and mirrors pathlib.Path.parts behavior. The tests exercise local, Windows, and GS-style URIs adequately.


443-449: Redundant slash normalization via / operator is well-covered

The added __truediv__ tests confirm that mixed redundant slashes across components collapse to a single separator ('/c/d/e/f'). This aligns with the new _join normalization and should prevent surprising path strings from chained divisions.


529-561: Stricter match behavior for patterns with leading / is appropriate

The new expectations that patterns like '/*/def' and '/abc/def' (with a leading /) do not match relative paths are consistent with the updated FCPath.match logic and closer to pathlib.Path.match semantics. The tests cover both case-sensitive and case-variant paths.

filecache/file_cache.py (1)

87-91: Easy logger now uses . separator with millisecond precision

The new formatter string writes timestamps as YYYY-MM-DD HH:MM:SS.mmm (period before milliseconds). This addresses the “comma vs period” issue and is a conventional logging format. No functional concerns.

filecache/file_cache_source.py (2)

250-305: Preserve-mtime plumbing through FileCacheSource and multi-helpers looks correct

The abstract retrieve / upload signatures and the _download_object(_parallel) / _upload_object(_parallel) helpers now cleanly propagate preserve_mtime through to concrete implementations, and retrieve_multi / upload_multi expose it as a keyword-only option. This matches the higher-level FileCache usage and keeps API symmetry. No functional concerns here.

Also applies to: 332-383


824-888: Backend preserve_mtime implementations are coherent across HTTP, GS, S3, and Fake

The concrete backends now implement preserve_mtime as expected:

  • HTTP: when preserve_mtime=True, parses Last-Modified and applies it to the local file via os.utime. When absent, it quietly falls back to the filesystem mtime set at download time. (A parsing sketch follows this comment.)
  • Google Storage:
    • modification_time prefers the reserved 'goog-reserved-file-mtime' metadata, falling back to blob.updated.timestamp().
    • retrieve with preserve_mtime=True reloads metadata, derives an mtime (metadata first, then updated), and applies it to the local file.
    • upload with preserve_mtime=True writes 'goog-reserved-file-mtime' into blob.metadata before upload.
  • S3:
    • modification_time prefers custom Metadata['mtime'] when present, falling back to LastModified.
    • retrieve with preserve_mtime=True calls modification_time(sub_path) and utimes the local file if a timestamp is available.
    • upload with preserve_mtime=True sets ExtraArgs['Metadata']['mtime'] based on the local file’s mtime.
  • Fake:
    • retrieve and upload both use shutil.copy2 (which already preserves mtime) and then optionally call os.utime when preserve_mtime=True. This is redundant but harmless and keeps the interface consistent.

Overall, this matches the intended semantics from FileCache (remote mtime mirrors local when time-sensitive; otherwise it defaults to backend upload time). Aside from some small doc-level nuances in FileCache, the backend behavior looks solid.

Also applies to: 1178-1190, 1233-1245, 1515-1571, 1820-1861, 1865-1905
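
As a reference point for the HTTP case, a minimal sketch of applying a Last-Modified header to a local file (the header value and file name are placeholders; this is not the exact filecache code):

import os
from email.utils import parsedate_to_datetime

def apply_remote_mtime(local_path: str, last_modified: str | None) -> None:
    if last_modified is None:
        return  # keep the filesystem mtime set at download time
    mtime = parsedate_to_datetime(last_modified).timestamp()
    os.utime(local_path, (mtime, mtime))

# apply_remote_mtime('data.dat', 'Wed, 21 Oct 2015 07:28:00 GMT')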


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Fix all issues with AI Agents 🤖
In @docs/introduction.rst:
- Around line 462-463: Add a short sentence in the FCPath constructor docs to
document the FCPath.filecache property: mention that the associated FileCache
instance can also be accessed via the FCPath.filecache attribute (referencing
:class:`FCPath` and :class:`FileCache`/ :attr:`FCPath.filecache`) to improve
discoverability and link it to the constructor's filecache keyword argument.
- Around line 445-536: The introduction omits the new FCPath.splitpath method;
add a short subsection under the FCPath documentation that documents
FCPath.splitpath (include its signature and behavior: splitting an FCPath at a
specified directory name and returning the resulting head and tail), show a
minimal example using FCPath.splitpath to split a URL-like path (e.g., create an
FCPath, call splitpath with a directory name and show the returned head and
tail), and mention how it respects the FCPath's FileCache association; place
this text near the existing FCPath examples so users can find it alongside the
constructor and / operator examples.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
filecache/file_cache.py (1)

1-50: Critical: Python 3.9 compatibility issue—widespread use of PEP604 union syntax.

The codebase uses PEP604 union syntax (float | None, str | Path, etc.) extensively across multiple files, but the project declares requires-python = ">=3.9" in pyproject.toml. PEP604 syntax was introduced in Python 3.10 and is incompatible with Python 3.9.

Line 1331 in filecache/file_cache.py contains: cast(float | None, ...), along with dozens of other instances in this file and file_cache_source.py and file_cache_path.py.

Convert all PEP604 union syntax to Union[Type1, Type2] or Optional[Type] from the typing module to maintain Python 3.9 compatibility, or update requires-python to >=3.10.

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5d2299 and 2589e65.

📒 Files selected for processing (3)
  • docs/introduction.rst
  • filecache/file_cache.py
  • tests/test_file_cache_path.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_file_cache_path.py (1)
filecache/file_cache_path.py (4)
  • FCPath (36-2063)
  • _split (232-245)
  • parts (467-473)
  • splitpath (524-539)
🪛 GitHub Actions: Run Tests: branch 58/merge by rfrenchseti
filecache/file_cache.py

[error] 1331-1331: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' caused by using PEP604 union 'float | None' in Python 3.9.13. Replace with Optional[float] (from typing) to maintain compatibility.

🪛 Ruff (0.14.10)
filecache/file_cache.py

937-937: Boolean-typed positional argument in function definition

(FBT001)


938-938: Boolean-typed positional argument in function definition

(FBT001)

🔇 Additional comments (13)
tests/test_file_cache_path.py (5)

93-95: LGTM! URL base normalization is consistent.

The updated assertions correctly reflect the new behavior where URL bases include a trailing slash (e.g., 'http://domain.name/'). This aligns with the path normalization improvements mentioned in the PR objectives and is consistent with the updated parts() behavior.


399-402: LGTM! Anchor representation is now consistent.

The updated test expectations correctly reflect that path anchors now include trailing slashes (e.g., '/', 'C:/', 'gs://bucket/'). This normalization makes the anchor representation more explicit and uniform across different path types.


447-448: LGTM! Slash normalization tests confirm issue #53 fix.

These new test cases verify that FCPath concatenation correctly collapses redundant slashes (//, ///) into single slashes, addressing issue #53. The tests cover both trailing slashes in path components and multiple consecutive slashes.


542-543: LGTM! Corrected match() expectations for relative vs. absolute paths.

The updated assertions correctly reflect that relative paths (e.g., 'abc/def') should not match absolute patterns (e.g., '/*/def', '/abc/def'). This fixes previously incorrect test expectations to align with proper path matching semantics.

Also applies to: 558-558, 560-560


1298-1310: LGTM! Basic splitpath() functionality is well-tested.

The new test_splitpath() function correctly validates the core behavior of FCPath.splitpath():

  • Returns original path as single-element tuple when search_dir not found
  • Splits at each occurrence of search_dir, excluding the delimiter
  • Preserves anchors correctly for both local and GS paths

As noted in the past review comment, consider adding edge case tests for:

  • search_dir as the first non-anchor component (e.g., FCPath('/c/b/c').splitpath('c'))
  • Adjacent occurrences to ensure no empty segments (e.g., FCPath('/a/c/c/d').splitpath('c'))
filecache/file_cache.py (8)

88-90: LGTM! Timestamp formatting fix addresses issue #51.

The updated formatter correctly uses a decimal point (.) before milliseconds instead of a comma, and explicitly sets the date format. This provides clearer, more standard timestamp formatting in log output.
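
For reference, one standard way to produce this format with the stock logging module (a sketch; the PR's formatter may differ in layout):

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    fmt='%(asctime)s.%(msecs)03d %(levelname)s %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'))  # the default would use a comma separator
logger = logging.getLogger('filecache-demo')
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
logger.debug('example')  # e.g. 2026-01-06 12:34:56.789 DEBUG example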


449-450: LGTM! URL path normalization prevents malformed paths.

The addition of double-slash cleanup in _split_url() defensively handles badly formed paths that may result from concatenation. This normalization applies to both local file paths and remote URL sub-paths, addressing issue #53 at the URL parsing level.

Also applies to: 459-460


832-832: LGTM! bypass_cache parameter provides fine-grained cache control.

The new bypass_cache parameter in modification_time() allows callers to force a remote lookup even when metadata caching is enabled. The implementation is consistent across both single and multi-file operations:

  • bypass_cache=False (default): Uses cached metadata if available
  • bypass_cache=True: Always queries the remote source

This addresses the PR objective of allowing metadata cache bypass when needed for time-sensitive operations.

Also applies to: 845-848, 904-904, 914-914, 937-938, 953-953


1056-1057: LGTM! Improved docstring clarifies is_dir() vs exists() behavior.

The updated documentation clearly contrasts is_dir() (raises exception if URL doesn't exist) with exists() (returns False if inaccessible). This helps prevent confusion about the different return semantics of these methods.


1327-1363: LGTM! Time-sensitive retrieval logic correctly handles mtime comparisons.

The reworked _retrieve_single() logic properly implements time-sensitive file caching:

  • Checks existing local files against remote modification time (with bypass_cache=True for fresh data)
  • Only re-downloads when remote is newer than local
  • Gracefully handles cases where remote mtime is unavailable
  • Preserves mtime when time_sensitive=True via preserve_mtime parameter

This addresses the PR's mtime handling improvements and avoids unnecessary downloads.


1400-1401: LGTM! Efficient mtime handling avoids redundant server queries.

The multi-file retrieval logic efficiently handles modification times by:

  1. Fetching all mtimes once with bypass_cache=True
  2. Passing preserve_mtime=False to retrieve_multi() to avoid duplicate server queries
  3. Manually updating local file mtimes afterwards using the already-fetched times

The inline comments clearly explain this optimization strategy, which minimizes server round trips while maintaining time-sensitive correctness.

Also applies to: 1450-1456, 1519-1520, 1600-1606


1816-1816: LGTM! Upload operations consistently propagate time-sensitive flag.

The preserve_mtime=self._time_sensitive parameter is correctly passed to all upload operations (single and multi-file). This ensures that when time_sensitive=True, local file mtimes are synchronized with remote mtimes after upload, as specified in the PR objectives.

Also applies to: 1852-1853, 1882-1882


2253-2260: LGTM! Defensive unlink behavior prevents stale cache entries.

The updated unlink logic now always removes the local cached copy in a finally block, even if the remote unlink fails. This prevents cache inconsistencies in scenarios where:

  • Remote unlink succeeds but reports an error (e.g., connection timeout)
  • Local cache would otherwise point to a non-existent remote file

The comment at line 2254 clearly explains this defensive strategy. While the user still receives any exception from the remote unlink, the cache remains consistent.

Also applies to: 2318-2320


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Fix all issues with AI Agents 🤖
In @docs/examples.rst:
- Around line 221-237: The url_to_path example's conditional never matches
because it checks remote == "bucket" while the URLs use "my-bucket"; update the
url_to_path function (the remote check) to match the bucket name used in the
examples (e.g., remote == "my-bucket" or use a consistent placeholder like
"{bucket}" and adjust the URL examples), and update the inline comments about
resulting cache paths to reflect FileCache's actual output (e.g., using
gs_my-bucket if the bucket is my-bucket) so the example and comments are
consistent; ensure references to url_to_path, FileCache, and retrieve remain
accurate.
- Around line 300-307: The example calls FileCache.new_path(time_sensitive=True)
which is invalid because FileCache.new_path does not accept time_sensitive;
update the example to set time sensitivity on the FileCache instance (e.g.,
create the cache with FileCache(time_sensitive=True) before calling new_path) or
construct an FCPath directly with FCPath(...) passing the per-path options;
change the snippet to either set FileCache(time_sensitive=True) and then call
fc.new_path('gs://...') or replace the new_path call with FCPath(fc,
'gs://my-bucket/data', time_sensitive=True) so it no longer passes an unexpected
keyword to FileCache.new_path.
- Around line 89-114: The example claims that download errors are returned in
the result list but calls FileCache.retrieve with the default behavior; update
the example to call FileCache.retrieve(urls, exception_on_fail=False) so errors
are returned as Exception objects instead of being raised, and ensure the
accompanying text mentions exception_on_fail=False (or alternatively change the
prose to state that retrieve raises on failure if you prefer raising behavior).
- Around line 172-178: The url_to_url mapping builds dir names from the entire
matched number (producing dir11 for file11) instead of the intended dir1/dir2
layout; in url_to_url, after matching dir_num = dir_num_match.group(1) compute
the directory component from the leading digit (e.g., dir_index = dir_num[0] or
dir_index = str(int(dir_num) // 10 for multi-digit rules) ) and return
f"gs://my-bucket/data/dir{dir_index}/file{dir_num}.txt" so file11/file12 map to
dir1 and file21/file22 map to dir2; adjust the computation approach in the
url_to_url function to derive dir_index from the matched number.
- Around line 273-284: The example incorrectly calls out a named cache as
"ephemeral" while constructing FileCache(cache_name='mycache'), which by default
is permanent; update the example so the behavior matches the comment by either:
(A) removing cache_name to create an unnamed ephemeral cache (FileCache() or
FileCache(cache_name=None)) or (B) explicitly set delete_on_exit=True when using
a named cache (FileCache(cache_name='mycache', delete_on_exit=True)); also
adjust the surrounding comment text to match the chosen constructor (mention
that named caches persist unless delete_on_exit=True and warn about concurrent
processes if manual deletion via FileCache.delete_cache() is used).

In @docs/introduction.rst:
- Around line 350-379: Update the docs for FileCache.modification_time to
reflect actual behavior: state that missing or inaccessible files raise
FileNotFoundError (or raise an exception) by default unless the method is called
with exception_on_fail=False, which returns the exception object instead;
reserve the description that “None is returned” only for cases where the remote
exists but no modification time is available (e.g., remote does not support
mtime). Also mention that bypass_cache=True forces querying the remote but does
not change the exception_on_fail behavior.
- Around line 421-451: Update the "Deleting a File" section to reflect the new
behavior in FileCache.unlink (and its helper _unlink_multi): state clearly that
local cached copies are always removed (they are deleted in a finally block)
even if the remote unlink fails, that remote failures are surfaced as exceptions
or as Exception objects in the returned list when exception_on_fail=False, and
that the returned list preserves input order (mixing Path and Exception objects
as described); adjust the wording that previously implied the cache is kept on
remote failure and ensure examples and the paragraph about exception_on_fail
reflect that the cache will not be preserved on failure.
- Around line 130-156: Update the docs to reflect current behavior: state that
when FileCache.time_sensitive is True, FileCache.retrieve always performs a
fresh remote modification time check by calling modification_time(...,
bypass_cache=True) regardless of cached metadata; explain that
cache_metadata=True mainly speeds up calls to modification_time (when
bypass_cache=False) and directory methods like FileCache.iterdir and
FileCache.iterdir_metadata which populate the metadata cache and improve
subsequent metadata queries, but do not prevent retrieve() from forcing a remote
mtime lookup in time-sensitive mode.

In @filecache/file_cache_source.py:
- Around line 250-263: The boolean positional parameter preserve_mtime on
retrieve, retrieve_multi, upload and upload_multi makes callers prone to
accidental positional booleans; make preserve_mtime keyword-only by updating
each method signature (e.g., retrieve, retrieve_multi, upload, upload_multi) to
put a lone * before preserve_mtime (or otherwise declare it as a keyword-only
parameter) so callers must use preserve_mtime=...; keep the default value and
types the same and adjust any internal calls if they passed it positionally. (A
sketch of the keyword-only form appears after this block.)

In @filecache/file_cache.py:
- Around line 1397-1404: The multi-file retrieve paths (_retrieve_multi_unlocked
and _retrieve_multi_locked) call modification_time(urls, bypass_cache=True) with
exception_on_fail=True which will abort the whole multi-retrieve on any remote
error; change those calls to pass exception_on_fail=False so failures return
Exception/None instead of raising, then normalize any Exception results in
source_times to None (treat as “no mtime / fall back to cache”) before
proceeding with retrieval logic; update the code locations that build urls and
set source_times (the source_times variable and its use) accordingly so a
missing remote mtime does not abort the entire operation.
- Around line 1327-1363: In _retrieve_single, when self._time_sensitive is True,
avoid letting modification_time(url, bypass_cache=True) raise and abort when a
local cached file exists; instead call modification_time(url, bypass_cache=True,
exception_on_fail=False) or wrap the call in a try/except and treat any
exception as source_time = None so the method falls back to the existing local
copy; apply the same pattern to the multi-URL branch(es) that also call
modification_time so failures to fetch remote mtime won’t force a download/raise
when a valid cache is present.
- Around line 1450-1456: Fix the typo and clarify the comment near the
multi-retrieve optimization: replace “wan't” with “want” and rephrase to
explicitly state that we rely on the mtimes already fetched and set local mtimes
ourselves (so preserve_mtime=False is safe). Update the comment immediately
above the source.retrieve_multi call that passes preserve_mtime=False (and the
similar comment at the other occurrence) to read something like “we already
retrieved modification_times earlier and rely on those mtimes to set local file
timestamps ourselves, so preserve_mtime=False avoids extra server round trips.”
Ensure both instances around preserve_mtime=False/source.retrieve_multi are
updated.
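
An illustrative version of the exception-normalization change, reusing the
variable names from this review (urls, source_times):

# Fetch mtimes without raising so one bad URL cannot abort the batch
source_times = self.modification_time(urls, bypass_cache=True,
                                      exception_on_fail=False)
# Normalize failures: an Exception result means "no mtime known", so the
# retrieval logic falls back to the cached copy instead of aborting
source_times = [None if isinstance(t, Exception) else t
                for t in source_times]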

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
docs/introduction.rst (1)

1053-1057: Minor reST markup issue for pathlib.Path.is_dir.

The sentence uses `` `pathlib.Path.is_dir``` (one backtick then two), which confuses Sphinx. It should be formatted as a literal with matching backticks, e.g.:

Unlike ``os.path.isdir`` or ``pathlib.Path.is_dir``, ...
filecache/file_cache.py (5)

150-165: time_sensitive / cache_metadata constructor docs are out of date.

The __init__ docstring still says:

  • Uploads with time_sensitive=True set the local file’s mtime from the remote after upload.
  • cache_metadata=True lets retrieve() read mtimes from the metadata cache when comparing local vs remote.

Current behavior is different:

  • upload() never modifies the local file’s mtime; with time_sensitive=True it sets the remote mtime from the local file and otherwise leaves local untouched.
  • retrieve() calls modification_time(..., bypass_cache=True) when time_sensitive=True, so it always performs a fresh remote mtime lookup and does not consult the metadata cache for this comparison.

Please update these parameter docs to match the new semantics (remote mtime from local for uploads, no local mtime changes, and metadata cache only used when bypass_cache=False on modification_time() and related calls).


829-885: modification_time docstring should mirror bypass_cache and exception behavior.

The docstring’s “Returns” section still implies that cache_metadata=True always allows cached mtimes to be used and that failures simply yield None, while the code:

  • Only uses the metadata cache when bypass_cache=False, and
  • Raises on errors when exception_on_fail=True, returning an Exception object only when exception_on_fail=False.

Consider tightening this text to:

  • Explicitly mention that bypass_cache=True forces a remote lookup even when metadata is cached, and
  • Clarify that missing/inaccessible files raise by default (rather than returning None).

1780-1785: upload() docs and logging still talk about updating local mtimes.

  • The upload method’s docstring “Notes” say that with time_sensitive=True the local file’s mtime is set from the remote after upload. Implementation now only passes preserve_mtime=self._time_sensitive to the sources and never touches the local file’s mtime.
  • In _upload_multi, the if self._time_sensitive: branch logs “Updating modification times of local files” but now only clears metadata cache entries; it no longer adjusts any local timestamps.

Recommend:

  • Updating the docstring to describe how remote mtimes are set (from local time when time_sensitive=True, from upload time otherwise) and that local mtimes are never modified, and
  • Renaming the _upload_multi log message to something like “Invalidating cached modification times for uploaded files”.

Also applies to: 1815-1827, 1895-1914
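
A quick illustration of the corrected upload semantics (hedged: the upload
call shape is assumed from this review, not verified against the docs):

fc = FileCache(time_sensitive=True)
fc.upload('gs://my-bucket/data/file.txt')
# Remote mtime is set from the local file; the local mtime is never touched

fc = FileCache(time_sensitive=False)
fc.upload('gs://my-bucket/data/file.txt')
# Remote mtime is simply the time of the upload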


2212-2215: unlink docstring notes contradict the new always-remove-local behavior.

The “Notes” section still states that, for remote URLs, the cached copy is removed only if the remote unlink succeeds. The implementation now unconditionally deletes the cached file in the finally block, regardless of whether the remote unlink raised.

Please update this note to match the current semantics.


2316-2335: Bug in _unlink_multi: wrong URLs reported for failed remote unlinks.

In _unlink_multi, the loop:

for ret2, local_path, url in zip(rets, source_local_paths, urls):
    ...
    if isinstance(ret2, Exception):
        self._log_debug(f'    Remote unlink failed: {url} {ret2}')
        files_not_exist.append(str(url))
    local_path.unlink(missing_ok=True)

zips rets and source_local_paths (per-prefix) with the entire urls list. This misaligns URLs with results when multiple prefixes or local entries are present, so the logged URL and the files_not_exist entries can refer to the wrong paths.

You likely meant to use only the URLs corresponding to source_idxes for this prefix.

Suggested fix
-            rets = source.unlink_multi(source_sub_paths, missing_ok=missing_ok,
-                                       nthreads=nthreads)
+            rets = source.unlink_multi(source_sub_paths, missing_ok=missing_ok,
+                                       nthreads=nthreads)
             assert len(source_idxes) == len(rets)
-            for ret2, local_path, url in zip(rets, source_local_paths, urls):
+            for ret2, local_path, idx in zip(rets, source_local_paths, source_idxes):
+                url = urls[idx]
                 if isinstance(ret2, Exception):
                     self._log_debug(f'    Remote unlink failed: {url} {ret2}')
                     files_not_exist.append(str(url))
                 local_path.unlink(missing_ok=True)  # Remove from cache
filecache/file_cache_source.py (1)

1515-1571: S3 mtime preservation path looks correct; minor style improvement possible.

For S3 you:

  • Prefer custom Metadata['mtime'] in modification_time and fall back to LastModified.timestamp(), and
  • On upload, conditionally set ExtraArgs['Metadata'] = {'mtime': ...} and call upload_file appropriately.

Behaviorally this matches the intended “preserve original local mtime or use remote LastModified” semantics.

If you care about lint noise, the if 'Metadata' in response and response['Metadata']: check could be simplified to metadata = response.get('Metadata') and then testing metadata before reading from it, but that’s purely stylistic.

Also applies to: 1594-1609
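
The suggested simplification, assuming response is a boto3 head_object result
(Metadata is the user-metadata dict and LastModified a datetime):

metadata = response.get('Metadata')
if metadata and 'mtime' in metadata:
    return float(metadata['mtime'])
return response['LastModified'].timestamp()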

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2589e65 and 6ae0bab.

📒 Files selected for processing (4)
  • docs/examples.rst
  • docs/introduction.rst
  • filecache/file_cache.py
  • filecache/file_cache_source.py
🧰 Additional context used
🧬 Code graph analysis (1)
filecache/file_cache.py (1)
filecache/file_cache_source.py (18)
  • modification_time (144-146)
  • modification_time (548-552)
  • modification_time (755-791)
  • modification_time (1109-1133)
  • modification_time (1436-1464)
  • modification_time (1803-1807)
  • retrieve (251-255)
  • retrieve (564-596)
  • retrieve (824-889)
  • retrieve (1177-1247)
  • retrieve (1515-1572)
  • retrieve (1819-1862)
  • unlink (417-434)
  • unlink (666-686)
  • unlink (1025-1044)
  • unlink (1338-1365)
  • unlink (1676-1703)
  • unlink (1951-1975)
🪛 Ruff (0.14.10)

filecache/file_cache.py
  • 937, 938: FBT001 Boolean-typed positional argument in function definition

filecache/file_cache_source.py
  • 303, 310, 370, 377: FBT001 Boolean-typed positional argument in function definition
  • 254, 329, 567, 601, 827, 894, 1180, 1252, 1518, 1577, 1822, 1867: FBT001 + FBT002 Boolean-typed positional argument with a boolean default in function definition
  • 567, 601: ARG002 Unused method argument: preserve_mtime
  • 1243-1244, 1597: TRY003 Avoid specifying long messages outside the exception class
  • 1454: RUF019 Unnecessary key check before dictionary access (replace with dict.get)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Test filecache (ubuntu-latest, 3.11)
  • GitHub Check: Test filecache (macos-latest, 3.9)
  • GitHub Check: Test filecache (windows-latest, 3.11)
  • GitHub Check: Test filecache (ubuntu-latest, 3.9)
  • GitHub Check: Test filecache (windows-latest, 3.10)
  • GitHub Check: Test filecache (macos-latest, 3.12)
  • GitHub Check: Test filecache (macos-latest, 3.10)
  • GitHub Check: Test filecache (ubuntu-latest, 3.12)
  • GitHub Check: Test filecache (macos-latest, 3.11)
  • GitHub Check: Test filecache (windows-latest, 3.12)
  • GitHub Check: Test filecache (windows-latest, 3.13)
  • GitHub Check: Test filecache (macos-latest, 3.13)
  • GitHub Check: Test filecache (ubuntu-latest, 3.10)
  • GitHub Check: Test filecache (windows-latest, 3.9)
  • GitHub Check: Test filecache (ubuntu-latest, 3.13)
🔇 Additional comments (4)
filecache/file_cache.py (2)

77-92: Logger timestamp format change looks correct.

The updated formatter now uses a period before milliseconds (...%H:%M:%S.%(msecs)03d...), which matches the objective to replace the comma with a dot. No functional issues spotted.


442-461: URL normalization now collapses // inside paths; verify impact on schemes with opaque keys.

_split_url now replaces repeated / in both local paths and the sub-path portion of URLs. This satisfies the goal of collapsing redundant slashes, but for backends like S3/GS where object keys can legally contain //, s3://bucket/a//b will now be treated as a/b.

If your external object naming never relies on multiple consecutive slashes, this is fine. Otherwise, consider whether this normalization should be limited to FCPath joins rather than applied to arbitrary user-supplied URLs.
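
A standalone sketch of the normalization effect (the regex is illustrative;
the library may implement the collapse differently):

import re

sub_path = re.sub(r'/{2,}', '/', 'a//b///c')
print(sub_path)   # -> 'a/b/c'
# Consequence for object stores: 's3://bucket/a//b' now resolves to the key
# 'a/b', so keys that legally contain '//' become unreachable via this path.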

filecache/file_cache_source.py (1)

1819-1861: Fake-source mtime preservation semantics are consistent with other backends.

For FileCacheSourceFake, retrieve and upload now optionally copy mtimes via os.path.getmtime and os.utime, mirroring the preserve-mtime behavior of other sources. The use of temporary paths and rename keeps operations atomic.

No issues spotted here.

Also applies to: 1864-1905

docs/introduction.rst (1)

451-541: Consider documenting FCPath.splitpath alongside other FCPath features.

The introduction covers FCPath construction, / joining, inheritance of FileCache settings, and iteration APIs, but the new FCPath.splitpath helper isn’t mentioned here. A short subsection and minimal example (e.g., splitting a URL-like path at a directory name into head/tail FCPaths) would make this feature discoverable in the main conceptual docs.

Likely an incorrect or invalid review comment.

@codecov
codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 75.20000% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.55%. Comparing base (8dae7f0) to head (87e4b9c).
⚠️ Report is 1 commit behind head on main.

Files with missing lines         Patch %   Lines
filecache/file_cache_source.py   63.15%    13 Missing and 8 partials ⚠️
filecache/file_cache.py          77.27%    8 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #58      +/-   ##
==========================================
- Coverage   90.51%   89.55%   -0.97%     
==========================================
  Files           5        5              
  Lines        2373     2413      +40     
  Branches      493      507      +14     
==========================================
+ Hits         2148     2161      +13     
- Misses        129      149      +20     
- Partials       96      103       +7     


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
filecache/file_cache.py (3)

1163-1411: Time-sensitive retrieve() does not propagate anonymous/url_to_url into modification_time calls.

In _retrieve_single and the multi-retrieve helpers, self.modification_time(...) is called without passing the per-call anonymous and url_to_url overrides that were used to construct source/sub_path. This can lead to:

  • Using different credentials for the mtime lookup than for the data download (e.g., retrieve(..., anonymous=True) but modification_time() uses self._anonymous), and
  • Re-running URL translation differently for the mtime check than for the actual download when callers pass a custom url_to_url to retrieve().

Both can cause incorrect failures or comparison against the wrong remote object in time-sensitive mode.

Sketch of a safer propagation

One way to keep behavior consistent is to thread the overrides through:

 def retrieve(self,
              url: StrOrPathOrSeqType,
              *,
              anonymous: Optional[bool] = None,
@@
-        if isinstance(url, (list, tuple)):
+        if isinstance(url, (list, tuple)):
@@
-            if self.is_mp_safe:
-                return self._retrieve_multi_locked(sources, sub_paths, local_paths,
-                                                   lock_timeout, nthreads,
-                                                   exception_on_fail)
-            return self._retrieve_multi_unlocked(sources, sub_paths, local_paths,
-                                                 nthreads, exception_on_fail)
+            if self.is_mp_safe:
+                return self._retrieve_multi_locked(
+                    sources, sub_paths, local_paths,
+                    lock_timeout, nthreads, exception_on_fail,
+                    anonymous=anonymous, url_to_url=url_to_url,
+                )
+            return self._retrieve_multi_unlocked(
+                sources, sub_paths, local_paths,
+                nthreads, exception_on_fail,
+                anonymous=anonymous, url_to_url=url_to_url,
+            )
@@
-        return self._retrieve_single(url, source, sub_path, local_path, lock_timeout,
-                                     self.is_mp_safe, exception_on_fail)
+        return self._retrieve_single(
+            url, source, sub_path, local_path, lock_timeout,
+            self.is_mp_safe, exception_on_fail,
+            anonymous=anonymous, url_to_url=url_to_url,
+        )

and in _retrieve_single:

-    def _retrieve_single(self,
-                         url: str,
-                         source: FileCacheSource,
-                         sub_path: str,
-                         local_path: Path,
-                         lock_timeout: int,
-                         locking: bool,
-                         exception_on_fail: bool) -> Path | Exception:
+    def _retrieve_single(self,
+                         url: str,
+                         source: FileCacheSource,
+                         sub_path: str,
+                         local_path: Path,
+                         lock_timeout: int,
+                         locking: bool,
+                         exception_on_fail: bool,
+                         *,
+                         anonymous: Optional[bool] = None,
+                         url_to_url: Optional[UrlToUrlFuncOrSeqType] = None,
+                         ) -> Path | Exception:
@@
-                if self._time_sensitive:
+                if self._time_sensitive:
                     try:
-                        source_time = cast(float | None, self.modification_time(url))
+                        source_time = cast(
+                            float | None,
+                            self.modification_time(
+                                url,
+                                anonymous=anonymous,
+                                url_to_url=url_to_url,
+                            ),
+                        )

Similarly, _retrieve_multi_unlocked / _retrieve_multi_locked can accept anonymous and url_to_url and pass them through when calling self.modification_time(urls, ...). This keeps the mtime checks aligned with whatever credentials and URL mapping the caller requested for the actual data transfer.


2215-2244: unlink() docstring Notes no longer match behavior (local cache is always removed).

The Notes still say the cached copy is removed “only if the unlink of the remote location succeeded”, but the implementation now removes local_path in a finally block regardless of remote success.

Suggested doc update
-        Notes:
-            If a URL points to a remote location, the locally cached version (if any) is
-            only removed if the unlink of the remote location succeeded.
+        Notes:
+            If a URL points to a remote location, any locally cached copy is always
+            removed, even if the remote unlink fails. Remote failures are surfaced via
+            exceptions or as :class:`Exception` objects in the returned list (when
+            ``exception_on_fail=False``).

2293-2335: Multi-unlink error reporting zips against the full urls list, not the per-prefix subset.

In _unlink_multi, the loop:

for ret2, local_path, url in zip(rets, source_local_paths, urls):
    if isinstance(ret2, Exception):
        self._log_debug(f'    Remote unlink failed: {url} {ret2}')
        files_not_exist.append(str(url))
    local_path.unlink(missing_ok=True)

uses the entire urls list instead of the indices (source_idxes) for this source_pfx. This can associate failures with the wrong URLs in logs and in the aggregated error message.

Proposed fix
-            rets = source.unlink_multi(source_sub_paths, missing_ok=missing_ok,
-                                       nthreads=nthreads)
+            rets = source.unlink_multi(source_sub_paths, missing_ok=missing_ok,
+                                       nthreads=nthreads)
             assert len(source_idxes) == len(rets)
-            for ret2, local_path, url in zip(rets, source_local_paths, urls):
+            source_urls = [urls[i] for i in source_idxes]
+            for ret2, local_path, url in zip(rets, source_local_paths, source_urls):
                 if isinstance(ret2, Exception):
                     self._log_debug(f'    Remote unlink failed: {url} {ret2}')
                     files_not_exist.append(str(url))
                 local_path.unlink(missing_ok=True)  # Remove from cache
filecache/file_cache_source.py (1)

1688-1704: Fix copy-paste docstring in FileCacheSourceS3.unlink (“Google Storage bucket”).

The parameter description for sub_path still says “Google Storage bucket”, which is misleading for the S3 implementation.

Proposed doc fix
-        Parameters:
-            sub_path: The path of the file in the Google Storage bucket given by the
-                source prefix to delete.
+        Parameters:
+            sub_path: The path of the file in the AWS S3 bucket given by the
+                source prefix to delete.
🤖 Fix all issues with AI Agents
In @docs/examples.rst:
- Around line 170-178: The example uses re.match inside the url_to_url function
but omits the import; add an import re statement near the top of the example (so
url_to_url can call re.match) to make the snippet copy-paste runnable and retain
the existing logic that checks path with re.match(r"data/file(\d+)\.txt").
- Around line 221-237: Update the example comments to reflect the actual cache
subdirectory name used when remote == "my-bucket": change references from
"gs_bucket" to "gs_my-bucket" (or otherwise match cache_subdir semantics).
Specifically, adjust the two comment lines around the FileCache.retrieve
examples so they show the stored paths using "gs_my-bucket" (and the custom
url_to_path behavior using cache_subdir) to match the url_to_path function,
FileCache(url_to_path=url_to_path), and the retrieve calls.
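
A minimal shape for the first fix (the url_to_url signature and return value
are placeholders; the real ones appear in docs/examples.rst):

import re   # the missing import flagged above

def url_to_url(path):
    if re.match(r'data/file(\d+)\.txt', path):
        ...  # rewrite the path as the documented example does
    return path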

In @docs/introduction.rst:
- Around line 350-380: Update the documentation for FileCache.modification_time
to match actual behavior: state that for missing or inaccessible files it raises
by default but if called with exception_on_fail=False it returns the exception
object (instead of raising), and that when the remote exists but offers no mtime
(e.g., HTTP without Last-Modified) the method returns None; also keep existing
notes about cache_metadata and bypass_cache and clarify returned list ordering
for multiple URLs.
- Around line 451-541: Add documentation for the new FCPath.splitpath helper and
the FCPath.filecache property in the FCPath section: describe splitpath (what it
returns and example usage for splitting a path into (parent, name) or similar)
and document that FCPath.filecache exposes the associated FileCache instance
(show how to access it and note that methods can be invoked via
fc_path.filecache). Place both short descriptions alongside the existing
constructor and /-joining paragraphs so users learn these APIs when reading
about FCPath construction and usage.
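
A hedged illustration of the two APIs named above (splitpath's signature and
return shape are assumptions based on this PR's summary, not the library docs,
and whether a standalone FCPath carries a default FileCache is also assumed):

from filecache import FCPath

p = FCPath('gs://my-bucket/volumes/v1/data/file.txt')

fc = p.filecache              # the associated FileCache instance

# Hypothetical usage: split at the directory named 'volumes'
head, tail = p.splitpath('volumes')
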
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ae0bab and 279dc63.

📒 Files selected for processing (4)
  • docs/examples.rst
  • docs/introduction.rst
  • filecache/file_cache.py
  • filecache/file_cache_source.py
🧰 Additional context used
🧬 Code graph analysis (1)
filecache/file_cache.py (2)
filecache/file_cache_path.py (6)
  • is_file (1395-1398)
  • modification_time (847-918)
  • stat (1699-1711)
  • retrieve (982-1103)
  • upload (1105-1203)
  • unlink (1290-1376)
filecache/file_cache_source.py (24)
  • modification_time (144-146)
  • modification_time (550-554)
  • modification_time (759-795)
  • modification_time (1115-1139)
  • modification_time (1445-1473)
  • modification_time (1814-1818)
  • retrieve (251-256)
  • retrieve (566-599)
  • retrieve (828-894)
  • retrieve (1183-1254)
  • retrieve (1524-1582)
  • retrieve (1830-1874)
  • upload (327-332)
  • upload (601-630)
  • upload (896-903)
  • upload (1256-1294)
  • upload (1584-1622)
  • upload (1876-1919)
  • unlink (419-436)
  • unlink (670-690)
  • unlink (1031-1050)
  • unlink (1347-1374)
  • unlink (1687-1714)
  • unlink (1964-1988)
🪛 Ruff (0.14.10)

filecache/file_cache.py
  • 937, 938: FBT001 Boolean-typed positional argument in function definition

filecache/file_cache_source.py
  • 304, 311, 372, 379: FBT001 Boolean-typed positional argument in function definition
  • 570, 605: ARG002 Unused method argument: preserve_mtime
  • 1250-1251, 1608: TRY003 Avoid specifying long messages outside the exception class
  • 1463: RUF019 Unnecessary key check before dictionary access (replace with dict.get)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Test filecache (windows-latest, 3.11)
  • GitHub Check: Test filecache (windows-latest, 3.10)
  • GitHub Check: Test filecache (ubuntu-latest, 3.11)
  • GitHub Check: Test filecache (macos-latest, 3.10)
  • GitHub Check: Test filecache (windows-latest, 3.9)
  • GitHub Check: Test filecache (ubuntu-latest, 3.13)
  • GitHub Check: Test filecache (windows-latest, 3.13)
  • GitHub Check: Test filecache (macos-latest, 3.11)
  • GitHub Check: Test filecache (windows-latest, 3.12)
  • GitHub Check: Test filecache (macos-latest, 3.12)
  • GitHub Check: Test filecache (macos-latest, 3.13)
  • GitHub Check: Test filecache (ubuntu-latest, 3.12)
  • GitHub Check: Test filecache (macos-latest, 3.9)
  • GitHub Check: Test filecache (ubuntu-latest, 3.10)
  • GitHub Check: Test filecache (ubuntu-latest, 3.9)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @docs/examples.rst:
- Around line 170-178: The example uses re.match inside the url_to_url function
but never imports the re module; add an import re statement at the top of the
example (so the url_to_url function can call re.match without NameError). Ensure
the import appears before the function definition in the same code-block
example.

In @tests/test_file_cache.py:
- Around line 2189-2196: The loop in tests/test_file_cache.py uses an unused
variable `p` in `for lp, p in zip(local_paths, paths):` which reduces clarity;
change the loop to only iterate `for lp in local_paths:` (or use `for lp, _ in
zip(local_paths, paths):` if you want to keep the zip intent) and update the
similar occurrences mentioned (lines 2197, 2286, 2294) so the unused variable is
removed or replaced with `_`.
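
Both rewrites named above, for reference:

for lp in local_paths:                  # drop the zip entirely
    ...

for lp, _ in zip(local_paths, paths):   # or keep the zip, discard p
    ...
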
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 279dc63 and d5da0bf.

📒 Files selected for processing (5)
  • .github/workflows/run-tests.yml
  • docs/examples.rst
  • docs/introduction.rst
  • pyproject.toml
  • tests/test_file_cache.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_file_cache.py (1)
filecache/file_cache.py (4)
  • FileCache (103-2498)
  • get_local_path (565-654)
  • unlink (2144-2272)
  • modification_time (829-931)
🪛 Ruff (0.14.10)

tests/test_file_cache.py
  • 2189, 2197, 2286, 2294: B007 Loop control variable p not used within loop body
  • 2189, 2197, 2286, 2294: B905 zip() without an explicit strict= parameter

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Test filecache (macos-latest, 3.11)
  • GitHub Check: Test filecache (windows-latest, 3.13)
  • GitHub Check: Test filecache (ubuntu-latest, 3.10)
  • GitHub Check: Test filecache (windows-latest, 3.10)
  • GitHub Check: Test filecache (windows-latest, 3.11)
  • GitHub Check: Test filecache (windows-latest, 3.12)
  • GitHub Check: Test filecache (macos-latest, 3.13)
  • GitHub Check: Test filecache (ubuntu-latest, 3.13)
  • GitHub Check: Test filecache (ubuntu-latest, 3.11)
  • GitHub Check: Test filecache (macos-latest, 3.10)
  • GitHub Check: Test filecache (ubuntu-latest, 3.12)
  • GitHub Check: Test filecache (macos-latest, 3.12)
🔇 Additional comments (7)
.github/workflows/run-tests.yml (1)

46-46: LGTM! Python version matrix updated correctly.

The removal of Python 3.9 from the test matrix aligns with the minimum version requirement change in pyproject.toml.

pyproject.toml (1)

10-10: LGTM! Minimum Python version updated correctly.

The change to require Python 3.10+ is consistent with the workflow update and properly documented in the project metadata.

docs/examples.rst (1)

1-592: Examples documentation is comprehensive and well-structured.

The 21 examples provide excellent coverage of FileCache and FCPath usage patterns, including ephemeral/persistent caches, time-sensitive behavior, parallel operations, URL translations, and directory traversal. The organization progresses logically from basic to advanced use cases.

docs/introduction.rst (1)

1-556: Excellent comprehensive introduction documentation.

This introduction provides thorough coverage of FileCache concepts including cache location, lifetime, timestamps, multi-processor safety, parallelism, and URL translation. The organization is logical and the explanations are clear and detailed.

tests/test_file_cache.py (3)

832-839: LGTM! Path normalization test is well-designed.

This test correctly verifies that paths with multiple consecutive slashes are normalized to the same local path, which aligns with the PR objective to collapse redundant slashes (fixes #53).


2142-2170: Updated unlink assertions correctly reflect new behavior.

The assertions at lines 2153, 2158, 2168 correctly verify that local cached files are removed even when remote unlink operations fail. This aligns with the PR description stating that local copies are always removed in a finally block.


2796-2865: Comprehensive test coverage for modification time caching behavior.

This parameterized test thoroughly validates the interplay between time_sensitive, cache_metadata, and bypass_cache parameters across multiple scenarios. It correctly verifies:

  • Remote mtime preservation based on time_sensitive
  • Metadata cache behavior with bypass_cache
  • Local file mtime handling during retrieve operations
  • Re-download behavior when remote files change

The test design effectively covers the new modification time handling described in the PR objectives.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @tests/test_file_cache.py:
- Around line 2856-2866: The S3-specific skip in the test around
prefix.startswith('s3:') is unclear; update the test comment and actions to
document and triage the cause: either (A) expand the inline comment to state the
expected S3 behavior (e.g., S3 object timestamp semantics, precision or eventual
consistency) and reference the related feature/flag (preserve_mtime) and
variables used (time_sensitive, lp.stat().st_mtime, mtime_new), or (B) replace
the bare skip with a brief TODO and open/attach an issue ID that investigates
whether preserve_mtime is implemented for S3 or if the failure is a timing race;
ensure the test code clearly states which of the three hypotheses (S3 semantics,
missing preserve_mtime, timing issue) is being assumed until the issue is
resolved.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5da0bf and 87e4b9c.

📒 Files selected for processing (1)
  • tests/test_file_cache.py
🧰 Additional context used
🪛 Ruff (0.14.10)

tests/test_file_cache.py
  • 2197, 2286, 2294: B007 Loop control variable p not used within loop body
  • 2197, 2286, 2294: B905 zip() without an explicit strict= parameter

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Test filecache (ubuntu-latest, 3.11)
  • GitHub Check: Test filecache (windows-latest, 3.10)
  • GitHub Check: Test filecache (macos-latest, 3.12)
  • GitHub Check: Test filecache (ubuntu-latest, 3.12)
  • GitHub Check: Test filecache (macos-latest, 3.11)
  • GitHub Check: Test filecache (macos-latest, 3.13)
  • GitHub Check: Test filecache (ubuntu-latest, 3.10)
  • GitHub Check: Test filecache (windows-latest, 3.12)
  • GitHub Check: Test filecache (windows-latest, 3.13)
  • GitHub Check: Test filecache (windows-latest, 3.11)
  • GitHub Check: Test filecache (ubuntu-latest, 3.13)
  • GitHub Check: Test filecache (macos-latest, 3.10)
🔇 Additional comments (2)
tests/test_file_cache.py (2)

832-839: LGTM! Good coverage for path normalization.

This test properly validates that redundant slashes are collapsed across all prefix types, aligning with Issue #53.


2798-2854: LGTM! Comprehensive test for bypass_cache and time_sensitive interactions.

The test correctly validates:

  • bypass_cache=True forces remote lookup regardless of cache_metadata setting
  • time_sensitive=True causes local file mtime to match remote mtime
  • time_sensitive=False preserves local mtime as download time
  • Parameterization over WRITABLE_CLOUD_PREFIXES ensures both cloud providers are tested

The unique_id pattern prevents test pollution across runs.

@rfrenchseti rfrenchseti merged commit 3ecac5a into main Jan 7, 2026
14 of 16 checks passed
@rfrenchseti rfrenchseti deleted the rf_251218 branch January 7, 2026 01:24
2 participants