Skip to content

Conversation

@pdames
Copy link
Member

@pdames pdames commented Dec 1, 2025

Summary

These changes resolve #585 by ensuring that only metafile revisions with the same revision number are counted against DeltaCAT's max concurrent write conflict limit (vs. prior behavior of treating all revisions in a revision directory partition as potential conflicts). All changes required to resolve #585 are in metafile.py, with corresponding tests in test_metafile_io.py.

Other Changes (not directly related to #585):

  1. Tests to ensure that manifest entry order is preserved during delta download and merge operations.
  2. Fix concurrent log file rotation and log file sizing bugs in the new rolling log file appender (Updated rolling log file appenders to have 64MiB cutoff and compress … #579).
  3. Flaky test hardening (test_file_object_store.py, test_default_catalog_impl.py: use test-level mocks over conflicting class-level mocks, resolve signal timing bugs, thread naming race conditions, and add fallback to handle unexpected failures of concurrent test case writers).

FYI @IvanPartsunev.

Testing

Wrote failing unit tests and ensured that they passed post-update.

Checklist

  • Unit tests covering the changes have been added

    • If this is a bugfix, regression tests have been added
  • E2E testing has been performed

@pdames pdames requested a review from Zyiqin-Miranda December 1, 2025 18:03
_safe_rename(source, dest)

# Rotate current log to log.1
dest = self.rotation_filename(f"{self.baseFilename}.1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct that the current log will always be log.1? Seem like the for i in range(self.backupCount - 1, 0, -1): will have log.1 as well during the for loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is intentional since the previously loop should have vacated the log.1 slot (due to excluding the ending range of 0 from inclusion in the loop) which is populated here by the current log.

For example, if backupCount = 10, then we get: [9, 8, 7, 6, 5, 4, 3, 2, 1] then the above loop should:
Move log.9 -> log.10 (i.e., push the oldest log out)
Move log.8 -> log.9
... and so on down to log.1 ...
Move log.1 -> log.2 (i.e., make room for a new log.1)

So finally here on line 234-235 we move current -> log.1 (now empty)

@pdames pdames merged commit 821ea59 into ray-project:2.0 Dec 1, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ConcurrentModificationError after ~1000 successful writes to same table partition

2 participants