Skip to content

Conversation

@joaquincasares
Copy link
Contributor

Fix ConcurrentModificationException in performGarbageCollection with only_purge_repaired_tombstones

When running nodetool garbagecollect on a table with mixed repaired/unrepaired SSTables and only_purge_repaired_tombstones=true, a ConcurrentModificationException was thrown because the code iterated directly over transaction.originals() while calling transaction.cancel() inside the loop, modifying the underlying collection.

The fix copies the collection before iterating: new ArrayList<>(transaction.originals())

Tests run via:

TEST_NAME_REGEX=GarbageCollectRepairedSSTablesTest docker compose up --build

Copy link
Member

@michaelsembwever michaelsembwever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just nits for readability (and redundant docker approach)

@joaquincasares
Copy link
Contributor Author

lgtm, just nits for readability (and redundant docker approach)

@michaelsembwever thank you for the review! All requested changes have been implemented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above, and against the test methods there's repeated mention of the ConcurrentModificationException.

but that's not the only value of these test methods. and four of these methods don't even break without the patch, so have nothing to do with the ConcurrentModificationException bug– the test methods still add value, but their value-add can be documented more accurately.

Comment on lines 351 to 354
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious) do you know how to reproduce this ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joaquincasares
Copy link
Contributor Author

@michaelsembwever , I've made a few changes to the PR that should be clearer now. 🤞

Essentially, I was testing that my change didn't break other more basic cases on the more often beaten path. If you want, we can definitely remove them, but I figure they would be good test cases to have for future changes in either case.

Regarding the remaining three comments, I ran the same test file with and without the fix within CompactionManager.java. On the previous release, we ended up having issues only on those tests that were looking for leaks. When I applied the fix, those tests passed.

testLeakDetectionWithAwaitility and testNoLeakDetectionWithAwaitility are less testing Cassandra and more testing the method of the fix to ensure that the fix was ideal. These are probably less valuable to include in the codebase, but I included them so we had them for the review at the very least.

Do let me know if you want me to remove any specific test cases, since I am aware of test-bloat, but figured to include them so we had them during the review process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants