Skip to content

PSMDB-1997: Add deferred encryption key cleanup to avoid race with checkpoint#1786

Open
plebioda wants to merge 4 commits intomasterfrom
PSMDB-1997-fix-sporadic-failures-skip-checkpoint-cleanup
Open

PSMDB-1997: Add deferred encryption key cleanup to avoid race with checkpoint#1786
plebioda wants to merge 4 commits intomasterfrom
PSMDB-1997-fix-sporadic-failures-skip-checkpoint-cleanup

Conversation

@plebioda
Copy link
Copy Markdown

No description provided.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds infrastructure to defer deletion of per-database encryption keys (Percona KeyDB) and periodically clean up orphaned keys after verifying they are no longer referenced by any WiredTiger ident, to avoid races with checkpoint/drop-pending cleanup.

Changes:

  • Added parsing helper to extract encryption.keyid from WiredTiger metadata configs, with unit tests.
  • Extended the KeyDB/KVEngine API and WiredTigerKVEngine implementation to enumerate keys, delete keys, and scan WT metadata for keyIds currently in use.
  • Added a timestamp-monitor-driven background cleanup in StorageEngineImpl, gated by new server parameters.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/mongo/db/storage/wiredtiger/wiredtiger_util.h Declares helper to extract encryption keyId from WT config strings.
src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp Implements getEncryptionKeyId() via WiredTigerConfigParser.
src/mongo/db/storage/wiredtiger/wiredtiger_util_test.cpp Adds coverage for keyId extraction across config variants.
src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.h Exposes new key-management methods on the engine.
src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp Implements key enumeration/deletion and “keys in use” scan.
src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine_encryption_key_test.cpp Adds tests for key enumeration/deletion + server parameter toggling.
src/mongo/db/storage/wiredtiger/wiredtiger_global_options.idl Introduces encryptionKeyCleanupDeferred + cleanup interval parameters.
src/mongo/db/storage/wiredtiger/encryption_keydb.h Declares KeyDB enumeration API.
src/mongo/db/storage/wiredtiger/encryption_keydb.cpp Implements getAllKeyIds() cursor scan; guards null encryptor pointers.
src/mongo/db/storage/storage_engine_impl.h Adds cleanup listener + last-run tracking.
src/mongo/db/storage/storage_engine_impl.cpp Implements periodic orphaned key cleanup with catalog + WT metadata checks.
src/mongo/db/storage/keydb_api.h Extends KeyDB API surface for deferred cleanup support.
src/mongo/db/shard_role/shard_catalog/collection_catalog_helper.cpp Removes immediate key deletion from dropDatabase path; documents deferred cleanup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@plebioda plebioda force-pushed the PSMDB-1997-fix-sporadic-failures-skip-checkpoint-cleanup branch 5 times, most recently from b2b3f91 to ef48a8e Compare April 2, 2026 11:22
@plebioda plebioda marked this pull request as ready for review April 2, 2026 11:23
@plebioda plebioda requested review from igorsol and ktrushin April 2, 2026 11:23
plebioda added 4 commits April 2, 2026 13:58
…eckpoint

This commit implements deferred encryption key cleanup to prevent race
conditions between encryption key deletion and checkpoint cleanup during
dropDatabase operations.
Add unit tests for the new encryption key management
functionality:
Add two test files and corresponding test suites for verifying
deferred encryption key cleanup functionality:

- deferred_key_cleanup_drop_database.js: Tests basic database
  create/drop operations with deferred cleanup enabled
- deferred_key_cleanup_stress.js: Stress test with 16 parallel
  threads performing 200 iterations each of database create/drop

Test suites run with both AES256-GCM and AES256-CBC cipher modes.

The stress test uses db.getSiblingDB() instead of conn.getDB()
to ensure the database connection works correctly when the worker
function is serialized and executed in a parallel shell context.
Fix Python version to 3.13 to match pyproject.toml requirements.
@plebioda plebioda force-pushed the PSMDB-1997-fix-sporadic-failures-skip-checkpoint-cleanup branch 2 times, most recently from fea52cd to 765829e Compare April 2, 2026 14:09
@plebioda
Copy link
Copy Markdown
Author

plebioda commented Apr 2, 2026

The clang-tidy check fails due to bazel running out of memory:

FATAL: bazel ran out of memory and crashed. Printing stack trace:
java.lang.OutOfMemoryError: Java heap space
	at java.base/jdk.internal.misc.Unsafe.allocateUninitializedArray(Unknown Source)
	at java.base/java.lang.StringConcatHelper.newArray(Unknown Source)
	at java.base/java.lang.StringConcatHelper.simpleConcat(Unknown Source)
	at java.base/java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder)
	at java.base/java.lang.invoke.LambdaForm$MH/0x000000080008c400.invoke(LambdaForm$MH)
	at java.base/java.lang.invoke.Invokers$Holder.linkToTargetMethod(Invokers$Holder)
	at com.google.devtools.build.lib.buildeventstream.transports.JsonFormatFileTransport.serializeEvent(JsonFormatFileTransport.java:55)
	at com.google.devtools.build.lib.buildeventstream.transports.FileTransport$$Lambda/0x00000008004b4dc0.apply(Unknown Source)
	at com.google.devtools.build.lib.buildeventstream.transports.FileTransport$SequentialWriter.run(FileTransport.java:140)
	at java.base/java.lang.Thread.runWith(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants