PSMDB-1997: Fix TDE race condition when dropping and recreating database #1778
Draft
This commit fixes a race condition in the TDE (Transparent Data Encryption)
implementation where dropping a database and immediately recreating it with
the same name causes data corruption ('bad decrypt' errors).
Problem:
When a database is dropped, its idents are marked for deferred deletion
(pending checkpoint-cleanup). If the database is recreated before cleanup
completes, the old and new databases share the same encryptor instance,
because the keyid is based solely on the database name. Loading the new key
then overwrites the old key, so checkpoint-cleanup fails when it tries to
decrypt the old data.
Solution:
Implement generation-based keyids to ensure old and new database instances
use separate encryptors with separate keys:
- The first database named 'test' uses keyid='test'
- After each drop/recreate the keyid advances: 'test.v1', then 'test.v2', etc.
- Generation is persisted in table:parameters (key: 'dbgen:<dbName>')
- O(1) lookup via single cursor search (optimized from O(N) table scan)
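The keyid scheme above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ParametersTable` and `GenerationTracker` are hypothetical stand-ins (an in-memory map plays the role of table:parameters), and the function names mirror those listed under "Key changes" but their bodies here are assumptions.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for table:parameters: a keyed store that supports
// a single point lookup (analogous to one cursor search, not a table scan).
class ParametersTable {
public:
    void put(const std::string& key, std::uint64_t value) {
        _rows[key] = value;
    }

    // Returns true and fills *out if the key exists.
    bool get(const std::string& key, std::uint64_t* out) const {
        auto it = _rows.find(key);
        if (it == _rows.end())
            return false;
        *out = it->second;
        return true;
    }

private:
    std::map<std::string, std::uint64_t> _rows;
};

// Hypothetical helper that derives versioned keyids from a persisted
// per-database generation counter stored under 'dbgen:<dbName>'.
class GenerationTracker {
public:
    explicit GenerationTracker(ParametersTable* params) : _params(params) {}

    // Generation 0 means the database has never been dropped.
    std::uint64_t loadGeneration(const std::string& dbName) const {
        std::uint64_t gen = 0;
        _params->get("dbgen:" + dbName, &gen);
        return gen;
    }

    void persistGeneration(const std::string& dbName, std::uint64_t gen) {
        _params->put("dbgen:" + dbName, gen);
    }

    // 'test' for generation 0, then 'test.v1', 'test.v2', ...
    std::string getCurrentKeyId(const std::string& dbName) const {
        const std::uint64_t gen = loadGeneration(dbName);
        return gen == 0 ? dbName : dbName + ".v" + std::to_string(gen);
    }

    // Dropping a database bumps its generation so that a recreated
    // database with the same name gets a different keyid.
    void markDatabaseDropped(const std::string& dbName) {
        persistGeneration(dbName, loadGeneration(dbName) + 1);
    }

private:
    ParametersTable* _params;
};
```

In this sketch the generation is read with one keyed lookup per call, which is the property the 'dbgen:<dbName>' key is meant to provide; the real implementation would issue that lookup through a WiredTiger cursor.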
Key changes:
- Added getCurrentKeyId() to return versioned keyid for new tables
- Added persistGeneration()/loadGeneration() for O(1) generation lookup
- Modified markDatabaseDropped() to increment generation and persist it
- Updated wiredtiger_customization_hooks.cpp to use versioned keyids
- Updated import_data_from() to copy generation data during key rotation
- Track pending drops by database name with mapping to versioned keyid
The old encryptor (with old key) remains cached for checkpoint-cleanup,
while new collections get a new encryptor with a new key.
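To illustrate why separate keyids keep the old key available, here is a small sketch of an encryptor cache keyed by keyid, with pending drops tracked by database name. `Encryptor` and `EncryptorCache` are hypothetical types invented for this example, and evicting the old encryptor once cleanup finishes is an assumption about the lifecycle, not something this description states.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical encryptor: only carries the key material for illustration.
struct Encryptor {
    std::string key;
};

// Hypothetical cache: one encryptor per keyid, never overwritten in place.
class EncryptorCache {
public:
    // Returns the cached encryptor for keyId, creating it on first use.
    // An existing entry is never replaced, so the old key stays intact.
    std::shared_ptr<Encryptor> getOrCreate(const std::string& keyId,
                                           const std::string& keyMaterial) {
        auto it = _byKeyId.find(keyId);
        if (it != _byKeyId.end())
            return it->second;
        auto enc = std::make_shared<Encryptor>(Encryptor{keyMaterial});
        _byKeyId.emplace(keyId, enc);
        return enc;
    }

    // Records that dbName was dropped while its idents still use keyId.
    void markPendingDrop(const std::string& dbName, const std::string& keyId) {
        _pendingDrops[dbName].push_back(keyId);
    }

    // Assumed lifecycle: once checkpoint-cleanup has removed the old idents,
    // the encryptors for the dropped generations can be released.
    void onCleanupComplete(const std::string& dbName) {
        auto it = _pendingDrops.find(dbName);
        if (it == _pendingDrops.end())
            return;
        for (const std::string& keyId : it->second)
            _byKeyId.erase(keyId);
        _pendingDrops.erase(it);
    }

    bool contains(const std::string& keyId) const {
        return _byKeyId.count(keyId) != 0;
    }

private:
    std::map<std::string, std::shared_ptr<Encryptor>> _byKeyId;
    std::map<std::string, std::vector<std::string>> _pendingDrops;
};
```

With the keyid 'test' reserved for the dropped instance and 'test.v1' for the recreated one, the two encryptors coexist in the cache, so decrypting old data during cleanup does not depend on the new key.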