Skip to content

Fix pg rewind#559

Draft
dAdAbird wants to merge 4 commits intopercona:mainfrom
dAdAbird:fix_pg_rewind
Draft

Fix pg rewind#559
dAdAbird wants to merge 4 commits intopercona:mainfrom
dAdAbird:fix_pg_rewind

Conversation

@dAdAbird
Copy link
Copy Markdown
Member

@dAdAbird dAdAbird commented Mar 31, 2026

The rewind might update bloks on target's file with that of the source.
In that case, encrypted files will end up with different blocks
encrypted with different internal keys, leading to data corruption.
To prevent this, we have to decryp blocks coming from the source and
re-encrypt them with the target's internal key.

Along with that, providers might have been changed, primary keys
rotated, etc., after the divergence of clusters. Generally, it's not a
problem because we would replace the keys and providers on the target
with those of the source. But it will render internal keys that we have
used for the partial blocks unreadable. To fix this, we have to
re-encrypt such internal keys with source's primary keys.

So the sequence is next:

  1. Copy source's pg_tde dir into a tmp location. All following reads
    and writes of source keys and providers happen in this tmp dir.
  2. When we have partial updates, check for the source key. And
    re-encrypt blocks if the key exists.
  3. Save the target key into the source _keys file replacning the
    existing one.
  4. Replace target's pg_tde dir with the tmp one.

See commit messages for additional info.

It's still a draft laking:

  1. Tests.
  2. Refactoring. The 2nd commit should be split in two. Plus I didn't much think on names etc.
  3. Need to clean-up the tmp dir on failure.
  4. We maybe need to merge WAL key as well as rewind sometimes keeps one targets WAL segment from before the diverge. But is not a hard fix but I'm not sure if we need it and I don't want to have useless code sitting around. I need to think more.
  5. We may or may not want to sole issues with _fsm of partially updated relations. Currently they end-up with wrong key (read currpted). But server successfully resolve it by just zeroing corrupted pages: Fixed
2026-03-31 18:57:23.703 UTC [116297] LOG:  page verification failed, calculated checksum 39771 but expected 16415
2026-03-31 18:57:23.703 UTC [116297] CONTEXT:  WAL redo at 0/30D1C70 for Heap/INSERT: off: 28, flags: 0x08; blkref #0: rel 1663/16384/16434, blk 14
2026-03-31 18:57:23.703 UTC [116297] LOG:  invalid page in block 2 of relation "base/16384/16434_fsm"; zeroing out page
2026-03-31 18:57:23.703 UTC [116297] CONTEXT:  WAL redo at 0/30D1C70 for Heap/INSERT: off: 28, flags: 0x08; blkref #0: rel 1663/16384/16434, blk 14
2026-03-31 18:57:23.703 UTC [116297] WARNING:  invalid page in block 2 of relation "base/16384/16434_fsm"; zeroing out page
2026-03-31 18:57:23.703 UTC [116297] CONTEXT:  WAL redo at 0/30D1C70 for Heap/INSERT: off: 28, flags: 0x08; blkref #0: rel 1663/16384/16434, blk 14
2026-03-31 18:57:23.737 UTC [116297] LOG:  completed backup recovery with redo LSN 0/30AB4B8 and end LSN 0/3649278
  1. Ah, and all changes only for pg18 so far.

The rewind might update bloks on target's file with that of the source.
In that case, encrypted files will end up with different blocks
encrypted with different internal keys, leading to data corruption.
To prevent this, we have to decryp blocks coming from the source and
re-encrypt them with the target's internal key.

Along with that, providers might have been changed, primary keys
rotated, etc., after the divergence of clusters. Generally, it's not a
problem because we would replace the keys and providers on the target
with those of the source. But it will render internal keys that we have
used for the partial blocks unreadable. To fix this, we have to
re-encrypt such internal keys with source's primary keys.

So the sequence is next:
1. Copy source's pg_tde dir into a tmp location. All following reads
and writes of source keys and providers happen in this tmp dir.
2. When we have partial updates, check for the source key. And
re-encrypt blocks if the key exists.
3. Save the target key into the source _keys file replacning the
existing one.
4. Replace target's pg_tde dir with the tmp one.
It also fixes an issue when encrypted files might remain untouched by
rewind on the target. That means they are encrypted with the target's
key wich vanishes after the pg_tde replace. So we need to ensure keys
for such files. Meaning if this is an encrypted relation, we save its
target key into the source's keys.
Do what "dry-run" prescribes, everything, but modify the target directory.
We rewrite the internal key for relations when partially re-encrypting
blocks. That makes its FSM and VM fork unreadable as they are still
encrypted with the old key. To fix that, we re-encrypt such forks with
the proper key after we finish processing the main fork file.

As pg_rewind processes files in the order of operation types (see
file_action_t) and whole-file copies occur before any partial writes,
we assume that for files already in the target datadir, we rewrite
them in-place.
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 60.47619% with 83 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.32%. Comparing base (a2ae945) to head (7924f61).
⚠️ Report is 7 commits behind head on main.

❌ Your project status has failed because the head coverage (75.95%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #559      +/-   ##
==========================================
- Coverage   58.85%   57.32%   -1.53%     
==========================================
  Files          69       69              
  Lines       10822    10868      +46     
  Branches     1870     2676     +806     
==========================================
- Hits         6369     6230     -139     
+ Misses       3578     3356     -222     
- Partials      875     1282     +407     
Components Coverage Δ
access 80.67% <ø> (-1.24%) ⬇️
bin 63.76% <ø> (-4.24%) ⬇️
catalog 77.86% <ø> (-6.78%) ⬇️
common 76.47% <ø> (+4.24%) ⬆️
encryption 64.53% <100.00%> (+5.71%) ⬆️
keyring 65.12% <ø> (-2.69%) ⬇️
src 87.33% <ø> (-2.99%) ⬇️
smgr 90.20% <100.00%> (-1.08%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants