fix(notar): epoch handling #8540

Merged: lidatong merged 1 commit into main from chali/fix/7833 on Mar 3, 2026

@lidatong lidatong commented Feb 27, 2026

Closes #7833

Copilot AI review requested due to automatic review settings February 27, 2026 03:59
@github-actions

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.062339 s 0.061985 s -0.568%
backtest mainnet-368528500-perf snapshot load 2.442 s 2.14 s -12.367%
backtest mainnet-368528500-perf total elapsed 62.33949 s 61.984989 s -0.569%
firedancer mem usage with mainnet.toml 983.37 GiB 983.37 GiB 0.000%

Copilot AI left a comment

Pull request overview

This PR refactors the notar (notary) epoch handling to be based on the root slot's epoch instead of the completed slot's epoch. The changes simplify the notar data structures by removing dual-epoch tracking (current and previous) and replacing the fd_notar_update_voters function with a new reindex_notar function that is called only when the root slot transitions to a new epoch.

Changes:

  • Added epoch tracking to tower blocks and replaced epoch-per-slot logic with epoch-per-root logic
  • Introduced reindex_notar function to handle voter bit position remapping during epoch transitions
  • Removed previous epoch tracking fields from notar structures (prev_vtrs, prev_stake, prev_bit, epoch field)
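The bit-position remapping described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code from this PR: every `demo_`/`DEMO_` name is hypothetical, the voter set is capped at 64 so that a single `uint64_t` can serve as the bitset, and voters are matched with a simple linear scan rather than whatever lookup structure notar actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_VOTER_MAX 64          /* hypothetical cap so one uint64_t holds a voter set */
#define DEMO_IDX_NULL  UINT64_MAX  /* marks a voter with no bit position in the new epoch */

typedef struct { uint8_t key[32]; } demo_pubkey_t; /* stand-in for a voter address */

/* For each old bit position i, record the same voter's bit position in the
   new epoch, or DEMO_IDX_NULL if the voter dropped out.  Returns the number
   of voters that were removed. */
static uint64_t
demo_build_reindex( demo_pubkey_t const * old_vtrs, uint64_t old_cnt,
                    demo_pubkey_t const * new_vtrs, uint64_t new_cnt,
                    uint64_t *            reindex ) {
  assert( old_cnt<=DEMO_VOTER_MAX && new_cnt<=DEMO_VOTER_MAX );
  uint64_t removed = 0;
  for( uint64_t i=0; i<old_cnt; i++ ) {
    reindex[i] = DEMO_IDX_NULL;
    for( uint64_t j=0; j<new_cnt; j++ ) {
      if( !memcmp( old_vtrs[i].key, new_vtrs[j].key, 32 ) ) { reindex[i] = j; break; }
    }
    if( reindex[i]==DEMO_IDX_NULL ) removed++;
  }
  return removed;
}

/* Rewrite a voter bitset from old bit positions to new ones.  Bits that
   belong to removed voters are dropped. */
static uint64_t
demo_remap_bits( uint64_t old_bits, uint64_t const * reindex, uint64_t old_cnt ) {
  uint64_t new_bits = 0;
  for( uint64_t i=0; i<old_cnt; i++ ) {
    if( ((old_bits>>i)&1) && reindex[i]!=DEMO_IDX_NULL ) new_bits |= (uint64_t)1<<reindex[i];
  }
  return new_bits;
}
```

Building the mapping once per epoch transition and then applying it to each stored bitset keeps the per-bitset work to a single pass, which fits the summary's point that reindexing runs only when the root crosses an epoch boundary.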

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/discof/tower/fd_tower_tile.c Added reindex_notar function and modified epoch transition logic to check root slot epochs instead of replay slot epochs
src/choreo/tower/fd_tower_stakes.h Reformatted struct field alignment (cosmetic change)
src/choreo/tower/fd_tower_blocks.h Added epoch field to fd_tower_blk struct to support epoch-based logic
src/choreo/notar/test_notar.c Commented out test_update_voters test (removed test coverage)
src/choreo/notar/fd_notar.h Removed epoch tracking, prev_vtrs field, and fd_notar_update_voters API; simplified vtr structure
src/choreo/notar/fd_notar.c Removed fd_notar_update_voters implementation and prev_vtrs initialization


@github-actions

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.075011 s 0.074758 s -0.337%
backtest mainnet-368528500-perf snapshot load 4.253 s 3.139 s -26.193%
backtest mainnet-368528500-perf total elapsed 75.010747 s 74.75784 s -0.337%
firedancer mem usage with mainnet.toml 976.37 GiB 983.37 GiB 0.717%

Copilot AI review requested due to automatic review settings February 27, 2026 23:58
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.



@lidatong lidatong force-pushed the chali/fix/7833 branch 2 times, most recently from 12f0e92 to 36c8454, on March 2, 2026 16:38
Copilot AI review requested due to automatic review settings March 2, 2026 16:38

github-actions bot commented Mar 2, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.071302 s 0.072126 s 1.156%
backtest mainnet-368528500-perf snapshot load 3.202 s 2.781 s -13.148%
backtest mainnet-368528500-perf total elapsed 71.302291 s 72.12577 s 1.155%
firedancer mem usage with mainnet.toml 966.37 GiB 966.37 GiB 0.000%

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.




github-actions bot commented Mar 2, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.061476 s 0.061576 s 0.163%
backtest mainnet-368528500-perf snapshot load 3.299 s 2.498 s -24.280%
backtest mainnet-368528500-perf total elapsed 61.475617 s 61.575795 s 0.163%
firedancer mem usage with mainnet.toml 966.37 GiB 966.37 GiB 0.000%

Copilot AI review requested due to automatic review settings March 2, 2026 18:04

github-actions bot commented Mar 2, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.071797 s 0.072484 s 0.957%
backtest mainnet-368528500-perf snapshot load 3.191 s 2.749 s -13.851%
backtest mainnet-368528500-perf total elapsed 71.796544 s 72.484002 s 0.958%
firedancer mem usage with mainnet.toml 966.37 GiB 966.37 GiB 0.000%

@lidatong lidatong force-pushed the chali/fix/7833 branch 2 times, most recently from 7a555ac to 25622f4, on March 2, 2026 18:10
Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.




github-actions bot commented Mar 2, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.084781 s 0.084196 s -0.690%
backtest mainnet-368528500-perf snapshot load 3.865 s 3.336 s -13.687%
backtest mainnet-368528500-perf total elapsed 84.781056 s 84.196239 s -0.690%
firedancer mem usage with mainnet.toml 966.37 GiB 966.37 GiB 0.000%

Copilot AI review requested due to automatic review settings March 2, 2026 19:17

github-actions bot commented Mar 2, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.061275 s 0.06135 s 0.122%
backtest mainnet-368528500-perf snapshot load 3.184 s 2.417 s -24.089%
backtest mainnet-368528500-perf total elapsed 61.275218 s 61.350115 s 0.122%
firedancer mem usage with mainnet.toml 966.37 GiB 966.37 GiB 0.000%

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.



Copilot AI review requested due to automatic review settings March 3, 2026 00:10
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings March 3, 2026 01:33

github-actions bot commented Mar 3, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.071447 s 0.071991 s 0.761%
backtest mainnet-368528500-perf snapshot load 3.204 s 2.703 s -15.637%
backtest mainnet-368528500-perf total elapsed 71.446924 s 71.990984 s 0.761%
firedancer mem usage with mainnet.toml 964.37 GiB 964.37 GiB 0.000%

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.


github-actions bot commented Mar 3, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.061414 s 0.061011 s -0.656%
backtest mainnet-368528500-perf snapshot load 3.114 s 2.384 s -23.443%
backtest mainnet-368528500-perf total elapsed 61.414418 s 61.010505 s -0.658%
firedancer mem usage with mainnet.toml 964.37 GiB 964.37 GiB 0.000%

Copilot AI review requested due to automatic review settings March 3, 2026 02:17
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

fd_compact_tower_sync_serde_t compact_tower_sync_serde;
uchar vote_txn[FD_TPU_PARSED_MTU];
ulong notar_reindex[ FD_VOTER_MAX ];
fd_hash_t notar_removed[ FD_VOTER_MAX ];
Copilot AI Mar 3, 2026

notar_removed/removed is declared as fd_hash_t but is used to store voter addresses (fd_pubkey_t). This is a type mismatch (won't compile) and also makes the subsequent fd_notar_vtr_query( ..., removed[i], ... ) call incorrect. Change the buffer to fd_pubkey_t (or adjust the code to store hashes intentionally and query with the correct key type).

Suggested change
- fd_hash_t notar_removed[ FD_VOTER_MAX ];
+ fd_pubkey_t notar_removed[ FD_VOTER_MAX ];
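Whether a mismatch like this is a hard compile error depends on how the two types are declared, which this thread does not show. The sketch below uses hypothetical stand-in types (none of the `demo_` names come from the codebase) to illustrate the general C rule: two typedefs of the same union name one type and are interchangeable, while a separately declared struct with the same layout is a different type.

```c
#include <assert.h>

/* Hypothetical stand-ins.  The real fd_hash_t / fd_pubkey_t declarations are
   not shown in this thread, so nothing here describes them. */
union demo_hash { unsigned char uc[32]; };
typedef union demo_hash demo_hash_t;
typedef union demo_hash demo_alias_key_t;                     /* alias: the same type as demo_hash_t */
typedef struct { unsigned char uc[32]; } demo_distinct_key_t; /* separate type with the same layout  */

/* Evaluates to 1 when x has exactly the type demo_hash_t, else 0 (C11 _Generic). */
#define DEMO_IS_HASH( x ) _Generic( (x), demo_hash_t: 1, default: 0 )

static int demo_alias_matches( void )    { demo_alias_key_t    k = {{0}}; return DEMO_IS_HASH( k ); }
static int demo_distinct_matches( void ) { demo_distinct_key_t k = {{0}}; return DEMO_IS_HASH( k ); }
```

Even when the types are aliases and the code compiles, declaring the buffer with the type that matches its contents documents the intent, which is the substance of the suggestion.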

Comment on lines +870 to +874
FD_TEST( oldr_tower_blk->epoch==newr_tower_blk->epoch || oldr_tower_blk->epoch+1==newr_tower_blk->epoch ); /* root can only move forward one epoch */
if( FD_UNLIKELY( slot_completed->slot==ctx->init_slot || oldr_tower_blk->epoch+1==newr_tower_blk->epoch ) ) {
  FD_TEST( newr_tower_blk->epoch==slot_completed->epoch ); /* if the new root's epoch has advanced, it must be in the same epoch as current slot_completed (old root, new root, slot_completed cannot span >2 epochs) */
  reindex_notar( ctx, out.root_slot );
}
Copilot AI Mar 3, 2026

The epoch reindex condition is currently slot_completed->slot==ctx->init_slot && oldr_tower_blk->epoch+1==newr_tower_blk->epoch, but the surrounding comment says to reindex when entering a new epoch (and typically also on initialization). With &&, reindexing will be skipped in the common cases where either (a) we're at init_slot but not crossing an epoch, or (b) we cross an epoch later. Update the condition so it triggers whenever reindexing is required (e.g., init OR epoch advanced).
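The "init OR epoch advanced" condition the comment asks for can be written as a small predicate. This is a sketch with hypothetical names (`demo_needs_reindex` and its parameters are not part of the codebase); the snippet quoted above this comment already uses `||`, so it has the same shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decide whether the voter index must be rebuilt. */
static bool
demo_needs_reindex( uint64_t completed_slot, uint64_t init_slot,
                    uint64_t old_root_epoch, uint64_t new_root_epoch ) {
  /* The root may stay in its epoch or advance by exactly one. */
  assert( old_root_epoch==new_root_epoch || old_root_epoch+1==new_root_epoch );
  bool is_init        = completed_slot==init_slot;          /* first slot handled after startup */
  bool epoch_advanced = old_root_epoch+1==new_root_epoch;   /* root crossed an epoch boundary   */
  return is_init || epoch_advanced;
}
```

With `&&` in place of `||`, the predicate would be true only when both conditions hold at once, so a reindex at startup without an epoch change, or at a later epoch change, would be skipped.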

Comment on lines 74 to +77
fd_wksp_t * wksp = fd_wksp_new_anonymous( fd_cstr_to_shmem_page_sz( _page_sz ), page_cnt, fd_shmem_cpu_idx( numa_idx ), "wksp", 0UL );
FD_TEST( wksp );

- test_update_voters( wksp );
+ // test_update_voters( wksp );
Copilot AI Mar 3, 2026

test_update_voters (and its invocation) are commented out, leaving this unit test effectively doing nothing. Update/replace this test to exercise the new notar epoch/voter reindex behavior introduced by this PR so regressions are still caught in CI.


github-actions bot commented Mar 3, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.061587 s 0.061411 s -0.286%
backtest mainnet-368528500-perf snapshot load 3.189 s 2.366 s -25.807%
backtest mainnet-368528500-perf total elapsed 61.587062 s 61.410514 s -0.287%
firedancer mem usage with mainnet.toml 964.37 GiB 964.37 GiB 0.000%

@lidatong lidatong merged commit 2636d95 into main Mar 3, 2026
17 checks passed
@lidatong lidatong deleted the chali/fix/7833 branch March 3, 2026 02:29
Development

Successfully merging this pull request may close these issues.

tower: notar epoch handling

3 participants