Skip to content

gui, topo: allow_crash tile flag#8449

Open
jherrera-jump wants to merge 1 commit intofiredancer-io:mainfrom
jherrera-jump:jherrera/topo-add-expendable-tiles
Open

gui, topo: allow_crash tile flag#8449
jherrera-jump wants to merge 1 commit intofiredancer-io:mainfrom
jherrera-jump:jherrera/topo-add-expendable-tiles

Conversation

@jherrera-jump
Copy link
Contributor

No description provided.

Copilot AI review requested due to automatic review settings February 25, 2026 00:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an allow_crash flag for tiles in the Firedancer topology system, enabling certain tiles to exit without tearing down the entire application. This complements the existing allow_shutdown flag by providing a mechanism for handling tiles that may exit unexpectedly (with exit code 0) while ensuring the system remains stable through proper flow control management.

Changes:

  • Added allow_crash field to the fd_topo_tile structure
  • Implemented validation to prevent reliable consumption from tiles with allow_crash=1
  • Added runtime handling to provision infinite credits to producers when an allow_crash tile exits

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/disco/topo/fd_topo.h Added allow_crash field to tile structure with documentation
src/disco/topo/fd_topob.c Initialized allow_crash field and added validation logic during topology construction and finalization
src/app/shared/commands/run/run.c Implemented runtime handling for tiles with allow_crash=1 that exit cleanly, including fseq credit provisioning
Comments suppressed due to low confidence (1)

src/disco/topo/fd_topob.c:214

  • Spelling error: "reliablaly" should be "reliably".
    if( FD_UNLIKELY( topo->tiles[ producer_tile_id ].allow_crash && reliable ) ) FD_LOG_ERR(( "cannot reliablaly consume from a tile with allow_crash=1: %s:%lu ", tile_name, tile_kind_id ) );

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from d70a817 to b53c98c Compare February 25, 2026 16:37
Copilot AI review requested due to automatic review settings February 25, 2026 19:23
@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from b53c98c to fb5e211 Compare February 25, 2026 19:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (3)

src/disco/topo/fd_topob.c:214

  • There is a spelling error in the error message. "reliablaly" should be "reliably".
    if( FD_UNLIKELY( topo->tiles[ producer_tile_id ].allow_crash && reliable ) ) FD_LOG_ERR(( "cannot reliablaly consume from a tile with allow_crash=1: %s:%lu ", tile_name, tile_kind_id ) );

src/discof/replay/fd_replay_tile.c:769

  • The log message on line 769 still mentions "gui" but the GUI tile no longer increments the bank refcnt based on the changes in this PR. The comment should be updated to only mention "tower, rpc" to reflect the actual consumers that increment the refcnt.
  FD_LOG_DEBUG(( "bank (idx=%lu, slot=%lu) refcnt incremented to %lu for tower, rpc, gui", bank->data->idx, slot, bank->data->refcnt ));

src/disco/gui/fd_gui_peers.c:849

  • The change from stable sort to inplace sort on line 849 could introduce subtle behavioral changes. Previously, when deduplicating node accounts by keeping the vote account with the largest stake, the stable sort (line 849 in old code) would preserve the relative order of elements with equal node_account values after the stake sort. With an inplace sort, if multiple vote accounts have the same node_account and stake, the order becomes undefined. While this is likely rare in practice, it could lead to non-deterministic behavior when selecting which vote account to keep for duplicates with identical stakes. Consider whether this non-determinism is acceptable or if stable sort should be retained.
  fd_gui_peers_votes_pkey_sort_inplace( votes_sorted, voter_cnt );

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from fb5e211 to 287f7bf Compare February 25, 2026 21:20
Copilot AI review requested due to automatic review settings February 25, 2026 22:58
@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from 287f7bf to 31da7ad Compare February 25, 2026 22:58
@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from 31da7ad to d82fd79 Compare February 25, 2026 22:59
@jherrera-jump jherrera-jump changed the title topo: allow_crash tile flag gui, topo: allow_crash tile flag Feb 25, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (2)

src/discof/replay/fd_replay_tile.c:791

  • The log message states "refcnt incremented to %lu for tower, rpc, gui" but the GUI tile no longer holds a bank reference based on the changes in this PR (the gui_replay link was removed and the bank refcnt is no longer incremented for GUI on line 789). The log message should be updated to remove the reference to "gui" to accurately reflect which tiles hold the bank reference.
  FD_LOG_DEBUG(( "bank (idx=%lu, slot=%lu) refcnt incremented to %lu for tower, rpc, gui", bank->data->idx, slot, bank->data->refcnt ));

src/disco/topo/fd_topob.c:214

  • Spelling error: "reliablaly" should be "reliably"
    if( FD_UNLIKELY( topo->tiles[ producer_tile_id ].allow_crash && reliable ) ) FD_LOG_ERR(( "cannot reliablaly consume from a tile with allow_crash=1: %s:%lu ", tile_name, tile_kind_id ) );

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from d82fd79 to a93c92a Compare February 25, 2026 23:11
Copilot AI review requested due to automatic review settings February 25, 2026 23:13
@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from a93c92a to 16520ea Compare February 25, 2026 23:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/disco/gui/fd_gui_peers.c:941

  • Memory leak in voters_map: The voters_map is populated in fd_gui_peers_stage_vote_update but never cleared after fd_gui_peers_commit_vote_updates processes the updates. This will cause the map to continuously grow with stale vote accounts that are no longer active, potentially leading to memory exhaustion over time. Consider clearing the voters_map and voters_pool after processing vote updates in fd_gui_peers_commit_vote_updates, or implementing a mechanism to remove stale entries.
fd_gui_peers_commit_vote_updates( fd_gui_peers_ctx_t *  peers,
                                  long                  now,
                                  fd_pubkey_t *         identity ) {
  (void)now;
  fd_gui_peers_voter_t * votes_sorted = peers->voters_scratch.voters1;

  ulong voter_cnt = 0UL;
  for( fd_gui_peers_voters_map_iter_t iter = fd_gui_peers_voters_map_iter_init( peers->voters_map, peers->voters_pool );
        !fd_gui_peers_voters_map_iter_done( iter, peers->voters_map, peers->voters_pool );
        iter = fd_gui_peers_voters_map_iter_next( iter, peers->voters_map, peers->voters_pool ) ) {
    FD_TEST( voter_cnt<FD_RUNTIME_MAX_VOTE_ACCOUNTS );
    fd_gui_peers_voter_t * ele = fd_gui_peers_voters_map_iter_ele( iter, peers->voters_map, peers->voters_pool );
    votes_sorted[ voter_cnt++ ] = *ele;
  }

  /* deduplicate node accounts, keeping the vote accounts with largest stake */
  votes_sorted = fd_gui_peers_votes_stake_sort_inplace( votes_sorted, voter_cnt );
  votes_sorted = fd_gui_peers_votes_pkey_sort_stable_fast( votes_sorted, voter_cnt, peers->voters_scratch.voters2 );

  ulong total_stake = 0UL;
  fd_pubkey_t prev_peer = { 0 };
  for( ulong i=0UL; i<voter_cnt; i++ ) {
    if( FD_UNLIKELY( !memcmp( prev_peer.uc, votes_sorted[ i ].node_account.uc, sizeof(fd_pubkey_t) ) ) ) {
      votes_sorted[ i ].stake = ULONG_MAX; /* flag as duplicate */
    } else {
      total_stake += votes_sorted[ i ].stake;
    }
    prev_peer = votes_sorted[ i ].node_account;
  }

  /* get stake-weighted 67th percentile last_vote_slot */
  fd_gui_peers_votes_slot_sort_inplace( votes_sorted, voter_cnt );

  ulong cumulative_stake = 0UL;
  ulong last_vote_slot_p67 = ULONG_MAX;
  for( ulong i=0UL; i<voter_cnt; i++ ) {
    if( FD_UNLIKELY( votes_sorted[ i ].stake==ULONG_MAX ) ) continue;
    cumulative_stake += votes_sorted[ i ].stake;
    if( FD_LIKELY( 3*cumulative_stake>2*total_stake ) ) {
      last_vote_slot_p67 = votes_sorted[ i ].last_vote_slot;
    }
  }

  int * actions = peers->voters_scratch.actions;
  ulong * idxs  = peers->voters_scratch.idxs;
  ulong count = 0UL;
  for( ulong i=0UL; i<voter_cnt; i++ ) {
    if( FD_UNLIKELY( votes_sorted[ i ].stake==ULONG_MAX ) ) continue;

    /* votes_sorted is a copy of the vote_states bank field that has
       been sorted by stake descending and deduplicated.  Deduplicated
       here means if multiple vote accounts point to the same identity
       key, we go with the one with the most stake.  TODO: This logic
       will need to change once SIMD-0180 hits mainnet.

       As long as the vote account exists, it will be in vote_states,
       which get initialized at snapshot load and gets updated by the
       runtime. So, on any given fork, `last_voted_slot` should reflect
       the last landed vote for ALL the vote accounts (including those
       referencing identity->uc) from the perspective of that fork's
       bank, even if that slot didn't have landed votes for some of
       those accounts. */
    if( FD_UNLIKELY( !memcmp( &votes_sorted[ i ].node_account, identity->uc, sizeof(fd_pubkey_t) ) && peers->slot_voted!=votes_sorted[ i ].last_vote_slot ) ) {
      peers->slot_voted = fd_ulong_if( votes_sorted[ i ].last_vote_slot==0UL, ULONG_MAX, votes_sorted[ i ].last_vote_slot );
      fd_gui_peers_printf_vote_slot( peers );
      fd_http_server_ws_broadcast( peers->http );
    }

    ulong peer_idx = fd_gui_peers_node_pubkey_map_idx_query( peers->node_pubkey_map, &votes_sorted[ i ].node_account, ULONG_MAX, peers->contact_info_table );
    if( FD_UNLIKELY( peer_idx==ULONG_MAX ) ) continue; /* peer not on gossip */

    fd_gui_peers_node_t * peer = peers->contact_info_table + peer_idx;

    /* TODO: we only publish updates when stake changes, otherwise we'd
       have to republish for every peer every slot, which ends up being
       too much bandwidth because we republish all the peer info.
       Ideally, we decouple the vote updates from the reset of the peer
       info which would let us make updates quickly. */
    int is_delinquent = ((long)last_vote_slot_p67 - (long)votes_sorted[ i ].last_vote_slot) > 150L;
    int vote_eq = peer->has_vote_info
               && !memcmp( peer->vote_account.uc, votes_sorted[ i ].vote_account.uc, sizeof(fd_pubkey_t) )
               && peer->stake                   ==votes_sorted[ i ].stake
            // && peer->last_vote_slot          ==votes_sorted[ i ].last_vote_slot
            // && peer->last_vote_timestamp     ==votes_sorted[ i ].last_vote_timestamp
               && peer->is_delinquent           ==is_delinquent;

    if( FD_LIKELY( vote_eq ) ) continue; /* nop */

    peer->has_vote_info = 1;
    peer->vote_account        = votes_sorted[ i ].vote_account;
    peer->last_vote_slot      = votes_sorted[ i ].last_vote_slot;
    peer->last_vote_timestamp = votes_sorted[ i ].last_vote_timestamp;
    peer->is_delinquent       = is_delinquent;

    if( FD_UNLIKELY( peer->stake!=votes_sorted[ i ].stake ) ) {
      fd_gui_peers_live_table_idx_remove( peers->live_table, peer_idx, peers->contact_info_table );
      peer->stake = votes_sorted[ i ].stake;
      fd_gui_peers_live_table_idx_insert( peers->live_table, peer_idx, peers->contact_info_table );
    }

    actions[ count ] = FD_GUI_PEERS_NODE_UPDATE;
    idxs   [ count ] = peer_idx;
    count++;
  }

  if( FD_UNLIKELY( count ) ) {
    fd_gui_peers_printf_nodes( peers, actions, idxs, count );
    fd_http_server_ws_broadcast( peers->http );
  }
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from 929fd71 to eb7ea0d Compare February 26, 2026 20:17
Copilot AI review requested due to automatic review settings February 27, 2026 21:43
@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from eb7ea0d to 0e912fb Compare February 27, 2026 21:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings March 2, 2026 23:09
@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from 6e1681d to 5058014 Compare March 2, 2026 23:09
@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from 5058014 to d8e68e3 Compare March 2, 2026 23:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

src/discof/replay/fd_replay_tile.c:812

  • Bank refcnt accounting looks inconsistent after removing the gui increment. publish_slot_completed still expects "increment by 1 for each consumer that uses bank_idx", but replay later decrements refcnt for IN_KIND_GUI acks unconditionally. With gui enabled (and rpc potentially disabled), this will decrement without a corresponding increment and can underflow/free the bank too early. Add the gui refcnt++ back when ctx->gui_enabled and keep the debug message consistent with the actual consumers.
  /* refcnt should be incremented by 1 for each consumer that uses
     `bank_idx`.  Each consumer should decrement the bank's refcnt once
     they are done using the bank. */
  bank->data->refcnt++; /* tower_tile */
  if( FD_LIKELY( ctx->rpc_enabled ) ) bank->data->refcnt++; /* rpc tile */
  if( FD_LIKELY( ctx->gui_enabled ) ) bank->data->refcnt++; /* gui tile */
  slot_info->bank_idx = bank->data->idx;
  FD_LOG_DEBUG(( "bank (idx=%lu, slot=%lu) refcnt incremented to %lu for tower, rpc, gui", bank->data->idx, slot, bank->data->refcnt ));

src/discof/replay/fd_replay_tile.c:812

  • slot_info->bank_seq is assigned here, but fd_replay_slot_completed (in fd_replay_tile.h) does not currently define a bank_seq field. This will not compile (or will write into the wrong field if a different member name was intended). Add the missing struct member (or remove the assignment) so producer/consumer agree on the message layout.
  if( FD_LIKELY( ctx->gui_enabled ) ) bank->data->refcnt++; /* gui tile */
  slot_info->bank_idx = bank->data->idx;
  FD_LOG_DEBUG(( "bank (idx=%lu, slot=%lu) refcnt incremented to %lu for tower, rpc, gui", bank->data->idx, slot, bank->data->refcnt ));

@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from d8e68e3 to 8bb3961 Compare March 2, 2026 23:17
Copilot AI review requested due to automatic review settings March 2, 2026 23:23
@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from 8bb3961 to a10f255 Compare March 2, 2026 23:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/discof/replay/fd_replay_tile.c:730

  • cost_tracker_snap() now uses ULONG_MAX as the "unknown" sentinel (via memset(..., 0xFF, ...)) when block_cost_limit==0, and the GUI logic checks for ULONG_MAX before updating compute unit fields. However, when the bank has no cost tracker pool element, this branch memset()s slot_info->cost_tracker to 0, which the GUI will treat as a real value and overwrite max/used compute units with 0. Set the no-cost-tracker case to the same ULONG_MAX sentinel (or otherwise keep the sentinel semantics consistent with the GUI).
cost_tracker_snap( fd_bank_t * bank, fd_replay_slot_completed_t * slot_info ) {
  if( bank->data->cost_tracker_pool_idx!=fd_bank_cost_tracker_pool_idx_null( fd_bank_get_cost_tracker_pool( bank->data ) ) ) {
    fd_cost_tracker_t const * cost_tracker = fd_bank_cost_tracker_locking_query( bank );
    if( FD_UNLIKELY( cost_tracker->block_cost_limit==0UL ) ) {
      memset( &slot_info->cost_tracker, (int)UINT_MAX, sizeof(slot_info->cost_tracker) );
    } else {
      slot_info->cost_tracker.block_cost                   = cost_tracker->block_cost;
      slot_info->cost_tracker.vote_cost                    = cost_tracker->vote_cost;
      slot_info->cost_tracker.allocated_accounts_data_size = cost_tracker->allocated_accounts_data_size;
      slot_info->cost_tracker.block_cost_limit             = cost_tracker->block_cost_limit;
      slot_info->cost_tracker.vote_cost_limit              = cost_tracker->vote_cost_limit;
      slot_info->cost_tracker.account_cost_limit           = cost_tracker->account_cost_limit;
    }
    fd_bank_cost_tracker_end_locking_query( bank );
  } else {
    memset( &slot_info->cost_tracker, 0, sizeof(slot_info->cost_tracker) );
  }

@jherrera-jump jherrera-jump marked this pull request as ready for review March 3, 2026 02:01
Copilot AI review requested due to automatic review settings March 3, 2026 02:01
@jherrera-jump jherrera-jump marked this pull request as draft March 3, 2026 02:01
@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from 35821c2 to 4bfdc10 Compare March 3, 2026 02:02
@jherrera-jump jherrera-jump marked this pull request as ready for review March 3, 2026 02:03
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from 4bfdc10 to 5436d8c Compare March 3, 2026 02:16
@jherrera-jump jherrera-jump force-pushed the jherrera/topo-add-expendable-tiles branch from 5436d8c to b22d12c Compare March 3, 2026 02:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants