txsend: light belt sanding #8471

Merged
mmcgee-jump merged 1 commit into main from mmcgee/txsem2 on Mar 3, 2026
@mmcgee-jump (Contributor)

No description provided.

@mmcgee-jump mmcgee-jump added this to the v1.0 milestone Feb 25, 2026
Copilot AI review requested due to automatic review settings February 25, 2026 23:06
Copilot AI (Contributor) left a comment

Pull request overview

This pull request refactors the txsend tile to simplify connection management logic and remove many txsend-specific metrics. The PR title "light belt sanding" accurately describes this as a cleanup/simplification effort.

Changes:

  • Simplified QUIC connection tracking by removing per-port metrics and complex connection state management
  • Removed clock-based timekeeping in favor of direct wallclock calls
  • Consolidated connection tracking into two parallel arrays (peers and conns)
  • Removed numerous txsend-specific metrics related to contact info handling, connection state, and send results

Reviewed changes

Copilot reviewed 7 out of 9 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/discof/txsend/fd_txsend_tile.h | Restructured tile context, replaced old connection map with simpler peer map and connection arrays |
| src/discof/txsend/fd_txsend_tile.c | Major refactoring of connection management logic, removed metrics tracking, simplified QUIC servicing |
| src/disco/metrics/metrics.xml | Removed 21 txsend-specific metrics for contact info and connection management |
| src/disco/metrics/generated/* | Regenerated metric definitions reflecting removed metrics |
| src/disco/pack/fd_pack_tile.c | Removed trailing blank line |
| src/app/shared_dev/commands/quic_trace/* | Updated type name from fd_txsend_tile_ctx_t to fd_txsend_tile_t |
| book/api/metrics-generated.md | Updated documentation for removed metrics |

@github-actions
Performance Measurements ⏳

| Suite | Baseline | New | Change |
| --- | --- | --- | --- |
| backtest mainnet-368528500-perf per slot | 0.052415 s | 0.052659 s | 0.466% |
| backtest mainnet-368528500-perf snapshot load | 2.434 s | 1.824 s | -25.062% |
| backtest mainnet-368528500-perf total elapsed | 52.41544 s | 52.658938 s | 0.465% |
| firedancer mem usage with mainnet.toml | 972.26 GiB | 971.27 GiB | -0.102% |

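The Change column appears to be the relative difference between the New and Baseline values, (New - Baseline) / Baseline * 100. That formula is inferred from the numbers in the table, not stated by the bot. A minimal sketch in C, where `pct_change` is a hypothetical helper and not part of the benchmark tooling:

```c
#include <assert.h>

/* Hypothetical helper (not part of the benchmark tooling): relative
   change of new_val against baseline, in percent.  Assumes the
   baseline is non-zero. */
static double
pct_change( double baseline, double new_val ) {
  assert( baseline!=0.0 );
  return ( new_val - baseline ) / baseline * 100.0;
}
```

With the first table's values, `pct_change( 0.052415, 0.052659 )` is roughly 0.466 and `pct_change( 2.434, 1.824 )` is roughly -25.06, which agrees with the per slot and snapshot load rows.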
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 5 comments.

Copilot AI review requested due to automatic review settings March 2, 2026 21:45
@mmcgee-jump mmcgee-jump force-pushed the mmcgee/txsem2 branch 2 times, most recently from 5c6e867 to 9acedf9 March 2, 2026 21:45
github-actions bot commented Mar 2, 2026

Performance Measurements ⏳

| Suite | Baseline | New | Change |
| --- | --- | --- | --- |
| backtest mainnet-368528500-perf per slot | 0.075174 s | 0.075944 s | 1.024% |
| backtest mainnet-368528500-perf snapshot load | 4.29 s | 3.188 s | -25.688% |
| backtest mainnet-368528500-perf total elapsed | 75.173522 s | 75.944313 s | 1.025% |
| firedancer mem usage with mainnet.toml | 964.37 GiB | 965.38 GiB | 0.105% |

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 5 comments.

github-actions bot commented Mar 3, 2026

Performance Measurements ⏳

| Suite | Baseline | New | Change |
| --- | --- | --- | --- |
| backtest mainnet-368528500-perf per slot | 0.071304 s | 0.071834 s | 0.743% |
| backtest mainnet-368528500-perf snapshot load | 3.559 s | 2.959 s | -16.859% |
| backtest mainnet-368528500-perf total elapsed | 71.304069 s | 71.833722 s | 0.743% |
| firedancer mem usage with mainnet.toml | 964.37 GiB | 963.38 GiB | -0.103% |

Copilot AI review requested due to automatic review settings March 3, 2026 03:09
@mmcgee-jump mmcgee-jump enabled auto-merge (rebase) March 3, 2026 03:10
github-actions bot commented Mar 3, 2026

Performance Measurements ⏳

| Suite | Baseline | New | Change |
| --- | --- | --- | --- |
| backtest mainnet-368528500-perf per slot | 0.074732 s | 0.075145 s | 0.553% |
| backtest mainnet-368528500-perf snapshot load | 4.213 s | 3.157 s | -25.065% |
| backtest mainnet-368528500-perf total elapsed | 74.732328 s | 75.144809 s | 0.552% |
| firedancer mem usage with mainnet.toml | 964.37 GiB | 963.38 GiB | -0.103% |

@mmcgee-jump mmcgee-jump merged commit 1c0d8ed into main Mar 3, 2026
22 checks passed
@mmcgee-jump mmcgee-jump deleted the mmcgee/txsem2 branch March 3, 2026 03:17
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 5 comments.


/* Initialize input links */
for( ulong i=0; i<tile->in_cnt; i++ ) {
  FD_TEST( tile->in_cnt<sizeof(ctx->in_kind)/sizeof(ctx->in_kind[ 0 ]) );

Copilot AI commented Mar 3, 2026

FD_TEST( tile->in_cnt<sizeof(ctx->in_kind)/sizeof(ctx->in_kind[0]) ) rejects the case where in_cnt equals the array length (32), even though indexing [0,31] would be valid. Please change this to <= (or adjust the array sizing) to avoid an unnecessary abort at the maximum supported input-link count.

Suggested change
- FD_TEST( tile->in_cnt<sizeof(ctx->in_kind)/sizeof(ctx->in_kind[ 0 ]) );
+ FD_TEST( tile->in_cnt<=sizeof(ctx->in_kind)/sizeof(ctx->in_kind[ 0 ]) );

Copilot uses AI. Check for mistakes.
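
The off-by-one described in the comment above can be illustrated in isolation. This is a minimal sketch, not the tile's code: `demo_ctx_t`, `DEMO_IN_MAX`, `in_cnt_fits` and `init_in_kind` are hypothetical names, and the array length of 32 is taken from the review comment. A count equal to the array length is accepted because the loop only touches indices [0, in_cnt-1].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the tile context.  The length 32 mirrors
   the array length mentioned in the review comment. */
typedef struct {
  int in_kind[ 32 ];
} demo_ctx_t;

/* Element count of in_kind, computed the same way as in the check. */
#define DEMO_IN_MAX ( sizeof( ((demo_ctx_t *)0)->in_kind ) / sizeof( ((demo_ctx_t *)0)->in_kind[ 0 ] ) )

/* Returns 1 if in_cnt input links fit in in_kind.  in_cnt==DEMO_IN_MAX
   is allowed: the highest index touched is in_cnt-1. */
static int
in_cnt_fits( unsigned long in_cnt ) {
  return in_cnt<=DEMO_IN_MAX;
}

static void
init_in_kind( demo_ctx_t * ctx, unsigned long in_cnt ) {
  assert( in_cnt_fits( in_cnt ) ); /* loop invariant, checked once up front */
  for( unsigned long i=0UL; i<in_cnt; i++ ) ctx->in_kind[ i ] = 0;
}
```

Since the condition does not depend on the loop variable, the sketch checks it once before the loop rather than on every iteration.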
for( ulong i=0UL; i<7UL; i++ ) {
  ulong target_slot = ctx->voted_slot+1UL + i*FD_EPOCH_SLOTS_PER_ROTATION;
  leaders[ i ] = fd_multi_epoch_leaders_get_leader_for_slot( ctx->mleaders, target_slot );
  FD_TEST( leaders[ i ] );

Copilot AI commented Mar 3, 2026

In after_credit(), asserting that fd_multi_epoch_leaders_get_leader_for_slot() returns non-NULL can crash the tile when leader schedules are incomplete or the requested slot is outside the initialized epochs. Please handle a NULL leader gracefully here (e.g., skip/disconnect logic for that slot until schedules are available) instead of FD_TEST().

Suggested change
- FD_TEST( leaders[ i ] );
+ if( FD_UNLIKELY( !leaders[ i ] ) ) {
+   /* Leader schedules are incomplete or target_slot is outside the
+      initialized epochs. Skip disconnect logic until schedules are
+      available. */
+   return;
+ }

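The pattern suggested above, treating a missing leader as "not ready yet" rather than as a fatal condition, might look like the following sketch. Everything here is hypothetical (`demo_schedule_t`, `demo_leader_for_slot`, `demo_collect_leaders`); it only assumes, as the comment does, that the lookup can return NULL for slots outside the known schedule.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LEADER_CNT 7UL

typedef struct { unsigned char key[ 32 ]; } demo_pubkey_t;

/* Hypothetical schedule: slots at or beyond known_slot_cnt have no
   leader assigned yet. */
typedef struct {
  demo_pubkey_t const * by_slot;
  unsigned long         known_slot_cnt;
} demo_schedule_t;

/* Returns the leader for slot, or NULL if the schedule does not cover it. */
static demo_pubkey_t const *
demo_leader_for_slot( demo_schedule_t const * s, unsigned long slot ) {
  if( slot>=s->known_slot_cnt ) return NULL;
  return &s->by_slot[ slot ];
}

/* Fills leaders[ 0 .. DEMO_LEADER_CNT ).  Returns 1 if every leader was
   found, 0 as soon as one is unavailable, in which case the caller
   skips this round instead of aborting. */
static int
demo_collect_leaders( demo_schedule_t const * s,
                      unsigned long           first_slot,
                      unsigned long           stride,
                      demo_pubkey_t const *   leaders[ DEMO_LEADER_CNT ] ) {
  assert( s && leaders );
  for( unsigned long i=0UL; i<DEMO_LEADER_CNT; i++ ) {
    leaders[ i ] = demo_leader_for_slot( s, first_slot + i*stride );
    if( !leaders[ i ] ) return 0;
  }
  return 1;
}
```

A caller would return early when `demo_collect_leaders` yields 0 and retry on a later invocation, once more of the schedule is known.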
ctx->src_ip_addr,
ctx->src_port,
now );
FD_TEST( conn ); /* never out of connection objects, per above check */

Copilot AI commented Mar 3, 2026

fd_quic_connect() can legitimately return NULL (e.g., handshake pool exhaustion / eviction too young / TLS handshake allocation failure). FD_TEST(conn) will therefore crash txsend under load. Please replace the assertion with handling for connection creation failure (and consider updating quic_last_connected so the cooldown still applies).

Suggested change
- FD_TEST( conn ); /* never out of connection objects, per above check */
+ if( FD_UNLIKELY( !conn ) ) {
+   /* Apply cooldown even on connection failure to avoid tight retry
+      loops that might exceed upstream connection rate limits. */
+   peer->quic_last_connected[ j ] = now;
+   continue;
+ }

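The idea in the suggestion above, applying the reconnect cooldown whether or not the connection attempt succeeds, can be sketched without any QUIC library. The types, the callback, and the one second `DEMO_COOLDOWN_NS` value below are all assumptions made for illustration; the thread does not state the real cooldown.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cooldown of 1 s in nanoseconds; the real value is not given
   in the thread. */
#define DEMO_COOLDOWN_NS 1000000000LL

typedef struct { int id; } demo_conn_t;

typedef struct {
  demo_conn_t * quic_conns[ 2 ];
  long long     quic_last_connected[ 2 ];
} demo_peer_t;

/* Hypothetical connect callback: may return NULL, e.g. on resource
   exhaustion. */
typedef demo_conn_t * (* demo_connect_fn_t)( void * arg );

/* Returns 1 if a connection was established, 0 if the attempt was
   skipped (already connected or cooling down) or failed. */
static int
demo_try_connect( demo_peer_t *     peer,
                  unsigned long     j,
                  long long         now,
                  demo_connect_fn_t connect,
                  void *            arg ) {
  assert( j<2UL );
  if( peer->quic_conns[ j ] ) return 0;
  if( now - peer->quic_last_connected[ j ] < DEMO_COOLDOWN_NS ) return 0;
  /* Stamp the attempt before connecting so that a failed attempt is
     rate limited exactly like a successful one. */
  peer->quic_last_connected[ j ] = now;
  peer->quic_conns[ j ] = connect( arg );
  return peer->quic_conns[ j ]!=NULL;
}

/* Two toy callbacks for exercising the function. */
static demo_conn_t * demo_connect_fail( void * arg ) { (void)arg; return NULL; }
static demo_conn_t * demo_connect_ok  ( void * arg ) { return (demo_conn_t *)arg; }
```

After a failed attempt at time t, a retry before t + `DEMO_COOLDOWN_NS` is skipped, so a peer that keeps failing cannot cause a tight retry loop.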
entry->quic_ip_addrs[ i ] = 0U;
entry->quic_ports [ i ] = 0U;
entry->udp_ip_addrs [ i ] = 0U;
entry->udp_ports [ i ] = 0U;

Copilot AI commented Mar 3, 2026

When removing a tombstoned entry from peer_map, the code clears IP/port fields but leaves entry->quic_conns[] intact. Because quic_conn_final() only clears quic_conns for peers still present in peer_map, this can leave stale connection pointers attached to a reused peers[] slot, causing votes to be sent over the wrong connection. Please close/NULL out entry->quic_conns[] (and consider removing any matching entries from ctx->conns[]) during tombstone cleanup.

Suggested change
- entry->udp_ports    [ i ] = 0U;
+ entry->udp_ports    [ i ] = 0U;
+ entry->quic_conns   [ i ] = NULL;

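The hazard described above is a reused slot that still carries pointers from its previous occupant. A minimal sketch of a reset routine that closes and clears those pointers along with the address fields follows; `demo_peer_t`, `demo_conn_close` and `demo_peer_reset` are hypothetical, and a real implementation would call into the QUIC library to close a connection.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int closed; } demo_conn_t;

typedef struct {
  unsigned int   quic_ip_addrs[ 2 ];
  unsigned short quic_ports   [ 2 ];
  demo_conn_t *  quic_conns   [ 2 ];
  int            tombstoned;
} demo_peer_t;

/* Hypothetical close hook; real code would call into the QUIC library. */
static void
demo_conn_close( demo_conn_t * conn ) {
  conn->closed = 1;
}

/* Prepares a tombstoned slot for reuse.  Connection pointers are closed
   and cleared together with the address fields so that the next
   occupant of the slot cannot inherit them. */
static void
demo_peer_reset( demo_peer_t * entry ) {
  assert( entry );
  for( unsigned long i=0UL; i<2UL; i++ ) {
    if( entry->quic_conns[ i ] ) demo_conn_close( entry->quic_conns[ i ] );
    entry->quic_conns   [ i ] = NULL;
    entry->quic_ip_addrs[ i ] = 0U;
    entry->quic_ports   [ i ] = 0U;
  }
  entry->tombstoned = 0;
}
```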
Comment on lines +376 to +385
peer_map_ele_remove( ctx->peer_map, &stale->pubkey, NULL, ctx->peers );
entry->quic_last_connected[ 0 ] = 0L;
entry->quic_last_connected[ 1 ] = 0L;
for( ulong i=0UL; i<2UL; i++ ) {
  entry->quic_ip_addrs[ i ] = 0U;
  entry->quic_ports   [ i ] = 0U;
  entry->udp_ip_addrs [ i ] = 0U;
  entry->udp_ports    [ i ] = 0U;
}
stale->tombstoned = 0;

Copilot AI commented Mar 3, 2026

Stale-entry eviction logic appears inconsistent: it removes stale from peer_map, but then resets entry's sockets/timestamps (not stale's) and sets stale->tombstoned = 0, leaving the evicted object in a non-tombstoned state with potentially live quic_conns/sockets. Please either (a) treat stale==entry as a no-op reinsert/update, or (b) if stale!=entry, properly tombstone/clear (and close conns for) the stale object, and avoid clearing the new entry beyond what’s intended.

Suggested change
- peer_map_ele_remove( ctx->peer_map, &stale->pubkey, NULL, ctx->peers );
- entry->quic_last_connected[ 0 ] = 0L;
- entry->quic_last_connected[ 1 ] = 0L;
- for( ulong i=0UL; i<2UL; i++ ) {
-   entry->quic_ip_addrs[ i ] = 0U;
-   entry->quic_ports   [ i ] = 0U;
-   entry->udp_ip_addrs [ i ] = 0U;
-   entry->udp_ports    [ i ] = 0U;
- }
- stale->tombstoned = 0;
+ /* If stale==entry, we are just updating/reinserting the same
+    entry and there is nothing to evict. */
+ if( FD_LIKELY( stale!=entry ) ) {
+   peer_map_ele_remove( ctx->peer_map, &stale->pubkey, NULL, ctx->peers );
+   stale->quic_last_connected[ 0 ] = 0L;
+   stale->quic_last_connected[ 1 ] = 0L;
+   for( ulong i=0UL; i<2UL; i++ ) {
+     stale->quic_ip_addrs[ i ] = 0U;
+     stale->quic_ports   [ i ] = 0U;
+     stale->udp_ip_addrs [ i ] = 0U;
+     stale->udp_ports    [ i ] = 0U;
+   }
+   stale->tombstoned = 1;
+ }

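The aliasing concern in this comment is that the stale object and the new entry may or may not be the same object, and only the stale one should be cleared. A minimal sketch under that reading follows. `demo_peer_t`, its `in_map` flag (standing in for peer_map membership) and `demo_evict_stale` are hypothetical and do not reflect the tile's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  unsigned int   quic_ip_addrs[ 2 ];
  unsigned short quic_ports   [ 2 ];
  long long      quic_last_connected[ 2 ];
  int            in_map;     /* stands in for peer_map membership */
  int            tombstoned;
} demo_peer_t;

/* Evicts stale to make room for entry.  Returns 1 if stale was evicted,
   0 if stale and entry are the same object, in which case nothing is
   touched.  Only stale's fields are ever cleared. */
static int
demo_evict_stale( demo_peer_t * stale, demo_peer_t * entry ) {
  assert( stale && entry );
  if( stale==entry ) return 0;
  stale->in_map = 0;
  for( unsigned long i=0UL; i<2UL; i++ ) {
    stale->quic_last_connected[ i ] = 0LL;
    stale->quic_ip_addrs      [ i ] = 0U;
    stale->quic_ports         [ i ] = 0U;
  }
  stale->tombstoned = 1;
  return 1;
}
```

Returning a flag lets the caller tell a real eviction from a plain update of the same entry.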
3 participants