Conversation
There was a problem hiding this comment.
Pull request overview
This pull request refactors the txsend tile to simplify connection management logic and remove many txsend-specific metrics. The PR title "light belt sanding" accurately describes this as a cleanup/simplification effort.
Changes:
- Simplified QUIC connection tracking by removing per-port metrics and complex connection state management
- Removed clock-based timekeeping in favor of direct wallclock calls
- Consolidated connection tracking into two parallel arrays (peers and conns)
- Removed numerous txsend-specific metrics related to contact info handling, connection state, and send results
Reviewed changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/discof/txsend/fd_txsend_tile.h | Restructured tile context, replaced old connection map with simpler peer map and connection arrays |
| src/discof/txsend/fd_txsend_tile.c | Major refactoring of connection management logic, removed metrics tracking, simplified QUIC servicing |
| src/disco/metrics/metrics.xml | Removed 21 txsend-specific metrics for contact info and connection management |
| src/disco/metrics/generated/* | Regenerated metric definitions reflecting removed metrics |
| src/disco/pack/fd_pack_tile.c | Removed trailing blank line |
| src/app/shared_dev/commands/quic_trace/* | Updated type name from fd_txsend_tile_ctx_t to fd_txsend_tile_t |
| book/api/metrics-generated.md | Updated documentation for removed metrics |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Performance Measurements ⏳
|
17778ef to
77bf2d0
Compare
5c6e867 to
9acedf9
Compare
Performance Measurements ⏳
|
9acedf9 to
61bb5d6
Compare
Performance Measurements ⏳
|
61bb5d6 to
6aeb55b
Compare
Performance Measurements ⏳
|
|
|
||
| /* Initialize input links */ | ||
| for( ulong i=0; i<tile->in_cnt; i++ ) { | ||
| FD_TEST( tile->in_cnt<sizeof(ctx->in_kind)/sizeof(ctx->in_kind[ 0 ]) ); |
There was a problem hiding this comment.
FD_TEST( tile->in_cnt<sizeof(ctx->in_kind)/sizeof(ctx->in_kind[0]) ) rejects the case where in_cnt equals the array length (32), even though indexing [0,31] would be valid. Please change this to <= (or adjust the array sizing) to avoid an unnecessary abort at the maximum supported input-link count.
| FD_TEST( tile->in_cnt<sizeof(ctx->in_kind)/sizeof(ctx->in_kind[ 0 ]) ); | |
| FD_TEST( tile->in_cnt<=sizeof(ctx->in_kind)/sizeof(ctx->in_kind[ 0 ]) ); |
| for( ulong i=0UL; i<7UL; i++ ) { | ||
| ulong target_slot = ctx->voted_slot+1UL + i*FD_EPOCH_SLOTS_PER_ROTATION; | ||
| leaders[ i ] = fd_multi_epoch_leaders_get_leader_for_slot( ctx->mleaders, target_slot ); | ||
| FD_TEST( leaders[ i ] ); |
There was a problem hiding this comment.
In after_credit(), asserting that fd_multi_epoch_leaders_get_leader_for_slot() returns non-NULL can crash the tile when leader schedules are incomplete or the requested slot is outside the initialized epochs. Please handle a NULL leader gracefully here (e.g., skip/disconnect logic for that slot until schedules are available) instead of FD_TEST().
| FD_TEST( leaders[ i ] ); | |
| if( FD_UNLIKELY( !leaders[ i ] ) ) { | |
| /* Leader schedules are incomplete or target_slot is outside the | |
| initialized epochs. Skip disconnect logic until schedules are | |
| available. */ | |
| return; | |
| } |
| ctx->src_ip_addr, | ||
| ctx->src_port, | ||
| now ); | ||
| FD_TEST( conn ); /* never out of connection objects, per above check */ |
There was a problem hiding this comment.
fd_quic_connect() can legitimately return NULL (e.g., handshake pool exhaustion / eviction too young / TLS handshake allocation failure). FD_TEST(conn) will therefore crash txsend under load. Please replace the assertion with handling for connection creation failure (and consider updating quic_last_connected so the cooldown still applies).
| FD_TEST( conn ); /* never out of connection objects, per above check */ | |
| if( FD_UNLIKELY( !conn ) ) { | |
| /* Apply cooldown even on connection failure to avoid tight retry | |
| loops that might exceed upstream connection rate limits. */ | |
| peer->quic_last_connected[ j ] = now; | |
| continue; | |
| } |
| entry->quic_ip_addrs[ i ] = 0U; | ||
| entry->quic_ports [ i ] = 0U; | ||
| entry->udp_ip_addrs [ i ] = 0U; | ||
| entry->udp_ports [ i ] = 0U; |
There was a problem hiding this comment.
When removing a tombstoned entry from peer_map, the code clears IP/port fields but leaves entry->quic_conns[] intact. Because quic_conn_final() only clears quic_conns for peers still present in peer_map, this can leave stale connection pointers attached to a reused peers[] slot, causing votes to be sent over the wrong connection. Please close/NULL out entry->quic_conns[] (and consider removing any matching entries from ctx->conns[]) during tombstone cleanup.
| entry->udp_ports [ i ] = 0U; | |
| entry->udp_ports [ i ] = 0U; | |
| entry->quic_conns [ i ] = NULL; |
| peer_map_ele_remove( ctx->peer_map, &stale->pubkey, NULL, ctx->peers ); | ||
| entry->quic_last_connected[ 0 ] = 0L; | ||
| entry->quic_last_connected[ 1 ] = 0L; | ||
| for( ulong i=0UL; i<2UL; i++ ) { | ||
| entry->quic_ip_addrs[ i ] = 0U; | ||
| entry->quic_ports [ i ] = 0U; | ||
| entry->udp_ip_addrs [ i ] = 0U; | ||
| entry->udp_ports [ i ] = 0U; | ||
| } | ||
| stale->tombstoned = 0; |
There was a problem hiding this comment.
Stale-entry eviction logic appears inconsistent: it removes stale from peer_map, but then resets entry's sockets/timestamps (not stale's) and sets stale->tombstoned = 0, leaving the evicted object in a non-tombstoned state with potentially live quic_conns/sockets. Please either (a) treat stale==entry as a no-op reinsert/update, or (b) if stale!=entry, properly tombstone/clear (and close conns for) the stale object, and avoid clearing the new entry beyond what’s intended.
| peer_map_ele_remove( ctx->peer_map, &stale->pubkey, NULL, ctx->peers ); | |
| entry->quic_last_connected[ 0 ] = 0L; | |
| entry->quic_last_connected[ 1 ] = 0L; | |
| for( ulong i=0UL; i<2UL; i++ ) { | |
| entry->quic_ip_addrs[ i ] = 0U; | |
| entry->quic_ports [ i ] = 0U; | |
| entry->udp_ip_addrs [ i ] = 0U; | |
| entry->udp_ports [ i ] = 0U; | |
| } | |
| stale->tombstoned = 0; | |
| /* If stale==entry, we are just updating/reinserting the same | |
| entry and there is nothing to evict. */ | |
| if( FD_LIKELY( stale!=entry ) ) { | |
| peer_map_ele_remove( ctx->peer_map, &stale->pubkey, NULL, ctx->peers ); | |
| stale->quic_last_connected[ 0 ] = 0L; | |
| stale->quic_last_connected[ 1 ] = 0L; | |
| for( ulong i=0UL; i<2UL; i++ ) { | |
| stale->quic_ip_addrs[ i ] = 0U; | |
| stale->quic_ports [ i ] = 0U; | |
| stale->udp_ip_addrs [ i ] = 0U; | |
| stale->udp_ports [ i ] = 0U; | |
| } | |
| stale->tombstoned = 1; | |
| } |
No description provided.