Conversation
Performance Measurements ⏳
|
There was a problem hiding this comment.
Pull request overview
This PR updates the snapshot pinger used by snapct to use TCP connection attempts (instead of ICMP) to estimate peer reachability/latency, and adjusts the snapct tile’s sandboxing configuration to accommodate the new ping mechanism.
Changes:
- Replace ICMP echo handling in
fd_sspingwith a pool of nonblocking TCP sockets andpoll()-based completion tracking. - Update
snapctsandbox allowlists/seccomp policy to account for many ping fds (min/max fd range) and allowpoll. - Adjust
snapctfd budgeting and allowed-fd population to include all ping sockets.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/discof/restore/utils/fd_ssping.h | Adds FD_SSPING_FD_CNT and replaces single-socket API with multi-fd enumeration API. |
| src/discof/restore/utils/fd_ssping.c | Implements TCP-based “ping” via nonblocking connect() and poll(), plus fd pool bookkeeping. |
| src/discof/restore/generated/fd_snapct_tile_seccomp.h | Regenerates seccomp filter to allow poll and accept a ping-fd range (min/max). |
| src/discof/restore/fd_snapct_tile.seccomppolicy | Updates policy inputs and rules to account for multiple ping fds and poll. |
| src/discof/restore/fd_snapct_tile.c | Expands fd budgeting and allowlist population to include all ping fds; passes ping fd range into seccomp generator. |
| /* Abort the connection attempt or close the connection by connecting | ||
| to AF_UNSPEC. */ | ||
| struct sockaddr_in addr[1] = {{ | ||
| .sin_family = AF_UNSPEC, | ||
| .sin_addr = { .s_addr = 0U }, | ||
| .sin_port = 0 | ||
| }}; | ||
| if( FD_UNLIKELY( connect( fdesc, addr, sizeof(addr) ) ) ) FD_LOG_ERR(( "connect(AF_UNSPEC) failed (%d-%s)", errno, fd_io_strerror( errno ) )); | ||
|
|
There was a problem hiding this comment.
remove_fdesc_idx() attempts to "abort/close" a TCP connection by calling connect() with sa_family=AF_UNSPEC. That disconnect trick is only defined for datagram sockets; for TCP it typically fails (e.g., EAFNOSUPPORT/EINVAL). Because this failure is handled with FD_LOG_ERR, the tile can reliably crash as soon as a ping fd is reclaimed. Consider instead explicitly shutting down and closing the socket and then re-creating a fresh nonblocking TCP socket (and adjust the snapct seccomp policy accordingly to permit close/socket/setsockopt for these ping fds), or redesign the ping fd handling so stream sockets are not reused via AF_UNSPEC.
| /* Abort the connection attempt or close the connection by connecting | |
| to AF_UNSPEC. */ | |
| struct sockaddr_in addr[1] = {{ | |
| .sin_family = AF_UNSPEC, | |
| .sin_addr = { .s_addr = 0U }, | |
| .sin_port = 0 | |
| }}; | |
| if( FD_UNLIKELY( connect( fdesc, addr, sizeof(addr) ) ) ) FD_LOG_ERR(( "connect(AF_UNSPEC) failed (%d-%s)", errno, fd_io_strerror( errno ) )); | |
| /* Properly abort any in-progress or established TCP connection by | |
| shutting down and closing the socket, then create a fresh | |
| nonblocking TCP socket to return to the idle pool. */ | |
| if( FD_UNLIKELY( shutdown( fdesc, SHUT_RDWR ) ) ) { | |
| /* ENOTCONN / EINVAL are benign (socket already closed or not | |
| connected). Other errors indicate an unexpected failure. */ | |
| if( FD_UNLIKELY( (errno!=ENOTCONN) & (errno!=EINVAL) ) ) | |
| FD_LOG_ERR(( "shutdown() failed (%d-%s)", errno, fd_io_strerror( errno ) )); | |
| } | |
| if( FD_UNLIKELY( close( fdesc ) ) ) | |
| FD_LOG_ERR(( "close() failed (%d-%s)", errno, fd_io_strerror( errno ) )); | |
| fdesc = socket( AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0 ); | |
| if( FD_UNLIKELY( fdesc==-1 ) ) | |
| FD_LOG_ERR(( "socket() failed (%d-%s)", errno, fd_io_strerror( errno ) )); |
| deadline_list_ele_remove( peer->state==PEER_STATE_PINGED ? ssping->pinged : ssping->refreshing, peer, ssping->pool ); | ||
| int is_err = ssping->used_fds[ i ].revents & (POLLRDHUP|POLLERR|POLLHUP); | ||
| if( FD_LIKELY( !is_err ) ) { | ||
| peer->latency_nanos = (ulong)fd_long_max( now - (peer->deadline_nanos - PEER_DEADLINE_NANOS_PING), 1L ); | ||
| peer->state = PEER_STATE_VALID; | ||
| peer->deadline_nanos = now + PEER_DEADLINE_NANOS_VALID; | ||
| deadline_list_ele_push_tail( ssping->valid, peer, ssping->pool ); | ||
|
|
||
| FD_LOG_INFO(( "pinged " FD_IP4_ADDR_FMT ":%hu in %lu nanos", | ||
| FD_IP4_ADDR_FMT_ARGS( peer->addr.addr ), fd_ushort_bswap( peer->addr.port ), peer->latency_nanos )); | ||
| ssping->on_ping_cb( ssping->cb_arg, peer->addr, peer->latency_nanos ); | ||
| } else { |
There was a problem hiding this comment.
recv_pings() treats a socket as successfully "pinged" when poll revents does not include POLLERR/POLLHUP/POLLRDHUP, but for nonblocking connect() the reliable way to determine success/failure is to query SO_ERROR via getsockopt(SOL_SOCKET, SO_ERROR). Some connect failures can present as POLLOUT without POLLERR, which would incorrectly mark an unreachable peer as VALID and skew peer selection. Suggest: on POLLOUT, call getsockopt(SO_ERROR) and treat any nonzero error as invalid (and only compute latency / call on_ping_cb when SO_ERROR==0).
| switch( peer->state ) { | ||
| case PEER_STATE_UNPINGED: | ||
| deadline_list_ele_remove( ssping->unpinged, peer, ssping->pool ); | ||
| break; | ||
| case PEER_STATE_PINGED: | ||
| deadline_list_ele_remove( ssping->pinged, peer, ssping->pool ); | ||
| break; | ||
| case PEER_STATE_VALID: | ||
| deadline_list_ele_remove( ssping->valid, peer, ssping->pool ); | ||
| break; | ||
| case PEER_STATE_REFRESHING: | ||
| deadline_list_ele_remove( ssping->refreshing, peer, ssping->pool ); | ||
| break; | ||
| case PEER_STATE_INVALID: | ||
| return; | ||
| } | ||
| peer->state = PEER_STATE_INVALID; | ||
| peer->deadline_nanos = now + PEER_DEADLINE_NANOS_INVALID; | ||
| deadline_list_ele_push_tail( ssping->invalid, peer, ssping->pool ); |
There was a problem hiding this comment.
fd_ssping_invalidate() can transition a peer from PINGED/REFRESHING to INVALID without releasing its associated ping fd / used_fd_idx. With the new invariant that PINGED/REFRESHING peers are exactly those owning a ping fd, this leaves a stale entry in used_fds/ping_to_pool pointing at a peer that is no longer in those lists (and may later be reused by the pool), which can lead to incorrect behavior or use-after-free. When invalidating a peer that currently has a ping fd, reclaim the fd (e.g., via remove_fdesc_idx(peer->used_fd_idx)) before moving it to INVALID.
| 1UL + /* logfile */ | ||
| 1UL; |
There was a problem hiding this comment.
rlimit_file_cnt() now adds an extra bare + 1UL with no comment or corresponding fd usage. This looks like an accidental overcount and makes it harder to audit the fd budget (especially now that FD_SSPING_FD_CNT is significant). Either remove this term or document exactly which file descriptor it is intended to account for.
| 1UL + /* logfile */ | |
| 1UL; | |
| 1UL; /* logfile */ |
| ulong out_fds_cnt, | ||
| int * out_fds ) { | ||
| if( FD_UNLIKELY( out_fds_cnt<6UL ) ) FD_LOG_ERR(( "out_fds_cnt %lu", out_fds_cnt )); | ||
| if( FD_UNLIKELY( out_fds_cnt<255UL ) ) FD_LOG_ERR(( "out_fds_cnt %lu", out_fds_cnt )); |
There was a problem hiding this comment.
populate_allowed_fds() hard-codes a minimum capacity check of 255. This is tightly coupled to the current FD_SSPING_FD_CNT (249) and the current list of always/optionally included fds, and it will silently become wrong if any of those change. Consider expressing this in terms of FD_SSPING_FD_CNT plus the maximum number of non-ssping fds that can be appended in this function (and accounting for the optional logfile fd) so the bound stays correct as the code evolves.
Closes #8255