net: prefbusy poll mode#8443
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request implements preferred busy polling mode for the network (XDP) stack in Firedancer. Preferred busy polling moves NAPI management from the kernel to userspace, potentially improving network performance by reducing context switches and giving more control over polling behavior.
Changes:
- Added preferred busy polling mode configuration and implementation for XDP sockets
- Introduced new configuration parameters for poll mode, busy poll timeouts, and GRO flush settings
- Created sysfs-poll configuration stage to set Linux kernel parameters for busy polling
- Modified XDP tile logic to support both "prefbusy" and "softirq" polling modes with automatic fallback
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| src/waltz/xdp/fd_xsk.h | Added poll configuration fields to fd_xsk_params and fd_xsk structs |
| src/waltz/xdp/fd_xsk.c | Implemented fd_xsk_setup_poll function with socket options and NAPI ID tracking |
| src/disco/topo/fd_topo.h | Added poll mode configuration fields to tile structure |
| src/disco/net/xdp/fd_xdp_tile.c | Implemented prefbusy and softirq polling modes in before_credit handlers |
| src/disco/net/fd_net_tile_topo.c | Propagated configuration values and converted timeout units |
| src/app/shared_dev/commands/udpecho/udpecho.c | Integrated sysfs-poll configuration stage |
| src/app/shared_dev/commands/pktgen/pktgen.c | Integrated sysfs-poll configuration stage |
| src/app/shared/fd_config_parse.c | Added parsing for new poll configuration parameters |
| src/app/shared/fd_config.h | Added poll mode configuration fields to config structure |
| src/app/shared/commands/configure/sysfs-poll.c | New file implementing sysfs configuration for busy polling |
| src/app/shared/commands/configure/configure.h | Declared sysfs-poll configuration stage |
| src/app/shared/Local.mk | Added sysfs-poll to build system |
| src/app/firedancer/main.c | Registered sysfs-poll configuration stage |
| src/app/firedancer/config/default.toml | Added default poll mode configuration |
| src/app/fddev/main.h | Registered sysfs-poll configuration stage |
| src/app/fdctl/main.c | Registered sysfs-poll configuration stage |
| src/app/fdctl/config/default.toml | Added default poll mode configuration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| static char const setting_gro_flush_timeout[] = "gro_flush_timeout"; | ||
|
|
||
| static int | ||
| enabled( config_t const * config ) { |
There was a problem hiding this comment.
Potential null pointer dereference: strcmp is called on config->net.xdp.poll_mode without checking if it's NULL first. If poll_mode is not set in the configuration, this could cause a crash. Consider adding a null check before the strcmp call.
| enabled( config_t const * config ) { | |
| enabled( config_t const * config ) { | |
| if( !config->net.xdp.poll_mode ) | |
| return 0; |
|
|
||
| # This is the minimum time between napi polls if in prefbusy | ||
| # mode. This is important for protecting against a livelock | ||
| # scenario inwhich Firedancer is not given enough time in |
There was a problem hiding this comment.
Spelling error: "inwhich" should be "in which" (two words).
| # scenario inwhich Firedancer is not given enough time in | |
| # scenario in which Firedancer is not given enough time in |
| # when to poll as well as the poll budget, into userspace | ||
| # if in "prefbusy" mode. The fallback is "softirq" mode, | ||
| # which relies much more on linux to manage napi, through | ||
| # wakeups, softirqs and under higher network load, a seperate |
There was a problem hiding this comment.
Spelling error: "seperate" should be "separate".
| # wakeups, softirqs and under higher network load, a seperate | |
| # wakeups, softirqs and under higher network load, a separate |
| struct { | ||
| char xdp_mode[ 8 ]; | ||
| int xdp_zero_copy; | ||
|
|
There was a problem hiding this comment.
Trailing whitespace: Line 204 has trailing whitespace at the end. This should be removed to maintain code cleanliness.
| static void | ||
| init( config_t const * config ) { | ||
| sysfs_net_set( config->net.interface, setting_napi_defer_hard_irqs, VERY_HIGH_VAL ); | ||
| sysfs_net_set( config->net.interface, setting_gro_flush_timeout, config->net.xdp.gro_flush_timeout_nanos ); |
There was a problem hiding this comment.
Type mismatch and potential data loss: config->net.xdp.gro_flush_timeout_nanos is of type ulong but is being cast to uint in fd_file_util_write_uint. If the value exceeds UINT_MAX, data will be lost. Consider using a function that writes ulong values, or add validation to ensure the value fits within uint range.
| if( FD_UNLIKELY( fd_tickcount() < ( flusher->last_prefbusy_poll_ticks + flusher->lwr_prefbusy_poll_ticks ) ) ) return 0; | ||
| if( FD_UNLIKELY( fd_tickcount() > ( flusher->last_prefbusy_poll_ticks + flusher->upr_prefbusy_poll_ticks ) ) ) return 1; |
There was a problem hiding this comment.
Performance issue: fd_tickcount() is called multiple times (lines 1156 and 1157) in net_prefbusy_poll_ready. Consider caching the result in a local variable to avoid redundant calls, especially since this function is called in a tight loop.
| if( FD_UNLIKELY( fd_tickcount() < ( flusher->last_prefbusy_poll_ticks + flusher->lwr_prefbusy_poll_ticks ) ) ) return 0; | |
| if( FD_UNLIKELY( fd_tickcount() > ( flusher->last_prefbusy_poll_ticks + flusher->upr_prefbusy_poll_ticks ) ) ) return 1; | |
| long now = fd_tickcount(); | |
| if( FD_UNLIKELY( now < ( flusher->last_prefbusy_poll_ticks + flusher->lwr_prefbusy_poll_ticks ) ) ) return 0; | |
| if( FD_UNLIKELY( now > ( flusher->last_prefbusy_poll_ticks + flusher->upr_prefbusy_poll_ticks ) ) ) return 1; |
| /* Check if the XSK is aware of the driver's NAPI ID for the | ||
| associated RX queue. Without it, preferred busy polling is not | ||
| going to work correctly. Note it's not always associated straight | ||
| away so xsk->napi_id can sometimes be set to 0 when it shouldn't be. |
There was a problem hiding this comment.
Trailing whitespace: Line 368 has trailing whitespace at the end. This should be removed to maintain code cleanliness.
| away so xsk->napi_id can sometimes be set to 0 when it shouldn't be. | |
| away so xsk->napi_id can sometimes be set to 0 when it shouldn't be. |
|
|
||
| # This is the minimum time between napi polls if in prefbusy | ||
| # mode. This is important for protecting against a livelock | ||
| # scenario inwhich Firedancer is not given enough time in |
There was a problem hiding this comment.
Spelling error: "inwhich" should be "in which" (two words).
| # scenario inwhich Firedancer is not given enough time in | |
| # scenario in which Firedancer is not given enough time in |
| @@ -217,6 +217,15 @@ struct fd_xsk_params { | |||
| /* sockaddr_xdp.sxdp_flags additional params, e.g. XDP_ZEROCOPY */ | |||
| uint bind_flags; | |||
|
|
|||
There was a problem hiding this comment.
Missing documentation: The poll_mode field lacks a comment explaining what it is and what values it can take (e.g., "prefbusy" or "softirq"). All other fields in the struct have explanatory comments, so this field should too for consistency.
| /* poll_mode: Selects the NAPI polling strategy. | |
| Typical values: | |
| - "prefbusy": prefer userspace busy-polling, honoring busy_poll_usecs | |
| - "softirq": rely on kernel softirq-based processing without busy polling */ |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| char path[ PATH_MAX ]; | ||
| fd_cstr_printf_check( path, PATH_MAX, NULL, "/sys/class/net/%s/%s", device, setting ); | ||
| FD_LOG_NOTICE(( "RUN: `echo \"%lu\" > %s`", value, path )); | ||
| fd_file_util_write_uint( path, (uint)value ); |
There was a problem hiding this comment.
sysfs_net_set truncates the ulong value to uint and ignores the return code from fd_file_util_write_uint. This can silently write the wrong value (or fail without reporting) if the configured value exceeds UINT_MAX or if the sysfs write fails. Use fd_file_util_write_ulong (and check for -1 to log/abort) to match the ulong API and fail loudly on errors.
| fd_file_util_write_uint( path, (uint)value ); | |
| if( fd_file_util_write_ulong( path, value )==-1L ) { | |
| FD_LOG_ERR(( "Failed to write \"%lu\" to %s", value, path )); | |
| } |
| uint value; | ||
| fd_cstr_printf_check( path, PATH_MAX, NULL, "/sys/class/net/%s/%s", config->net.interface, setting_napi_defer_hard_irqs ); | ||
| if( fd_file_util_read_uint( path, &value ) || value < VERY_HIGH_VAL ) { | ||
| NOT_CONFIGURED("Setting napi_defer_hard_irqs failed."); | ||
| } | ||
|
|
||
| fd_cstr_printf_check( path, PATH_MAX, NULL, "/sys/class/net/%s/%s", config->net.interface, setting_gro_flush_timeout ); | ||
| if( fd_file_util_read_uint( path, &value ) || value != config->net.xdp.gro_flush_timeout_nanos ) { |
There was a problem hiding this comment.
check() reads sysfs values into a uint but compares against config->net.xdp.gro_flush_timeout_nanos (ulong). This comparison will be wrong if the configured value doesn't fit in 32 bits, and it also doesn't distinguish read errors from mismatched values. Prefer reading into a ulong via fd_file_util_read_ulong and comparing like-for-like.
| uint value; | |
| fd_cstr_printf_check( path, PATH_MAX, NULL, "/sys/class/net/%s/%s", config->net.interface, setting_napi_defer_hard_irqs ); | |
| if( fd_file_util_read_uint( path, &value ) || value < VERY_HIGH_VAL ) { | |
| NOT_CONFIGURED("Setting napi_defer_hard_irqs failed."); | |
| } | |
| fd_cstr_printf_check( path, PATH_MAX, NULL, "/sys/class/net/%s/%s", config->net.interface, setting_gro_flush_timeout ); | |
| if( fd_file_util_read_uint( path, &value ) || value != config->net.xdp.gro_flush_timeout_nanos ) { | |
| ulong value; | |
| fd_cstr_printf_check( path, PATH_MAX, NULL, "/sys/class/net/%s/%s", config->net.interface, setting_napi_defer_hard_irqs ); | |
| if( fd_file_util_read_ulong( path, &value ) || value < (ulong)VERY_HIGH_VAL ) { | |
| NOT_CONFIGURED("Setting napi_defer_hard_irqs failed."); | |
| } | |
| fd_cstr_printf_check( path, PATH_MAX, NULL, "/sys/class/net/%s/%s", config->net.interface, setting_gro_flush_timeout ); | |
| if( fd_file_util_read_ulong( path, &value ) || value != config->net.xdp.gro_flush_timeout_nanos ) { |
| CFG_POP ( cstr, net.xdp.poll_mode ); | ||
| CFG_POP ( uint, net.xdp.busy_poll_usecs ); | ||
| CFG_POP ( ulong, net.xdp.gro_flush_timeout_nanos ); | ||
| CFG_POP ( uint, net.xdp.lwr_prefbusy_poll_timeout_micros ); | ||
| CFG_POP ( uint, net.xdp.upr_prefbusy_poll_timeout_micros ); |
There was a problem hiding this comment.
New XDP polling config fields are extracted here, but fd_config_validate() currently doesn't validate net.xdp.poll_mode (enumeration) or the new timeout ranges. This can lead to surprising runtime behavior (e.g. silent fallback) or invalid sysfs/socket settings. Consider adding explicit validation for allowed poll_mode values ("prefbusy"/"softirq") and sane bounds for the new numeric fields.
| /* sockaddr_xdp.sxdp_flags additional params, e.g. XDP_ZEROCOPY */ | ||
| uint bind_flags; | ||
|
|
||
| char * poll_mode; |
There was a problem hiding this comment.
poll_mode is treated as a read-only configuration string throughout the call chain; using char * here allows accidental modification and forces casts when callers have const strings. Prefer char const * poll_mode (or a fixed-size array like other config strings) to express intent and avoid mutation.
| char * poll_mode; | |
| char const * poll_mode; |
src/disco/net/xdp/fd_xdp_tile.c
Outdated
| if( FD_LIKELY( rr_xsk->prefbusy_poll_enabled ) ) { | ||
| before_credit_prefbusy( ctx, charge_busy, rr_idx, rr_xsk ); | ||
| } else { | ||
| net_rx_wakeup( ctx, rr_xsk, charge_busy ); | ||
| ctx->rr_idx++; | ||
| ctx->rr_idx = fd_uint_if( ctx->rr_idx>=ctx->xsk_cnt, 0, ctx->rr_idx ); | ||
| /* Fallback poll mode which relies on linux irqs and wakeups */ | ||
| before_credit_softirq( ctx, charge_busy, rr_idx, rr_xsk ); |
There was a problem hiding this comment.
This change introduces a new before_credit_prefbusy() execution path gated by rr_xsk->prefbusy_poll_enabled, but the existing XDP tile unit tests appear to exercise only the legacy (softirq) behavior. Adding at least one test case that sets prefbusy_poll_enabled=1 and asserts the expected wakeup / RX handling behavior would help prevent regressions in the new polling mode.
src/waltz/xdp/fd_xsk.c
Outdated
| errno, fd_io_strerror( errno ) )); | ||
| return; | ||
| } | ||
| if( FD_UNLIKELY( fcntl( xsk->xsk_fd, F_SETFL, sk_flags | O_NONBLOCK ) ) == -1 ) { |
There was a problem hiding this comment.
The error check for setting O_NONBLOCK is broken due to operator precedence: FD_UNLIKELY( fcntl(...) ) == -1 compares the boolean result of FD_UNLIKELY against -1, so the failure branch will never run. This can leave the socket in blocking mode while still treating prefbusy polling as enabled. Wrap the comparison inside FD_UNLIKELY (or compare the fcntl result to -1 before passing it to FD_UNLIKELY).
| if( FD_UNLIKELY( fcntl( xsk->xsk_fd, F_SETFL, sk_flags | O_NONBLOCK ) ) == -1 ) { | |
| if( FD_UNLIKELY( fcntl( xsk->xsk_fd, F_SETFL, sk_flags | O_NONBLOCK ) == -1 ) ) { |
src/waltz/xdp/fd_xsk.c
Outdated
| fd_xsk_setup_poll( fd_xsk_t * xsk, | ||
| fd_xsk_params_t const * params ) { | ||
| xsk->prefbusy_poll_enabled = 0; | ||
| if( 0!=strcmp( params->poll_mode, "prefbusy" ) ) return; |
There was a problem hiding this comment.
params->poll_mode is dereferenced via strcmp without a NULL check. Since poll_mode is a new field in fd_xsk_params_t, any caller that doesn't explicitly initialize it will segfault. Consider treating NULL/empty as the default (e.g. "softirq") and only enabling prefbusy when params->poll_mode is non-NULL and matches.
| if( 0!=strcmp( params->poll_mode, "prefbusy" ) ) return; | |
| if( !params->poll_mode || !params->poll_mode[0] || | |
| 0!=strcmp( params->poll_mode, "prefbusy" ) ) return; |
src/waltz/xdp/fd_xsk.c
Outdated
| if( FD_UNLIKELY( 0!=setsockopt( xsk->xsk_fd, SOL_SOCKET, SO_BUSY_POLL, ¶ms->busy_poll_usecs, sizeof(uint) ) ) ) { | ||
| FD_LOG_WARNING(( "setsockopt(xsk_fd,SOL_SOCKET,SO_BUSY_POLL,%u) failed (%i-%s)", | ||
| params->busy_poll_usecs, errno, fd_io_strerror( errno ) )); | ||
| return; | ||
| } | ||
|
|
||
| /* The greater busy_poll_budget is, the greater the bias towards max RX pps | ||
| over max TX pps in a max network load scenario. */ | ||
| uint busy_poll_budget = 64U; | ||
| if( FD_UNLIKELY( 0!=setsockopt( xsk->xsk_fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &busy_poll_budget, sizeof(uint) ) ) ) { | ||
| FD_LOG_WARNING(( "setsockopt(xsk_fd,SOL_SOCKET,SO_BUSY_POLL_BUDGET,%u) failed (%i-%s)", | ||
| busy_poll_budget, errno, fd_io_strerror( errno ) )); |
There was a problem hiding this comment.
For SO_BUSY_POLL and SO_BUSY_POLL_BUDGET, the kernel expects an int option value/len. Passing a uint (and sizeof(uint)) is non-portable and can cause EINVAL on platforms where the sizes differ. Prefer using an int local for these options and passing sizeof(int) to setsockopt.
…n-carter/firedancer into tristan-carter/prefbusy-poll-mode
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # when to poll as well as the poll budget, into userspace | ||
| # if in "prefbusy" mode. The fallback is "softirq" mode, | ||
| # which relies much more on linux to manage napi, through | ||
| # wakeups, softirqs and under higher network load, a seperate |
There was a problem hiding this comment.
Corrected spelling of 'seperate' to 'separate'.
| # wakeups, softirqs and under higher network load, a seperate | |
| # wakeups, softirqs and under higher network load, a separate |
|
|
||
| # This is the minimum time between napi polls if in prefbusy | ||
| # mode. This is important for protecting against a livelock | ||
| # scenario inwhich Firedancer is not given enough time in |
There was a problem hiding this comment.
Corrected spacing: 'inwhich' should be 'in which'.
| # scenario inwhich Firedancer is not given enough time in | |
| # scenario in which Firedancer is not given enough time in |
Preferred busy polling implementation. Still undergoing testing.