
feat: Add WiFi resilience settings#188

Open
craigmillard86 wants to merge 2 commits into CarlosDerSeher:develop from anabolyc:feature/wifi-resilience

Conversation

@craigmillard86

@craigmillard86 craigmillard86 commented Jan 17, 2026

Adds compile-time configuration options for WiFi stability via idf.py menuconfig.

Lightsnapcast Player Settings

  • PLAYER_QUEUE_EMPTY_THRESHOLD (default 3) - Number of consecutive empty queue reads before triggering a hard resync. Provides tolerance for brief WiFi dropouts (~78ms at default). During empty reads, audio continues playing from DMA buffers while the sync algorithm maintains timing. If the threshold is exceeded, a clean hard resync is triggered to prevent audio glitches.
  • PLAYER_QUEUE_INSERT_TIMEOUT_MS (default 50) - Maximum time to wait when inserting audio chunks into the playback queue. Allows the queue to drain during burst packet arrivals rather than immediately dropping chunks. Helps handle WiFi jitter where packets arrive in bursts rather than evenly spaced.
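
Under ESP-IDF, options like these are typically declared in a component `Kconfig` file so they appear in `idf.py menuconfig`. A sketch of how the player options might be declared (option names, ranges, and defaults are taken from this PR's description; the prompts and help text are assumptions):

```kconfig
menu "Lightsnapcast Player Settings"

config PLAYER_QUEUE_EMPTY_THRESHOLD
    int "Consecutive empty queue reads before hard resync"
    range 1 10
    default 3
    help
      Number of consecutive empty queue reads tolerated before a hard
      resync is triggered (~78 ms of tolerance at the default). Audio
      keeps playing from DMA buffers during the tolerated gap.

config PLAYER_QUEUE_INSERT_TIMEOUT_MS
    int "Queue insert timeout (ms)"
    default 50
    help
      Maximum time to wait for queue space when inserting audio chunks,
      allowing the queue to drain during burst packet arrivals instead
      of dropping chunks immediately.

endmenu
```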

WiFi Resilience Settings

  • WIFI_TCP_NODELAY (default y) - Disables Nagle's algorithm on the snapserver TCP connection. Sends packets immediately rather than buffering small writes, reducing latency for time-sensitive audio data.
  • WIFI_RECONNECT_MIN_DELAY_MS (default 1000) - Initial delay before attempting to reconnect after connection loss. Starting point for exponential backoff.
  • WIFI_RECONNECT_MAX_DELAY_MS (default 30000) - Maximum reconnection delay with exponential backoff. Prevents hammering the server during extended outages. Includes random jitter to avoid multiple clients reconnecting simultaneously.
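
The WIFI_TCP_NODELAY option maps onto a standard BSD socket option. A minimal sketch of how it might be applied to the snapserver connection (the wrapper function name is hypothetical; on ESP-IDF the same call goes through lwIP's BSD socket layer via `lwip/sockets.h`):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm so small, time-sensitive audio writes are
 * sent immediately instead of being coalesced into larger segments.
 * Returns 0 on success, -1 on error (as setsockopt does). */
int apply_tcp_nodelay(int sock)
{
    int flag = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}
```

In the firmware this would presumably be guarded by `#if CONFIG_WIFI_TCP_NODELAY` and called right after the server socket is connected.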

@CarlosDerSeher
Owner

Why do you think we need those things? Could you elaborate a bit? Did you encounter issues related with those settings?

@craigmillard86
Author

> Why do you think we need those things? Could you elaborate a bit? Did you encounter issues related with those settings?

I've been experiencing regular audio dropouts on a congested home network. My ping times typically hover around 40ms but spike up to 1400ms during congestion peaks. When these spikes occur, the player triggers a hard resync and takes several seconds to stabilize, causing noticeable audio interruption.

The Problem

The current hardcoded values assume a stable, low-latency network:

  • A single empty queue read triggers immediate hard resync
  • Fixed sync tolerance doesn't account for network jitter
  • No configurable buffer headroom for absorbing latency spikes

On congested WiFi, a 1400ms spike would cause the queue to empty, triggering an aggressive resync that often overshoots, leading to repeated corrections before stabilizing.

How These Settings Help

  • Queue Empty Threshold - Requires 3+ consecutive empty reads before resync, filtering out brief spikes
  • Fast Sync Tolerance - Larger latency buffer (50ms default, adjustable to 100ms) absorbs jitter without resyncing
  • Buffer Headroom - Extra queue capacity provides cushion for burst delays using spare PSRAM
  • Queue Insert Timeout - Prevents premature packet drops during brief congestion
  • Reconnect Delays - Exponential backoff prevents hammering the server during network issues
  • TCP No Delay - Optional tuning for latency vs throughput tradeoff
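
The reconnect backoff described above can be sketched as follows (the function and macro names are hypothetical; the defaults match the WIFI_RECONNECT_MIN/MAX_DELAY_MS values from the PR description):

```c
#include <stdint.h>
#include <stdlib.h>

#define RECONNECT_MIN_DELAY_MS 1000   /* cf. CONFIG_WIFI_RECONNECT_MIN_DELAY_MS */
#define RECONNECT_MAX_DELAY_MS 30000  /* cf. CONFIG_WIFI_RECONNECT_MAX_DELAY_MS */

/* Exponential backoff with jitter: the delay doubles per failed
 * attempt, is capped at the maximum, and gets up to 10% random jitter
 * added so several clients don't all reconnect in lockstep. */
uint32_t reconnect_delay_ms(unsigned failed_attempts)
{
    uint32_t delay = RECONNECT_MIN_DELAY_MS;

    while (failed_attempts-- > 0 && delay < RECONNECT_MAX_DELAY_MS)
        delay *= 2;
    if (delay > RECONNECT_MAX_DELAY_MS)
        delay = RECONNECT_MAX_DELAY_MS;

    /* up to 10% jitter on top of the base delay */
    return delay + (uint32_t)(rand() % (delay / 10 + 1));
}
```

At the defaults this yields roughly 1s, 2s, 4s, ... up to 30s between attempts, plus jitter.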

Results

With these settings tuned for my network (threshold=5, fast_sync=75000us, headroom=100%), the player now rides through the 1400ms spikes without triggering hard resync, maintaining stable playback where it previously would stutter and resync repeatedly.

The defaults remain conservative for good networks, but users with challenging WiFi conditions can now tune for their environment via the Advanced Settings UI.

Any thoughts are much appreciated, as I want to find a solution that works for everyone.

@CarlosDerSeher
Owner

CarlosDerSeher commented Jan 18, 2026

> Queue Empty Threshold - Requires 3+ consecutive empty reads before resync, filtering out brief spikes
>
> Queue Insert Timeout - Prevents premature packet drops during brief congestion
>
> Reconnect Delays - Exponential backoff prevents hammering the server during network issues
>
> TCP No Delay - Optional tuning for latency vs throughput tradeoff

I can see how these could make sense

> Fast Sync Tolerance - Larger latency buffer (50ms default, adjustable to 100ms) absorbs jitter without resyncing

What's this exactly?

> Buffer Headroom - Extra queue capacity provides cushion for burst delays using spare psram

If you have psram shouldn't you just set a higher buffer on the server if you have a bad network?

@craigmillard86
Author

> Fast Sync Tolerance - Larger latency buffer (50ms default, adjustable to 100ms) absorbs jitter without resyncing
>
> What's this exactly?
>
> Buffer Headroom - Extra queue capacity provides cushion for burst delays using spare psram
>
> If you have psram shouldn't you just set a higher buffer on the server if you have a bad network?

Fast Sync Tolerance

I believe this controls how much timing drift the player tolerates before triggering a resync. When a chunk arrives, the player compares actual vs expected playback time. If the difference exceeds this threshold, it triggers corrective action.

With the default 50ms tolerance, a network spike that delays packets by 60ms would trigger a resync. Increasing to 100ms lets the player absorb that spike and naturally catch up as the network recovers, rather than forcing an abrupt correction.

It's essentially "how late can a packet be before we panic" - on a jittery network, being more forgiving prevents constant resync thrashing.
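
A sketch of that check (the function name and the tolerance constant are hypothetical; timing is in microseconds to match the `fast_sync=75000us` figure used earlier in the thread):

```c
#include <stdbool.h>
#include <stdint.h>

#define FAST_SYNC_TOLERANCE_US 50000  /* 50ms default, adjustable */

/* Compare a chunk's expected playback time with its actual output
 * time; only report that corrective action is needed when the drift
 * (in either direction) exceeds the tolerance. */
bool needs_sync_correction(int64_t expected_us, int64_t actual_us)
{
    int64_t drift = actual_us - expected_us;

    if (drift < 0)
        drift = -drift;
    return drift > FAST_SYNC_TOLERANCE_US;
}
```

With the tolerance raised to 100000us, a 60ms spike would pass through without triggering a correction.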

Buffer Headroom vs Server Buffer

They serve different purposes:

  • Server Buffer (latency) - Sets the baseline delay before playback starts. Affects ALL audio - adds fixed latency to every packet regardless of network conditions.
  • Buffer Headroom (client) - Extra queue capacity to absorb temporary bursts without adding baseline latency. Only used when network spikes occur; otherwise it sits empty.

Increasing the server buffer from 1000ms to 2000ms adds 1 second of latency to all devices regardless of connection type (5 GHz, Ethernet, 2.4 GHz).

Buffer headroom lets the client queue hold extra packets during a burst catchup, then drain back to normal as the network recovers. The baseline latency stays the same, but you have capacity to absorb spikes.
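
The distinction comes down to simple sizing arithmetic; a sketch (all names and the base length are hypothetical, the 100% headroom figure is the one quoted earlier in the thread):

```c
/* Queue length with headroom: the extra slots sit empty in steady
 * state and only fill while a post-spike burst is drained, so the
 * baseline latency (set by the server buffer) is unchanged. */
unsigned queue_len_with_headroom(unsigned base_len, unsigned headroom_pct)
{
    return base_len + (base_len * headroom_pct) / 100;
}
```

For example, a base queue of 16 chunks with 100% headroom allocates 32 slots, but still normally holds only ~16.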

@CarlosDerSeher
Owner

Where do you use this Fast Sync Tolerance exactly?

Essentially you assume the router/switch accumulates packets because of congestion, and those are measures to address this without increasing latency. How about raspberry snapclient? I guess those devices won't have a problem because of much more RAM, and the network stack is buffering packets anyway. But couldn't we just increase the max allowed dynamic RX buffer through menuconfig too to tackle this?

One more thing: there is another update in the pipe, so have a look at the sync rework branch. You should base those changes on that branch. I just didn't find the time to merge it with dev yet.

@luar123
Contributor

luar123 commented Jan 18, 2026

Did not look into the details, but if I understand correctly, you want to keep the player playing even if it is out of sync? I think so far the approach was rather to hard resync when the sync is off by more than a few ms. I guess it depends on the use case: if you have one player per room you could tolerate a higher difference, but if you have a stereo pair you need to increase the server buffer.

Regarding the implementation: I would suggest keeping the lightsnapcast component more as a library with a defined interface, and not adding dependencies and calls to the settings_manager. And it seems you added blocking calls to the player task; that should be avoided. It is not really needed to change these settings on the fly, or is it? I would just apply them in init_player or start_player.

@CarlosDerSeher sync rework is merged into develop already. #180

@CarlosDerSeher
Owner

CarlosDerSeher commented Jan 18, 2026

sync rework is merged into develop already. #180

Thanks for the reminder. Seems I lost track since there is going on a lot currently :)

@craigmillard86
Author

> Where do you use this Fast Sync Tolerance exactly?
>
> Essentially you assume the router/switch accumulates packets because of congestion, and those are measures to address this without increasing latency. How about raspberry snapclient? I guess those devices won't have a problem because of much more RAM, and the network stack is buffering packets anyway. But couldn't we just increase the max allowed dynamic RX buffer through menuconfig too to tackle this?
>
> One more thing: there is another update in the pipe, so have a look at the sync rework branch. You should base those changes on that branch. I just didn't find the time to merge it with dev yet.

Ah, I hadn't considered the RX buffers; I will have a play with them instead. Looks like I was also on an older develop without the rework. I am reworking this now based on @luar123's comments and only implementing:

  • Queue Empty Threshold - Requires 3+ consecutive empty reads before resync, filtering out brief spikes
  • Queue Insert Timeout - Prevents premature packet drops during brief congestion
  • Reconnect Delays - Exponential backoff prevents hammering the server during network issues
  • TCP No Delay - Optional tuning for latency vs throughput tradeoff

Add configurable parameters to improve audio streaming stability on
congested WiFi networks:

- TCP_NODELAY: Disable Nagle's algorithm for lower latency
- Queue empty hysteresis: Require consecutive empty reads before hard resync
- Queue insert timeout: Configurable wait time for queue space
- Fast sync tolerance: Latency buffer for sync operations
- Reconnect delays: Exponential backoff with jitter for reconnection
- Buffer headroom: Extra capacity for WiFi jitter tolerance

All parameters are configurable via:
- menuconfig (compile-time defaults)
- New "Advanced Settings" Web UI tab (runtime with NVS persistence)

Remove runtime UI and NVS settings for WiFi resilience in favor of
compile-time Kconfig options. This keeps lightsnapcast as a clean
library without settings_manager dependency.

Changes:
- Remove advanced-settings.html and UI handlers
- Remove WiFi resilience functions from settings_manager
- Use CONFIG_PLAYER_QUEUE_* directly in player.c
- Use CONFIG_WIFI_* directly in main.c
- Remove unused fast_sync_latency option

Remaining Kconfig options:
- WIFI_TCP_NODELAY, WIFI_RECONNECT_MIN/MAX_DELAY_MS (main)
- PLAYER_QUEUE_EMPTY_THRESHOLD, PLAYER_QUEUE_INSERT_TIMEOUT_MS (lightsnapcast)
@craigmillard86 craigmillard86 force-pushed the feature/wifi-resilience branch from 068c75a to 03d967b Compare January 18, 2026 18:18
@craigmillard86 craigmillard86 changed the title feat: Add WiFi resilience settings with Advanced Settings UI feat: Add WiFi resilience settings Jan 18, 2026
@craigmillard86
Author

So I have reworked the code to minimise what is going on here.

Although while doing this I have discovered and fixed (I believe) the root of my WiFi issues here: #191. I am now unsure if these are worth adding, but they do provide some configuration options to improve network resilience.

@Hoerli1337

Hey!
Can the fix also solve this problem?
Is the problem identical?
sonocotta/esp32-audio-dock#78 (comment)

I bought two Hifi-ESP32-S3 and equipped them with the Snapclient firmware.
One plays the music after about 10-20 attempts and the latency normalizes to 1-5ms during operation - when muted, it is 50-500ms.
The second one doesn't work at all and keeps restarting due to corrupt values.
The latency is always between 60-5000 ms.
Distance to the AP ~4 meters.
RSSI -71

I'm also experiencing crashes with the second one using the ESP Audio Dock firmware.
Is there an error in the code, or have I unfortunately received a partially defective device?

@luar123
Contributor

luar123 commented Jan 24, 2026

Not sure if ping is a good measure. Snapclient uses a low-level implementation that blocks for up to 1s if no packets are arriving.

craigmillard86 added a commit to anabolyc/esp32-snapclient that referenced this pull request Jan 25, 2026
@Hoerli1337

> Not sure if ping is a good measure. Snapclient uses a low-level implementation that blocks for up to 1s if no packets are arriving.

At least it's an indication that something isn't working properly here.
If the latency rises above 20-30ms for 2-3 pings, there are dropouts in the sound.

I also noticed that, depending on its mood, one ESP32 takes between 30 seconds and 2 minutes after being switched on before the sound is reproduced cleanly.

@CarlosDerSeher
Owner

#191 suggests disabling power saving

```c
int msgWaiting = uxQueueMessagesWaiting(pcmChkQHdl);

// Track consecutive empty queue reads for hysteresis
static int consecutive_empty_count = 0;
```
Owner

Is this really necessary? I have a feeling that if you run out of samples there will be an audible offset between clients if you tolerate this.

Author

You raise a fair point, but this was to help on bad networks.

How it works: When the queue empties, the I2S DMA clock keeps running independently (looping its buffer). With threshold=3, that's ~78ms where no fresh samples are written. When new data arrives, the player is behind by the duration of the gap, and soft-sync (APLL/sample insertion) gradually corrects it, but during that window, clients could be slightly out of sync.

Why it exists: On congested WiFi (my network regularly sees ping spikes to 1400ms), a single empty queue read with threshold=1 triggers a full hard resync: mute, stop I2S, reset initialSync, re-establish sync from scratch. It's a multi-second audible disruption for what might be a 26ms network hiccup that resolves on its own. This hysteresis avoids that.

Threshold=1 gives tightest multi-client sync but is fragile on imperfect networks. Threshold=3 tolerates brief WiFi hiccups but risks ~78ms transient offset that soft-sync corrects over a few seconds. Both have audible impact, it's a question of which is less disruptive for the user's environment.

That's why it's a Kconfig setting (PLAYER_QUEUE_EMPTY_THRESHOLD, range 1-10, default 3) rather than a hardcoded change: users on clean networks can set it to 1 for tight sync, while those on congested WiFi can increase it to avoid constant resyncs.
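
A sketch of that hysteresis, building on the snippet under review (the function wrapper is hypothetical; in the player it would operate on the result of `uxQueueMessagesWaiting`):

```c
#include <stdbool.h>

#define QUEUE_EMPTY_THRESHOLD 3  /* cf. CONFIG_PLAYER_QUEUE_EMPTY_THRESHOLD */

/* Hysteresis on empty queue reads: only report that a hard resync is
 * needed once the threshold of consecutive empty reads is reached;
 * any successful read resets the counter. During the tolerated gap
 * the I2S DMA keeps the output clock running. */
bool queue_empty_needs_resync(int msg_waiting)
{
    static int consecutive_empty = 0;

    if (msg_waiting > 0) {
        consecutive_empty = 0;
        return false;
    }
    return ++consecutive_empty >= QUEUE_EMPTY_THRESHOLD;
}
```

With the default threshold of 3, two isolated empty reads pass silently; only the third in a row triggers the resync path.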

Happy to adjust the default if you think 1 is more appropriate for the typical use case, what do you think?

Owner

> multi-second audible disruption

I am wondering, the hard resync never takes that long, normally it's just a short "click" in the speakers and if you don't listen carefully and this just happens once it's almost not noticeable.

Is this a board/DAC specific problem maybe?
