feat: Add WiFi resilience settings #188
craigmillard86 wants to merge 2 commits into CarlosDerSeher:develop
Why do you think we need those things? Could you elaborate a bit? Did you encounter issues related to those settings?
I've been experiencing regular audio dropouts on a congested home network. My ping times typically hover around 40ms but spike up to 1400ms during congestion peaks. When these spikes occur, the player triggers a hard resync and takes several seconds to stabilize, causing a noticeable audio interruption.

The Problem

The current hardcoded values assume a stable, low-latency network:

On congested WiFi, a 1400ms spike would cause the queue to empty, triggering an aggressive resync that often overshoots, leading to repeated corrections before stabilizing.

How These Settings Help

Results

With these settings tuned for my network (threshold=5, fast_sync=75000us, headroom=100%), the player now rides through the 1400ms spikes without triggering a hard resync, maintaining stable playback where it previously would stutter and resync repeatedly. The defaults remain conservative for good networks, but users with challenging WiFi conditions can now tune for their environment via the Advanced Settings UI. Any thoughts on this are much appreciated, as I want to find a solution that works for everyone.
I can see how these could make sense
What's this exactly?
If you have PSRAM, shouldn't you just set a higher buffer on the server if you have a bad network?
Fast Sync Tolerance

I believe this controls how much timing drift the player tolerates before triggering a resync. When a chunk arrives, the player compares actual vs expected playback time. If the difference exceeds this threshold, it triggers corrective action. With the default 50ms tolerance, a network spike that delays packets by 60ms would trigger a resync. Increasing to 100ms lets the player absorb that spike and naturally catch up as the network recovers, rather than forcing an abrupt correction. It's essentially "how late can a packet be before we panic": on a jittery network, being more forgiving prevents constant resync thrashing.

Buffer Headroom vs Server Buffer

They serve different purposes:

Server Buffer (latency): Sets the baseline delay before playback starts. It affects ALL audio by adding fixed latency to every packet regardless of network conditions. Increasing the server buffer from 1000ms to 2000ms adds 1 second of latency to all devices no matter the connection (5 GHz, Ethernet, 2.4 GHz).

Buffer Headroom: Lets the client queue hold extra packets during a burst catch-up, then drain back to normal as the network recovers. The baseline latency stays the same, but you have capacity to absorb spikes.
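To make the two ideas above concrete, here is a minimal sketch in C. It is not the player's actual code: the helper names `drift_exceeds_tolerance` and `capacity_with_headroom` are hypothetical, and the assumption is simply that the tolerance is compared against the absolute difference between expected and actual playback time, and that headroom is a percentage added on top of a base queue size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the PR): returns true when the absolute
 * difference between expected and actual playback time exceeds the
 * configured tolerance, i.e. when a corrective resync would be triggered. */
static bool drift_exceeds_tolerance(int64_t expected_us, int64_t actual_us,
                                    int64_t tolerance_us)
{
    int64_t drift_us = actual_us - expected_us;
    if (drift_us < 0) {
        drift_us = -drift_us;
    }
    return drift_us > tolerance_us;
}

/* Hypothetical helper (not from the PR): queue capacity in chunks after
 * adding a headroom percentage. The baseline latency is unaffected; only
 * the room for burst arrivals grows. */
static uint32_t capacity_with_headroom(uint32_t base_chunks,
                                       uint32_t headroom_percent)
{
    return base_chunks + (base_chunks * headroom_percent) / 100u;
}
```

With these assumptions, a 60ms delay trips a 50ms tolerance but not a 100ms one, and headroom=100% doubles the number of chunks the client queue can hold.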
Where do you use this? Essentially, you assume the router/switch accumulates packets because of congestion, and those are measures to address this without increasing latency. How about the Raspberry Pi snapclient? I guess those devices won't have a problem because they have much more RAM and the network stack is buffering packets anyway. But couldn't we just increase the max allowed dynamic RX buffer through menuconfig to tackle this too? One more thing: there is another update in the pipe, so have a look at the sync rework branch. You should base those changes on that branch. I just didn't find the time to merge it with dev yet.
I did not look into the details, but if I understand correctly, you want to keep the player playing even if it is out of sync? I think the approach so far was to prefer a hard sync when the sync is off by more than a few ms. I guess it depends on the use case: if you have one player per room you could tolerate a higher difference, but if you have a stereo pair you need to increase the server buffer. Regarding the implementation: I would suggest keeping the lightsnapcast component more as a library with a defined interface, without adding dependencies on and calls to the settingsmanager. It also seems you added blocking calls to the player task, which should be avoided. It is not really necessary to change these settings on the fly, or is it? I would just apply them in init_player or start_player. @CarlosDerSeher sync rework is merged into develop already. #180
Thanks for the reminder. Seems I lost track since there is a lot going on currently :)
Ah, I hadn't considered the RX buffers; I will have a play with them instead. It looks like I was also on an older develop without the rework. I am reworking this now based on @luar123's comments and only implementing:

Queue Empty Threshold: Requires 3+ consecutive empty reads before resync, filtering out brief spikes.
Add configurable parameters to improve audio streaming stability on congested WiFi networks:
- TCP_NODELAY: Disable Nagle's algorithm for lower latency
- Queue empty hysteresis: Require consecutive empty reads before hard resync
- Queue insert timeout: Configurable wait time for queue space
- Fast sync tolerance: Latency buffer for sync operations
- Reconnect delays: Exponential backoff with jitter for reconnection
- Buffer headroom: Extra capacity for WiFi jitter tolerance

All parameters are configurable via:
- menuconfig (compile-time defaults)
- New "Advanced Settings" Web UI tab (runtime with NVS persistence)
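The "exponential backoff with jitter" item in the commit message above could look roughly like the following sketch. The function name `reconnect_delay_ms` and its parameters are hypothetical and not taken from the PR; the min/max arguments stand in for whatever the reconnect delay settings provide, and the random value is passed in by the caller (on ESP-IDF a source such as `esp_random()` could supply it).

```c
#include <stdint.h>

/* Hypothetical helper (not from the PR): delay before reconnect attempt
 * number `attempt` (0-based). The base delay doubles per attempt, is capped
 * at max_ms, and the result is jittered into the range [delay/2, delay]
 * so that several clients do not retry in lockstep. */
static uint32_t reconnect_delay_ms(uint32_t attempt, uint32_t min_ms,
                                   uint32_t max_ms, uint32_t rnd)
{
    uint32_t delay = min_ms;

    /* Double the delay per attempt without overflowing past max_ms. */
    for (uint32_t i = 0; i < attempt && delay < max_ms; i++) {
        delay = (delay > max_ms / 2u) ? max_ms : delay * 2u;
    }
    if (delay > max_ms) {
        delay = max_ms;
    }

    /* Jitter: pick a value between half the delay and the full delay. */
    uint32_t half = delay / 2u;
    return half + (rnd % (delay - half + 1u));
}
```

Jittering only downwards keeps the configured maximum as a hard upper bound on the wait, which is one possible design choice among several.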
Remove runtime UI and NVS settings for WiFi resilience in favor of compile-time Kconfig options. This keeps lightsnapcast as a clean library without settings_manager dependency.

Changes:
- Remove advanced-settings.html and UI handlers
- Remove WiFi resilience functions from settings_manager
- Use CONFIG_PLAYER_QUEUE_* directly in player.c
- Use CONFIG_WIFI_* directly in main.c
- Remove unused fast_sync_latency option

Remaining Kconfig options:
- WIFI_TCP_NODELAY, WIFI_RECONNECT_MIN/MAX_DELAY_MS (main)
- PLAYER_QUEUE_EMPTY_THRESHOLD, PLAYER_QUEUE_INSERT_TIMEOUT_MS (lightsnapcast)
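For readers unfamiliar with the TCP_NODELAY option listed above, this is a minimal sketch of how Nagle's algorithm is typically disabled on a BSD-style socket. The helper name `set_tcp_nodelay` is hypothetical. It assumes a plain socket file descriptor; the thread notes that Snapclient uses a low-level network implementation, so the actual firmware may set the option through a different API. In the firmware, the call would presumably be guarded by the macro generated from the WIFI_TCP_NODELAY Kconfig option.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the PR): enable or disable TCP_NODELAY on a
 * connected or unconnected TCP socket. Returns 0 on success and -1 on
 * failure, mirroring setsockopt(). */
static int set_tcp_nodelay(int sock_fd, int enable)
{
    int flag = enable ? 1 : 0;
    return setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}
```

With Nagle's algorithm disabled, small writes are sent immediately instead of being coalesced, trading a little bandwidth efficiency for lower latency.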
068c75a to 03d967b
So I have reworked the code to minimise what is going on here. While doing this, I discovered and (I believe) fixed the root of my WiFi issues here: #191. I am now unsure whether these are worth adding, but they do provide some configuration options to improve network resilience.
Hey! I bought two Hifi-ESP32-S3 and equipped them with the Snapclient firmware. I'm also experiencing crashes with the second one using the ESP Audio Dock firmware. |
Not sure if ping is a good measure. Snapclient uses a low-level implementation that blocks for up to 1s if no packets are arriving.
At least it's an indication that something isn't working properly here. I also noticed that, depending on its mood, one ESP32 takes between 30 seconds and 2 minutes after being switched on before the sound is reproduced cleanly. |
#191 suggests disabling power saving.
int msgWaiting = uxQueueMessagesWaiting(pcmChkQHdl);

// Track consecutive empty queue reads for hysteresis
static int consecutive_empty_count = 0;
Is this really necessary? I have a feeling that if you run out of samples, there will be an audible offset between clients if you tolerate this.
You raise a fair point, but this was to help on bad networks.

How it works: When the queue empties, the I2S DMA clock keeps running independently (looping its buffer). With threshold=3, that's ~78ms where no fresh samples are written. When new data arrives, the player is behind by the duration of the gap, and soft-sync (APLL/sample insertion) gradually corrects it, but during that window, clients could be slightly out of sync.

Why it exists: On congested WiFi (my network regularly sees ping spikes to 1400ms), a single empty queue read with threshold=1 triggers a full hard resync: mute, stop I2S, reset initialSync, re-establish sync from scratch. It's a multi-second audible disruption for what might be a 26ms network hiccup that resolves on its own. This hysteresis avoids that.

Threshold=1 gives the tightest multi-client sync but is fragile on imperfect networks. Threshold=3 tolerates brief WiFi hiccups but risks a ~78ms transient offset that soft-sync corrects over a few seconds. Both have an audible impact; it's a question of which is less disruptive for the user's environment.

That's why it's a Kconfig setting (PLAYER_QUEUE_EMPTY_THRESHOLD, range 1-10, default 3) rather than a hardcoded change: users on clean networks can set it to 1 for tight sync, while those on congested WiFi can increase it to avoid constant resyncs.

Happy to adjust the default if you think 1 is more appropriate for the typical use case. What do you think?
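The hysteresis described in this reply can be sketched as follows. This is an illustration, not the PR's actual player.c logic: `should_hard_resync` and `worst_case_gap_ms` are hypothetical names, the fallback define only exists so the snippet builds outside the firmware, and the 26ms per empty read is an assumption inferred from the figures quoted above (threshold 3 giving roughly 78ms).

```c
#include <stdbool.h>
#include <stdint.h>

/* Fallback for building outside the firmware; the Kconfig default is 3. */
#ifndef CONFIG_PLAYER_QUEUE_EMPTY_THRESHOLD
#define CONFIG_PLAYER_QUEUE_EMPTY_THRESHOLD 3
#endif

/* Assumption inferred from the thread: one empty read is ~26 ms of audio. */
#define ASSUMED_CHUNK_MS 26u

/* Hypothetical helper: decide whether a hard resync is due. The counter is
 * reset as soon as the queue holds data again, so only an uninterrupted run
 * of empty reads reaches the threshold. */
static bool should_hard_resync(int messages_waiting, int *consecutive_empty_count)
{
    if (messages_waiting > 0) {
        *consecutive_empty_count = 0;
        return false;
    }
    (*consecutive_empty_count)++;
    if (*consecutive_empty_count >= CONFIG_PLAYER_QUEUE_EMPTY_THRESHOLD) {
        *consecutive_empty_count = 0; /* start fresh after the resync */
        return true;
    }
    return false;
}

/* Hypothetical helper: longest gap tolerated before a hard resync. */
static uint32_t worst_case_gap_ms(uint32_t threshold)
{
    return threshold * ASSUMED_CHUNK_MS;
}
```

Under these assumptions, threshold=1 reproduces the original behaviour (resync on the first empty read), while threshold=3 tolerates up to 3 x 26ms = 78ms without fresh samples.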
"multi-second audible disruption"
I am wondering about this: the hard resync never takes that long. Normally it's just a short "click" in the speakers, and if you don't listen carefully and it only happens once, it's almost unnoticeable.
Is this a board/DAC specific problem maybe?
Adds compile-time configuration options for WiFi stability via idf.py menuconfig.
Lightsnapcast Player Settings
WiFi jitter where packets arrive in bursts rather than evenly spaced.
WiFi Resilience Settings