fix: failover reconnect + configurable max attempts #48
The input_failed detection path during active streaming (line ~3114) breaks out of the inner while loop without setting is_failover = True. The outer loop then sees is_failover is False and breaks entirely, closing the HTTP response and disconnecting the client. The failover_event path (line ~3152) correctly sets is_failover = True before breaking, allowing the outer loop to continue and reconnect the client to the failover URL seamlessly. Without this fix, every transcode_runtime_input_error failover kills the client connection even though the proxy successfully resolves a failover URL — the client never receives data from the backup stream.
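For reviewers who want the control flow in isolation, here is a minimal sketch of the two-loop pattern described above. It is not the proxy's actual code: the function name, the `sources` dict, the `None` marker for an input failure, and the `set_flag_on_input_failed` toggle are all hypothetical stand-ins used only to show the before and after behavior.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical stand-in for upstream streams: each URL maps to the chunks it
# produces, and None marks the point where the input fails.
Sources = Dict[str, List[Optional[bytes]]]


def stream_with_failover(
    urls: List[str],
    sources: Sources,
    set_flag_on_input_failed: bool = True,
) -> Iterator[bytes]:
    """Yield chunks to the client, moving to the next URL when the input fails."""
    index = 0
    while index < len(urls):                # outer loop: one pass per upstream URL
        is_failover = False
        for chunk in sources[urls[index]]:  # inner loop: active streaming
            if chunk is None:               # input failure detected mid-stream
                if set_flag_on_input_failed:
                    is_failover = True      # the fix: mark this exit as a failover
                break
            yield chunk
        if not is_failover:
            break                           # normal end, or (without the fix) a dropped client
        index += 1                          # reconnect to the next failover URL


demo_sources: Sources = {"primary": [b"a", None], "backup": [b"b", b"c"]}
fixed = list(stream_with_failover(["primary", "backup"], demo_sources))
buggy = list(stream_with_failover(["primary", "backup"], demo_sources, False))
```

With the flag set, the generator keeps yielding from the backup source; without it, the generator ends right after the failure, which is what closes the HTTP response.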
Tested in production: this one's a real fix ✓
Deployed the patched proxy to my live setup (Hetzner dedicated, 2 IPTV providers, 3 concurrent users) and the difference is night and day.
The problem was brutal. Every time a transcoded stream hit an input error, the proxy would correctly resolve the failover URL, log
Root cause: The
Before the fix (from my actual logs):
After the fix:
The stream bounced between both providers three times in under two minutes and the TV never dropped. The user saw a brief quality hiccup during switches, but the stream kept playing. That's exactly how failover should work.
Tested with both the advanced failover resolver (calling back to m3u-editor for capacity checks) and the providers flipping between Trex and Strong sources. Solid.
I actually ran into this issue myself while watching a live football match, and had been working on it. This seems to have sorted it, so nice.
Hardcoded limit of 3 failover attempts meant streams with many failover channels (e.g. 11 across 2 providers) would exhaust attempts on one dead provider without ever reaching the healthy one.
New behavior:
- MAX_FAILOVER_ATTEMPTS=0 (default): try all available failover URLs
  - Static failover list: uses len(failover_urls) as the limit
  - Resolver-based: effectively unlimited, lets the resolver decide
- MAX_FAILOVER_ATTEMPTS=N: cap at N attempts (old behavior with N=3)
Applied to both direct streaming and transcoded streaming paths.
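The rules above could be expressed roughly as follows. This is a sketch under assumptions, not the code in the commit: `effective_failover_limit` and `uses_resolver` are hypothetical names, and applying a positive cap ahead of the resolver check is a guess (the maintainer's comment suggests the resolver path may ignore the limit altogether).

```python
import math
from typing import List, Union


def effective_failover_limit(
    max_attempts: int,
    failover_urls: List[str],
    uses_resolver: bool,
) -> Union[int, float]:
    """Return how many failover attempts to allow for one stream."""
    if max_attempts > 0:
        # Explicit cap, e.g. 3 reproduces the old hardcoded behavior.
        # Assumption: the cap is checked before the resolver case.
        return max_attempts
    if uses_resolver:
        # 0 with a resolver: effectively unlimited, the resolver decides.
        return math.inf
    # 0 with a static list: one attempt per configured failover URL.
    return len(failover_urls)


eleven_urls = [f"url-{i}" for i in range(11)]
static_limit = effective_failover_limit(0, eleven_urls, uses_resolver=False)
resolver_limit = effective_failover_limit(0, [], uses_resolver=True)
capped_limit = effective_failover_limit(3, eleven_urls, uses_resolver=False)
```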
I can break them into commits if you want to, but this is another problem that I have encountered today. A friend of mine was watching a channel that had 11 failover channels, and it was only trying the first three. The first 3 were from the same provider, so it never got to try the rest of them, which were working. I think this is a good addition. Thank you.
Makes sense, this was an arbitrary limit placed a while back. As a heads up, if you use the smart failover resolver, the limit is ignored (Settings > Proxy > Enable advanced failover logic).
Summary
Two fixes for failover reliability:
- A missing is_failover = True flag meant the HTTP response closed instead of seamlessly switching to the backup URL
What changed
Fix 1: Keep client connected during failover (commit 1)
The input_failed detection path during active streaming breaks out of the inner loop without setting is_failover = True. The outer loop sees it's False, exits the generator, and kills the HTTP response. The failover_event path had the flag set correctly; this just matches that behavior.
Fix 2: Configurable max failover attempts (commit 2)
New MAX_FAILOVER_ATTEMPTS setting in config (env var):
- 0 (default): try all available failover URLs before giving up
Applied to both direct streaming and transcoded streaming paths.
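As an illustration of how an env-var setting like this is typically read, here is a small sketch. Only the MAX_FAILOVER_ATTEMPTS name and its default of 0 come from this PR. The helper name is hypothetical, and falling back to 0 for invalid or negative values is an assumption about reasonable behavior, not something the PR states.

```python
import os
from typing import Mapping


def read_max_failover_attempts(env: Mapping[str, str] = os.environ) -> int:
    """Parse MAX_FAILOVER_ATTEMPTS, where 0 means "try all available failover URLs"."""
    raw = env.get("MAX_FAILOVER_ATTEMPTS", "0")
    try:
        value = int(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return 0
    # Assumption: negative values are treated like the default.
    return max(value, 0)


default_value = read_max_failover_attempts({})
capped_value = read_max_failover_attempts({"MAX_FAILOVER_ATTEMPTS": "5"})
invalid_value = read_max_failover_attempts({"MAX_FAILOVER_ATTEMPTS": "abc"})
```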
Tested in production
Deployed to a live setup with 2 IPTV providers (Trex + Strong), 3 concurrent users, channels with up to 11 failover URLs across both providers.
Fix 1 — before vs after:
Fix 2 — provider outage scenario:
With hardcoded max of 3, when Strong went down the proxy burned all 3 attempts on dead Strong URLs and never reached the working Trex ones. With the new default (try all), it cycles through every failover until it finds a live stream.
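The outage scenario can be reproduced with a toy simulation. Everything here is illustrative: the URL names, the 6/5 split between Strong and Trex, and the `try_failovers` helper are assumptions made for the example, and a static failover list is assumed (no resolver).

```python
from typing import List, Optional, Set, Tuple


def try_failovers(
    failover_urls: List[str],
    dead: Set[str],
    max_attempts: int,
) -> Tuple[Optional[str], int]:
    """Walk the failover list in order; return (first live URL or None, attempts used)."""
    # 0 means "try all" for a static list, so the limit is the list length.
    limit = max_attempts if max_attempts > 0 else len(failover_urls)
    attempts = 0
    for url in failover_urls[:limit]:
        attempts += 1
        if url not in dead:
            return url, attempts
    return None, attempts


# Hypothetical layout: 11 failover URLs, Strong entries listed first.
strong = [f"strong-{i}" for i in range(1, 7)]
trex = [f"trex-{i}" for i in range(1, 6)]
urls = strong + trex
outage = set(strong)  # Strong is down, Trex is healthy

old_behavior = try_failovers(urls, outage, 3)  # gives up inside the dead provider
new_default = try_failovers(urls, outage, 0)   # keeps going until a Trex URL answers
capped_at_5 = try_failovers(urls, outage, 5)   # stops after exactly 5 attempts
```

In this layout the old limit of 3 never leaves the dead provider, while the default of 0 reaches the first Trex URL on the seventh attempt.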
Test plan
MAX_FAILOVER_ATTEMPTS=5: verify it stops at 5