fix: failover reconnect + configurable max attempts by gabelul · Pull Request #48 · m3ue/m3u-proxy

gabelul · 2026-03-26T18:28:48Z

Summary

Two fixes for failover reliability:

Client disconnects during transcode input error failover — missing is_failover = True flag meant the HTTP response closed instead of seamlessly switching to the backup URL
Hardcoded 3-attempt failover limit — streams with many failover channels (e.g. 11 across 2 providers) would exhaust all attempts on one dead provider without ever reaching the healthy one

What changed

Fix 1: Keep client connected during failover (commit 1)

The input_failed detection path during active streaming breaks out of the inner loop without setting is_failover = True. The outer loop sees it's False, exits the generator, and kills the HTTP response. The failover_event path had the flag set correctly — this just matches that behavior.
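For anyone skimming, here is a minimal sketch of the control flow described above. It is not the proxy's actual code: `stream_with_failover`, the `"INPUT_ERROR"` marker, and the list-of-chunks sources are all made up for illustration, and the inner loop is a `for` rather than the real `while`. Only the `is_failover` flag comes from the PR.

```python
from typing import Iterator, List


def stream_with_failover(
    sources: List[List[str]], set_flag_on_input_error: bool
) -> Iterator[str]:
    """Hypothetical stand-in for the streaming generator.

    Each inner list holds the chunks one source produces. The marker
    "INPUT_ERROR" simulates a transcode input failure mid-stream.
    """
    index = 0
    while index < len(sources):  # outer loop: one pass per source
        is_failover = False
        for chunk in sources[index]:  # inner loop: send chunks to the client
            if chunk == "INPUT_ERROR":
                if set_flag_on_input_error:
                    is_failover = True  # the one-line fix
                break
            yield chunk
        if is_failover:
            index += 1  # move on to the next source, client stays attached
            continue
        break  # generator returns here, which ends the HTTP response


demo_sources = [["a", "INPUT_ERROR"], ["b", "c"]]
before_fix = list(stream_with_failover(demo_sources, set_flag_on_input_error=False))
after_fix = list(stream_with_failover(demo_sources, set_flag_on_input_error=True))
```

Without the flag the generator stops after the first source fails; with it, the client keeps receiving chunks from the second source.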

Fix 2: Configurable max failover attempts (commit 2)

New MAX_FAILOVER_ATTEMPTS setting in config (env var):

- 0 (default): try all available failover URLs before giving up
  - Static failover list: uses the actual list length as the limit
  - Resolver-based: effectively unlimited, lets the resolver decide when to stop (returns null)
- Any positive number: cap at that many attempts (old behavior was hardcoded to 3)

Applied to both direct streaming and transcoded streaming paths.
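A rough sketch of how the limit could be resolved from the setting, under some assumptions: `effective_max_attempts` and `read_setting` are hypothetical helper names, "effectively unlimited" is modelled as `None`, and the resolver taking priority over a static list when both exist is my guess, not something this PR states.

```python
import os
from typing import Mapping, Optional, Sequence


def read_setting(env: Mapping[str, str] = os.environ) -> int:
    """Read MAX_FAILOVER_ATTEMPTS from the environment; 0 means 'try all'."""
    return int(env.get("MAX_FAILOVER_ATTEMPTS", "0"))


def effective_max_attempts(
    configured: int,
    failover_urls: Optional[Sequence[str]],
    uses_resolver: bool,
) -> Optional[int]:
    """Return the attempt cap, or None when there is no fixed cap."""
    if configured > 0:
        return configured  # explicit cap wins
    if uses_resolver:
        return None  # assumed: loop until the resolver returns null
    return len(failover_urls or [])  # static list: try every URL once
```

For example, `effective_max_attempts(0, urls, False)` with an 11-entry list gives 11, while `effective_max_attempts(3, urls, False)` reproduces the old hardcoded cap.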

Tested in production

Deployed to a live setup with 2 IPTV providers (Trex + Strong), 3 concurrent users, channels with up to 11 failover URLs across both providers.

Fix 1 — before vs after:

BEFORE:
18:15:37 - Failover triggered for stream ae85426a...
18:15:37 - Last client disconnected  ← connection killed

AFTER:
18:30:28 - Failover triggered for stream ae85426a...
18:30:28 - Starting failover attempt 1/3 for client...  ← stays alive
18:30:51 - Starting failover attempt 2/3  ← still connected
18:31:53 - Starting failover attempt 3/3  ← still connected

Fix 2 — provider outage scenario:
With hardcoded max of 3, when Strong went down the proxy burned all 3 attempts on dead Strong URLs and never reached the working Trex ones. With the new default (try all), it cycles through every failover until it finds a live stream.
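The outage scenario can be reproduced with a toy simulation. Everything here is illustrative: the `strong.example` and `trex.example` hosts, the 6/5 split of the 11 URLs, and the `first_live_url` helper are assumptions, not the real channel list or proxy code.

```python
from typing import List, Optional, Set
from urllib.parse import urlparse


def first_live_url(
    failover_urls: List[str], dead_hosts: Set[str], max_attempts: int
) -> Optional[str]:
    """Walk the failover list in order and return the first URL whose host is up.

    max_attempts == 0 means 'try the whole list' (the new default).
    """
    limit = max_attempts if max_attempts > 0 else len(failover_urls)
    for url in failover_urls[:limit]:
        if urlparse(url).hostname not in dead_hosts:
            return url
    return None


# Hypothetical ordering: one provider's URLs listed first, the other's after.
urls = [f"http://strong.example/ch{i}" for i in range(6)]
urls += [f"http://trex.example/ch{i}" for i in range(5)]

capped = first_live_url(urls, {"strong.example"}, max_attempts=3)
try_all = first_live_url(urls, {"strong.example"}, max_attempts=0)
```

With the cap at 3, `capped` is `None` because every attempt lands on the dead provider; with the try-all default, `try_all` is the first URL on the healthy one.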

Test plan

Transcoded stream: kill primary source → verify client stays connected through failover
Static failover URLs: verify all URLs are tried before giving up
Resolver-based failover: verify proxy keeps trying until resolver returns null
MAX_FAILOVER_ATTEMPTS=5: verify it stops at 5
No failovers configured: verify default behavior unchanged (max 3)

The input_failed detection path during active streaming (line ~3114) breaks out of the inner while loop without setting is_failover = True. The outer loop then sees is_failover is False and breaks entirely, closing the HTTP response and disconnecting the client. The failover_event path (line ~3152) correctly sets is_failover = True before breaking, allowing the outer loop to continue and reconnect the client to the failover URL seamlessly. Without this fix, every transcode_runtime_input_error failover kills the client connection even though the proxy successfully resolves a failover URL — the client never receives data from the backup stream.

gabelul · 2026-03-26T18:33:29Z

Tested in production — this one's a real fix ✓

Deployed the patched proxy to my live setup (Hetzner dedicated, 2 IPTV providers, 3 concurrent users) and the difference is night and day.

The problem was brutal. Every time a transcoded stream hit an input error, the proxy would correctly resolve the failover URL, log FAILOVER_TRIGGERED, and then... drop the client anyway. The TV would freeze, the user had to manually switch channels and come back. Completely defeated the purpose of having failovers configured.

Root cause: The input_failed detection path during active streaming was missing is_failover = True before the break. The outer loop saw is_failover was False, hit the else: break, and the generator returned — killing the HTTP response. Meanwhile, the failover_event path (triggered by the API) had the flag set correctly and worked fine. Classic one-liner that's invisible until you trace the exact code path.

Before the fix (from my actual logs):

18:15:36 - Transcoding process encountered input error, triggering failover
18:15:37 - Failover resolver returned URL: http://smarter8k.ru/...
18:15:37 - Failover triggered for stream ae85426a...
18:15:37 - Last client disconnected from stream ae85426a  ← game over
18:15:37 - Cleaned up client: client_d85dc337...

After the fix:

18:30:27 - Transcoding process encountered input error, triggering failover
18:30:28 - Failover resolver returned URL: http://smarter8k.ru/...
18:30:28 - Failover triggered for stream ae85426a...
18:30:28 - Starting failover attempt 1/3 for client client_d85dc337...  ← stays alive!
18:30:51 - input error again → failover attempt 2/3 → client still connected
18:31:53 - input error again → failover attempt 3/3 → client still connected

The stream bounced between both providers three times in under two minutes and the TV never dropped. User saw a brief quality hiccup during switches but the stream kept playing. That's exactly how failover should work.

Tested with both the advanced failover resolver (calling back to m3u-editor for capacity checks) and the providers flipping between Trex and Strong sources. Solid. I actually discovered this while watching a live football match: I kept hitting the issue, so I started working on it. This seems to have sorted it, so nice.

Hardcoded limit of 3 failover attempts meant streams with many failover channels (e.g. 11 across 2 providers) would exhaust attempts on one dead provider without ever reaching the healthy one.

New behavior:

- MAX_FAILOVER_ATTEMPTS=0 (default): try all available failover URLs
  - Static failover list: uses len(failover_urls) as the limit
  - Resolver-based: effectively unlimited, lets the resolver decide
- MAX_FAILOVER_ATTEMPTS=N: cap at N attempts (old behavior with N=3)

Applied to both direct streaming and transcoded streaming paths.

gabelul · 2026-03-26T21:05:34Z

I can break them into commits if you want, but this is another problem I ran into today. A friend of mine was watching a channel that had 11 failover channels, and the proxy was only trying the first three. Those three were from the same provider, so it never got to the rest, which were working. I think this is a good addition. Thank you.

sparkison · 2026-03-28T21:31:27Z

Makes sense, this was an arbitrary limit placed a while back. As a heads up, if you use the smart failover resolver, the limit is ignored (Settings > Proxy > Enable advanced failover logic).

gabelul changed the title ~~fix: keep client connected during transcode input error failover~~ fix: failover reconnect + configurable max attempts Mar 26, 2026

sparkison changed the base branch from master to dev March 28, 2026 21:29

sparkison merged commit 5ed084d into m3ue:dev Mar 28, 2026
4 checks passed

sparkison merged 2 commits into m3ue:dev from gabelul:fix/failover-input-error-reconnect
